Compare commits


11082 Commits

73738ec570 bump version to 1.0 (#11717)
Summary:
I'm just doing the honors and bumping the version to 1.0.0.

1.0 preview and RC releases will have the 1.0.0.dev{date} tag
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11717

Reviewed By: SsnL

Differential Revision: D9840857

Pulled By: soumith

fbshipit-source-id: 4c9c2e01dccb3c521dab26c49e1569d970a87ace
2018-09-17 12:13:48 -07:00
47d65ed34f Fix issue 10492 (#11634)
Summary:
- pass infos vector by reference
- checkErrors takes infos vector by reference
- modified gesv tests to not cause infs or nans sporadically
- also clean up error messages
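
For context, a quick sketch of the op under test, using the old `torch.gesv(B, A)` interface of that era (since replaced by `torch.solve`/`torch.linalg.solve`); the diagonal shift keeps `A` well-conditioned, which is what the test cleanup above is about:

```python
import torch

A = torch.randn(3, 3) + 3 * torch.eye(3)   # well-conditioned: avoids sporadic infs/nans
B = torch.randn(3, 2)
X, LU = torch.gesv(B, A)                    # solves A @ X = B
print(torch.allclose(A.mm(X), B, atol=1e-5))
```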

Reviewed By: ezyang

Differential Revision: D9818550

Pulled By: soumith

fbshipit-source-id: 00215205ff88767d6a5e921322394c5fd915d6d8
2018-09-17 12:13:45 -07:00
39520ffec1 remove Type/Tensor/TensorMethods include order dependencies. (#11720)
Summary:
Previously, TensorMethods.h had to be included after Tensor.h in order to get the tensor method definitions.
We abstracted this away from users by making sure ATen.h did this correctly; but we don't have any equivalent for ATen/core.

In order to solve this dependency issue, we now forward declare Tensor in the Type declaration, which breaks the dependency cycle.
Type.h now includes Tensor.h (for backwards compatibility) and Tensor.h now includes TensorMethods.h, so there are no longer any include-order restrictions.

We could get rid of TensorMethods.h completely now, but that would involve coordinating a code generation change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11720

Reviewed By: ezyang

Differential Revision: D9841488

Pulled By: gchanan

fbshipit-source-id: 1668199095e096c1790e646b5dc9f61ec1b33c0a
2018-09-17 11:10:32 -07:00
e125e61824 Fix flake8
Summary: Fix flake8

Reviewed By: ezyang

Differential Revision: D9873872

fbshipit-source-id: 26e81238f22caaeccd2c8b4f39cedb6cfb5520dd
2018-09-17 11:10:29 -07:00
cdefc27795 Support lr adaption for SparseAdam and RowWiseSparseAdam (#11162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11162

As title; fixes the PR test failure.

Reviewed By: chocjy

Differential Revision: D9619308

fbshipit-source-id: 0a2228841ed8fadb15f07e94d3575aa701b10146
2018-09-17 10:29:03 -07:00
7949250295 Fixes for Torch Script C++ API (#11682)
Summary:
A couple fixes I deem necessary to the TorchScript C++ API after writing the tutorial:

1. When I was creating the custom op API, I created `torch/op.h` as the one-stop header for creating custom ops. I now notice that there is no good header for the TorchScript C++ story altogether, i.e. when you just want to load a script module in C++ without any custom ops necessarily. The `torch/op.h` header suits that purpose just as well of course, but I think we should rename it to `torch/script.h`, which seems like a great name for this feature.

2. The CMake API we provided so far defined a bunch of variables like `TORCH_LIBRARY_DIRS` and `TORCH_INCLUDES` and expected users to add those variables to their targets. We also had a CMake function that did that for you automatically. I now realize a much smarter way of doing this is to create an `IMPORTED` target for the libtorch library in CMake, and then add all this stuff to the link interface of that target. Then all downstream users have to do is `target_link_libraries(my_target torch)` and they get all the proper includes, libraries and compiler flags added to their target. This means we can get rid of the CMake function and all that stuff. orionr AFAIK this is a much, much better way of doing all of this, no?

3. Since we distribute libtorch with `-D_GLIBCXX_USE_CXX11_ABI=0`, dependent libraries must set this flag too. I now add this to the interface compile options of this imported target.

4. Fixes to JIT docs.

These could likely be 4 different PRs but given the release I wouldn't mind landing them all asap.

zdevito dzhulgakov soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11682

Differential Revision: D9839431

Pulled By: goldsborough

fbshipit-source-id: fdc47b95f83f22d53e1995aa683e09613b4bfe65
2018-09-17 09:54:50 -07:00
a7e3cd09e0 Fix ctc gradient handling (#11753)
Summary:
Fixes: #11750

Also fix cuda ctc with double to enable gradient check.
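
A minimal sketch of the kind of check this enables (shapes and the CPU setting are illustrative; `gradcheck` needs double precision to compare analytical gradients against finite differences):

```python
import torch
import torch.nn.functional as F

T, N, C = 12, 2, 5   # input length, batch size, number of classes
log_probs = torch.randn(T, N, C, dtype=torch.double).log_softmax(2).requires_grad_()
targets = torch.randint(1, C, (N, 4), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 4, dtype=torch.long)

print(torch.autograd.gradcheck(
    lambda lp: F.ctc_loss(lp, targets, input_lengths, target_lengths),
    (log_probs,)))
```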
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11753

Differential Revision: D9861318

Pulled By: ezyang

fbshipit-source-id: 2e7afea2b60dbbd891bb5d0bda61ee75fe01d933
2018-09-17 09:54:47 -07:00
07fd4450ab Revert D9831398: [pytorch][PR] Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0)
Differential Revision:
D9831398

Original commit changeset: db119d3f9c26

fbshipit-source-id: 4f183c9c178c159473bdaaa6299d4d5eb8afe549
2018-09-17 09:39:23 -07:00
f6a6d7fae1 Switch at::TensorImpl to store TypeMeta rather than ScalarType
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11702

Reviewed By: cpuhrsch

Differential Revision: D9831384

fbshipit-source-id: 1b1233a70ed70b47a3dab4a5797b6cfcb7a2c265
2018-09-17 09:09:35 -07:00
6660a128a5 Cache and use TypeMeta in TensorImpl (#11706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11706

This is necessary to handle use-cases where Storage is not set (because the
tensor in question doesn't have a notion of storage).

Reviewed By: orionr

Differential Revision: D9833361

fbshipit-source-id: e90a384019f44f57682b687d129b54e85b6fabb9
2018-09-17 08:58:13 -07:00
2baba7f835 Add storage_offset to Caffe2 (#11701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11701

There's one extra multiply from TypeMeta::itemsize() which needs
to be characterized.  For all existing Caffe2 uses, storage_offset
is zero.
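
For context, a sketch of the semantics on the PyTorch side; the element address is computed as `storage + storage_offset * itemsize`, which is where the extra multiply comes from:

```python
import torch

t = torch.arange(10)
v = t[3:]                   # a view into the same storage
print(v.storage_offset())   # 3; addressing multiplies this by the itemsize
```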

Reviewed By: li-roy

Differential Revision: D9831230

fbshipit-source-id: 353678edf76d2ccc297a73475a34f6ab2a20d1e1
2018-09-17 08:58:11 -07:00
35518b3dc7 Back out "Back out "Refactor Tensor/TensorImpl constructors."" E2: Confirm problem with old patch (#11744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11744

Original commit changeset: 093e4c47d557

Restores D9813742

Reviewed By: dzhulgakov

Differential Revision: D9847835

fbshipit-source-id: f3f467891e01c923dd9d3352d892cf59e10402f1
2018-09-17 08:58:09 -07:00
0d345cfa18 Remove Type method defaults in ATen. (#11675)
Summary:
This will allow us to break the dependency cycle between Tensor and Type, because currently Type has defaulted Tensor (reference) arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11675

Reviewed By: ezyang

Differential Revision: D9819720

Pulled By: gchanan

fbshipit-source-id: a9577ac34a358120075129ab0654e7862d1dace6
2018-09-17 08:58:07 -07:00
5bfd8f583c Moving copy of Caffe2 protos back to build_pytorch_libs.sh (#11726)
Summary:
This way it shows up in all current and future setup.py commands; otherwise we'd have to override every one of them to have them all call copy_protos. This is needed because the nightly packages still do not include caffe2_pb2, since setup.py bdist does not go through setup.py install or setup.py develop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11726

Reviewed By: orionr

Differential Revision: D9844075

Pulled By: pjh5

fbshipit-source-id: 57b469e48010aacd0c08c214ba8a7e5d757feefa
2018-09-17 08:58:05 -07:00
a8b1755de6 Check device argument makes sense for legacy tensor constructors. (#11669)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/11427.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11669

Differential Revision: D9817881

Pulled By: gchanan

fbshipit-source-id: 77dc5b0e6bc9884d2616210b96c07e4734058bb6
2018-09-17 08:24:25 -07:00
d63bb72d89 Remove symbol export annotations in THC/generic/*.cu (#11367)
Summary:
We use these annotations during function declarations, not definitions. See the description of compiler error [C2491](https://msdn.microsoft.com/en-us/library/62688esh.aspx) for more details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11367

Reviewed By: ezyang

Differential Revision: D9697923

Pulled By: orionr

fbshipit-source-id: 1e539c02957851386f887e6d0510ce83117a1695
2018-09-17 08:24:23 -07:00
f5bc2aef07 Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0) (#11563)
Summary:
Fix the OpenMP link error for the AppleClang 9.0 compiler.

Built with the following command:
python setup.py build develop

The error message:

```
Undefined symbols for architecture x86_64:
  "___kmpc_critical", referenced from:
      _THFloatTensor_addmm in THTensorMath.cpp.o
      _THDoubleTensor_addmm in THTensorMath.cpp.o
      _THByteTensor_addmm in THTensorMath.cpp.o
      _THCharTensor_addmm in THTensorMath.cpp.o
      _THShortTensor_addmm in THTensorMath.cpp.o
      _THIntTensor_addmm in THTensorMath.cpp.o
      _THLongTensor_addmm in THTensorMath.cpp.o
      ...
  "___kmpc_end_critical", referenced from:
      _THFloatTensor_addmm in THTensorMath.cpp.o
      _THDoubleTensor_addmm in THTensorMath.cpp.o
      _THByteTensor_addmm in THTensorMath.cpp.o
      _THCharTensor_addmm in THTensorMath.cpp.o
      _THShortTensor_addmm in THTensorMath.cpp.o
      _THIntTensor_addmm in THTensorMath.cpp.o
      _THLongTensor_addmm in THTensorMath.cpp.o
      ...
  "___kmpc_end_reduce_nowait", referenced from:
      _.omp_outlined..270 in THTensorMoreMath.cpp.o
      _.omp_outlined..271 in THTensorMoreMath.cpp.o
      _.omp_outlined..273 in THTensorMoreMath.cpp.o
      _.omp_outlined..275 in THTensorMoreMath.cpp.o
      _.omp_outlined..43 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..44 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..46 in THTensorEvenMoreMath.cpp.o
      ...
  "___kmpc_end_serialized_parallel", referenced from:
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::log_softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::native::log_softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::TensorIterator::for_each(std::__1::function<void (int, char**, long long const*, long long)> const&) in TensorIterator.cpp.o
      ...
  "___kmpc_for_static_fini", referenced from:
      _.omp_outlined..9 in Embedding.cpp.o
      _.omp_outlined. in EmbeddingBag.cpp.o
      _.omp_outlined. in GridSampler.cpp.o
      _.omp_outlined..42 in GridSampler.cpp.o
      _.omp_outlined..44 in GridSampler.cpp.o
      _.omp_outlined..45 in GridSampler.cpp.o
      _.omp_outlined..47 in GridSampler.cpp.o
      ...
  "___kmpc_for_static_init_4", referenced from:
      _.omp_outlined. in init.cpp.o
      _.omp_outlined..35 in init.cpp.o
      _.omp_outlined..36 in init.cpp.o
      _.omp_outlined..37 in init.cpp.o
      _.omp_outlined..49 in init.cpp.o
      _.omp_outlined..52 in init.cpp.o
      _.omp_outlined..220 in init.cpp.o
      ...
  "___kmpc_for_static_init_8", referenced from:
      _.omp_outlined..9 in Embedding.cpp.o
      _.omp_outlined. in EmbeddingBag.cpp.o
      _.omp_outlined. in GridSampler.cpp.o
      _.omp_outlined..42 in GridSampler.cpp.o
      _.omp_outlined..44 in GridSampler.cpp.o
      _.omp_outlined..45 in GridSampler.cpp.o
      _.omp_outlined..47 in GridSampler.cpp.o
      ...
  "___kmpc_for_static_init_8u", referenced from:
      _.omp_outlined..203 in init.cpp.o
      _.omp_outlined..207 in init.cpp.o
      _.omp_outlined..209 in init.cpp.o
      _.omp_outlined..210 in init.cpp.o
  "___kmpc_fork_call", referenced from:
      at::native::embedding_dense_backward_cpu(at::Tensor const&, at::Tensor const&, long long, long long, bool) in Embedding.cpp.o
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::grid_sampler_2d_cpu(at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      at::native::grid_sampler_3d_cpu(at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      at::native::grid_sampler_2d_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      at::native::grid_sampler_3d_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      ...
  "___kmpc_global_thread_num", referenced from:
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::log_softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::native::log_softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::TensorIterator::for_each(std::__1::function<void (int, char**, long long const*, long long)> const&) in TensorIterator.cpp.o
      ...
  "___kmpc_push_num_threads", referenced from:
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      ...
  "___kmpc_reduce_nowait", referenced from:
      _.omp_outlined..270 in THTensorMoreMath.cpp.o
      _.omp_outlined..271 in THTensorMoreMath.cpp.o
      _.omp_outlined..273 in THTensorMoreMath.cpp.o
      _.omp_outlined..275 in THTensorMoreMath.cpp.o
      _.omp_outlined..43 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..44 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..46 in THTensorEvenMoreMath.cpp.o
      ...
  "___kmpc_serialized_parallel", referenced from:
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::log_softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::native::log_softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::TensorIterator::for_each(std::__1::function<void (int, char**, long long const*, long long)> const&) in TensorIterator.cpp.o
      ...
  "_omp_get_max_threads", referenced from:
      _THGetNumThreads in THGeneral.cpp.o
      caffe2::Caffe2SetOpenMPThreads(int*, char***) in init_omp.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      ...
  "_omp_get_num_procs", referenced from:
      _THGetNumCores in THGeneral.cpp.o
  "_omp_get_num_threads", referenced from:
      _.omp_outlined. in Embedding.cpp.o
      _.omp_outlined. in SoftMax.cpp.o
      _.omp_outlined..35 in SoftMax.cpp.o
      _.omp_outlined..37 in SoftMax.cpp.o
      _.omp_outlined..38 in SoftMax.cpp.o
      _.omp_outlined..46 in SoftMax.cpp.o
      _.omp_outlined..47 in SoftMax.cpp.o
      ...
  "_omp_get_thread_num", referenced from:
      _.omp_outlined. in Embedding.cpp.o
      _.omp_outlined. in SoftMax.cpp.o
      _.omp_outlined..35 in SoftMax.cpp.o
      _.omp_outlined..37 in SoftMax.cpp.o
      _.omp_outlined..38 in SoftMax.cpp.o
      _.omp_outlined..46 in SoftMax.cpp.o
      _.omp_outlined..47 in SoftMax.cpp.o
      ...
  "_omp_in_parallel", referenced from:
      _THFloatTensor_copy in THTensorCopy.cpp.o
      _THDoubleTensor_copy in THTensorCopy.cpp.o
      _THByteTensor_copy in THTensorCopy.cpp.o
      _THCharTensor_copy in THTensorCopy.cpp.o
      _THShortTensor_copy in THTensorCopy.cpp.o
      _THIntTensor_copy in THTensorCopy.cpp.o
      _THLongTensor_copy in THTensorCopy.cpp.o
      ...
  "_omp_set_num_threads", referenced from:
      _THSetNumThreads in THGeneral.cpp.o
      caffe2::Caffe2SetOpenMPThreads(int*, char***) in init_omp.cc.o
ld: symbol(s) not found for architecture x86_64
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11563

Differential Revision: D9831398

Pulled By: ezyang

fbshipit-source-id: db119d3f9c26a71180335ad955f2f62c5369f9ed
2018-09-17 08:24:20 -07:00
6f6b03566b Vectorize grid sample 2d CPU kernels (#10980)
Summary:
This PR vectorizes the CPU grid sample 2d forward and backward kernels. Specifically,

 1. add `.data()` in `TensorAccessor`
 2. support non-void return value for declaring CPU kernel stub
 3. add `bool at::geometry_is_contiguous(IntList sizes, IntList strides)`
 4. The following vectorized CPU primitives are added:

    + `gather<scale>(baseaddr, vindex)`: `result[i] = baseaddr[vindex[i] * scale]`
    + `mask_gather<scale>(src, baseaddr, vindex, mask)`: `result[i] = mask[i] ? baseaddr[vindex[i] * scale] : src[i]`.
    + comparison ops
    + binary logical ops
    + `min(a, b)`
    + `cast<dst_t, src_t>(src_vec)`: changing dtype but keeping the bit representation
    + `blendv(a, b, mask)`: `result[i] = mask[i] ? b[i] : a[i]`.
    + ctor with multiple values (i.e., `setr`)
    + `arange(start = 0, step = 1)`: constructs a vector with values specified by the arange parameters
    + `convert_to_int_of_same_size(vec)`: convert floating point vector to corresponding integral type of same size
    + `interleave2(a, b)` & `deinterleave2(x, y)`: interleave or deinterleaves two vectors. E.g., for `interleave`:
        ```
        inputs:
          {a0, a1, a2, a3, a4, a5, a6, a7}
          {b0, b1, b2, b3, b4, b5, b6, b7}
        outputs:
          {a0, b0, a1, b1, a2, b2, a3, b3}
          {a4, b4, a5, b5, a6, b6, a7, b7}
        ```

  5. Grid sample CPU kernel implementations are described in the following note (also in `GridSampleKernel.cpp`):

  ```
   NOTE [ Grid Sample CPU Kernels ]

   Implementation of vectorized grid sample CPU kernels is divided into three
   parts:

   1. `ComputeLocation` struct
      Transforms grid values into interpolation locations of the input tensor
      for a particular spatial dimension, based on the size of that dimension
      in the input tensor and the padding mode.
```
```cpp
      template<typename scalar_t, GridSamplerPadding padding>
      struct ComputeLocation {
        using Vec = Vec256<scalar_t>;

        // ctor
        ComputeLocation(int64_t size);

        // Given grid values `in`, return the interpolation locations after
        // un-normalization and padding mechanism (elementwise).
        Vec apply(const Vec &in) const;

        // Similar to `apply`, but also returns `d apply(in) / d in`
        // (elementwise).
        // this is often used in gradient computation.
        std::pair<Vec, Vec> apply_get_grad(const Vec &in) const;
      };
```
```
   2. `ApplyGridSample` struct
      Owns N `ComputeLocation` structs, where N is the number of spatial
      dimensions. Given N input grid vectors (one for each spatial dimension)
      and spatial offset, it gets the interpolation locations from
      `ComputeLocation`s, applies interpolation procedure, and then writes to
      the output (or grad_input & grad_grid in backward).
```
```cpp
      template<typename scalar_t, int spatial_dim,
               GridSamplerInterpolation interp,
               GridSamplerPadding padding>
      struct ApplyGridSample {

        // ctor
        ApplyGridSample(const TensorAccessor<scalar_t, 4>& input);

        // Applies grid sampling (forward) procedure:
        //   1. computes interpolation locations from grid values `grid_x` and
        //      `grid_y`,
        //   2. interpolates output values using the locations and input data
        //      in `inp_slice`, and
        //   3. writes the first `len` values in the interpolated vector to
        //      `out_slice` with spatial offset being `offset`.
        //
        // This assumes that `grid_x` and `grid_y` all contain valid grid
        // values \in [-1, 1], even at indices greater than `len`.
        //
        // The `*_slice` argument names refer to samples within a batch (i.e.,
        // with the batch dimension sliced out).
        void forward(TensorAccessor<scalar_t, 3>& out_slice,
                     const TensorAccessor<scalar_t, 3>& inp_slice,
                     int64_t offset, const Vec& grid_x, const Vec& grid_y,
                     int64_t len) const;

        // Applies grid sampling (backward) procedure. Arguments semantics
        // and strategy are similar to those of `forward`.
        void backward(TensorAccessor<scalar_t, 3>& gInp_slice,
                      TensorAccessor<scalar_t, 3>& gGrid_slice,
                      const TensorAccessor<scalar_t, 3>& gOut_slice,
                      const TensorAccessor<scalar_t, 3>& inp_slice,
                      int64_t offset, const Vec& grid_x, const Vec& grid_y,
                      int64_t len) const;
      };
```
```
   3. `grid_sample_2d_grid_slice_iterator` function
      Among the tensors we work with, we know that the output tensors are
      contiguous (i.e., `output` in forward, and `grad_input` & `grad_grid` in
      backward), we need to read `input` at random locations anyway, and
      `grad_output` usually comes from autograd and is often contiguous. So we
      base our iterating strategy on the geometry of `grid`.
      The `grid_sample_2d_grid_slice_iterator` function provides an abstraction
      for efficiently iterating through a `grid` slice (without the batch
      dimension). See the comments of that function for the specific cases and
      strategies used.
```
```cpp
      template<typename scalar_t, typename ApplyFn>
      void grid_sample_2d_grid_slice_iterator(
        const TensorAccessor<scalar_t, 3>& grid_slice,
        const ApplyFn &apply_fn);

      // `apply_fn` is a function/lambda that can be called as if it has
      // declaration:
      //   void apply_fn(const Vec256<scalar_t>& grid_x,
      //                 const Vec256<scalar_t>& grid_y,
      //                 int64_t spatial_offset, int64_t len);
```
```
      `apply_fn` will be called multiple times, and together cover the entire
      output spatial space. Therefore, e.g., to implement forward 2d grid
      sample, we can do
```
```cpp
      ApplyGridSample<scalar_t, 2, interp, padding> grid_sample(input_accessor);

      for (int n = 0; n < input_accessor.size(0); n++) {
        grid_sample_2d_grid_slice_iterator(
          grid_accessor[n],
          [&](const Vec256<scalar_t>& grid_x, const Vec256<scalar_t>& grid_y,
              int64_t spatial_offset, int64_t len) {
            grid_sample.forward(out_accessor[n], input_accessor[n],
                                spatial_offset, grid_x, grid_y, len);
          });
      }
   ```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10980

Differential Revision: D9564867

Pulled By: SsnL

fbshipit-source-id: 5b7c3c7ea63af00eec230ae9ee1c3e6c6c9679b4
2018-09-16 20:41:10 -07:00
10c29c8970 Fix CUDA 8 build on Windows (#11729)
Summary:
Tested via https://github.com/pytorch/pytorch/pull/11374.
Upstream PR: https://gitlab.kitware.com/cmake/cmake/merge_requests/2391
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11729

Differential Revision: D9847807

Pulled By: orionr

fbshipit-source-id: 69af3e6c5bba0abcbc8830495e867a0b1b399c22
2018-09-16 08:09:24 -07:00
ca6f08f359 Set correct dtype for fp16 op inference function (#11693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11693

as desc.

Reviewed By: hyuen

Differential Revision: D9829061

fbshipit-source-id: 0f4c8a9d2b95d4cf5fa20a2aefd5671f273a8e76
2018-09-15 23:40:41 -07:00
b3e726042c Do not use FixedDivisor in ROCM order switch op (#11697)
Summary:
Fix the recent order_switch_test failure in ROCM CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11697

Reviewed By: BIT-silence

Differential Revision: D9831039

Pulled By: bddppq

fbshipit-source-id: 2368fd1ac7b1bab335ff3377071246cfd3392f3f
2018-09-15 18:24:51 -07:00
eb3c47bdd5 max -> fmaxf in cross_entropy kernel (#11733)
Summary:
Changing `max` to `fmaxf` in the `LabelCrossEntropy` kernel so that HIP works correctly.

bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11733

Differential Revision: D9846783

Pulled By: bddppq

fbshipit-source-id: c1b394d2ba7ee0e819f7bf3b36b53d1962de5522
2018-09-15 18:13:42 -07:00
f09054f8d0 Remove deprecate warning for Upsampling (#11568)
Summary:
Fixes #11452 .

Based on the discussion with SsnL and soumith, we want to bring back Upsample as a module instead of introducing a new nn.interpolate module for now. If anyone wants to downsample, they should use `nn.functional.interpolate` instead.
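
For illustration, a small sketch of this split (shapes are arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)

up = nn.Upsample(scale_factor=2, mode='nearest')   # module form, no warning
print(up(x).shape)                                 # torch.Size([1, 3, 16, 16])

# Downsampling goes through the functional interface instead:
print(F.interpolate(x, scale_factor=0.5).shape)    # torch.Size([1, 3, 4, 4])
```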
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11568

Differential Revision: D9804359

Pulled By: ailzhang

fbshipit-source-id: 2b232d55fc83c2b581bf336f1ee8d1cf1c1159ca
2018-09-14 17:54:48 -07:00
bb6f18c44f Simplify IValue::toTensor() (#11355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11355

There is no reason to implement refcounting manually in this case.
Given the correct NullType, toIntrusivePtr() and moveToIntrusivePtr() will do the right thing.

Reviewed By: ezyang

Differential Revision: D9694918

fbshipit-source-id: 8aae4d66aec32ca5f85c438d66339bd80b72b656
2018-09-14 16:57:15 -07:00
690c999bba Simplify union payload copying (#11353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11353

Before, there was one extra member in the union that had to be at least as large as the largest other member, because it was used for copying.

Now, this isn't needed anymore and we copy the union directly.

Reviewed By: ezyang

Differential Revision: D9694326

fbshipit-source-id: 42b2f7d51ac5d4ea5ebafea3a598b018e10fed68
2018-09-14 16:57:14 -07:00
270fb22bd8 Remove intrusive_ptr::reclaim() in Storage (2/2) (#11547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11547

Pushing manual refcounting further back, making things safer.

Reviewed By: ezyang

Differential Revision: D9778042

fbshipit-source-id: c9572edc440c5ce5ea1b2355b5c54f87078ea28e
2018-09-14 16:57:12 -07:00
f4d9fe395d Remove intrusive_ptr::reclaim() in Storage (#11352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11352

Pushing manual refcounting further back, making things safer.

Reviewed By: ezyang

Differential Revision: D9694327

fbshipit-source-id: befdbcac199225383a93520472ee7c6511a0e9cd
2018-09-14 16:57:10 -07:00
2c8a1b957e Back out "Refactor Tensor/TensorImpl constructors."
Summary: Original commit changeset: 7501b54fe5f3

Reviewed By: gchanan

Differential Revision: D9838097

fbshipit-source-id: 093e4c47d5574ce99f706b0683ef369a89b62b38
2018-09-14 16:39:31 -07:00
8e76dcf173 Prevent raising KeyboardInterrupt in worker (#11718)
Summary:
Current behavior is that each process (main and workers) will print a trace from `KeyboardInterrupt`, and the main process will also print
```
RuntimeError: DataLoader worker (pid 46045) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
```
due to our SIGCLD handler.
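
A sketch of the interactive scenario in question (the dataset and sizes are placeholders); with this change, only the main process surfaces the interrupt:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

loader = DataLoader(TensorDataset(torch.randn(10000, 3)), num_workers=2)
try:
    for (batch,) in loader:
        pass   # press Ctrl-C while this loop runs
except KeyboardInterrupt:
    print("interrupted; workers exit quietly instead of printing traces")
```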
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11718

Differential Revision: D9840844

Pulled By: SsnL

fbshipit-source-id: 1a05060bb02907fef5aac3f274d2c84f9f42d187
2018-09-14 16:09:35 -07:00
d24bcfd930 Suppress hiprand "duplicate-decl-specifier" warning (#11698)
Summary:
Otherwise each build produces a 65 MB warning log, which makes the CI hard to debug.

iotamudelta Jorghi12
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11698

Differential Revision: D9840356

Pulled By: bddppq

fbshipit-source-id: b69bf6a5c38a97b188221f9c084c608ffc9b37c8
2018-09-14 15:51:43 -07:00
8e3f8c52e8 Document the Sequential module (#11648)
Summary:
1. Document the Sequential module in the C++ API at both a high level (why does this exist) and a low level (how to use it)
2. Change the Sequential tests to be in a style that makes them easier to convert to gtest. No code changes.

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11648

Differential Revision: D9834526

Pulled By: goldsborough

fbshipit-source-id: 39f2f5c6cbbf8ed5a1b69986978c8ef127036de1
2018-09-14 15:51:41 -07:00
96d3f968eb Splits CPU and CUDA fusion compilers (#10981)
Summary:
This PR splits the CPU and CUDA fusion compilers, putting them into a new jit/fusers/ directory with jit/fusers/common for common components. In particular:

- A fusion interface is created that allows "fusion handles" to be requested
- The CPU and CUDA fusers implement this interface, with dispatch determined by device
- The fusion compilers, fusion function specializations and resource strings are split
- CPU-specific classes like TempFile and DynamicLibrary are in the CPU fuser
- Common classes like TensorDesc and the base fusion function class are in jit/fusers/common
- There is still some specialization in jit/fusers/common, but these specializations are small(-ish)
- Updates the build system to remove the dummy interface on Windows and minimize the use of macros

This structure should allow in-flight PRs to easily rebase while providing a clear interface to the fusers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10981

Reviewed By: soumith

Differential Revision: D9701999

Pulled By: apaszke

fbshipit-source-id: 3b6bec7b97e0444b2a93caa38d9b897f2e68c1b3
2018-09-14 14:05:34 -07:00
70e68e755a Casting for binary ops (#11708)
Summary:
Fixes #11663

`TensorIterator` was replacing the op tensors with type-cast tensors,
which ended up producing side effects in binary ops like `a.float() * b`
where `a` and `b` are `LongTensor`s.
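
A minimal sketch of the property the fix restores, based on the description above: the internal cast must not leak back into the operands.

```python
import torch

a = torch.arange(4, dtype=torch.long)
b = torch.arange(4, dtype=torch.long)

out = a.float() * b

# The operands keep their original dtype after the op:
assert a.dtype == torch.long and b.dtype == torch.long
```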

colesbury ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11708

Differential Revision: D9834016

Pulled By: driazati

fbshipit-source-id: 4082eb9710b31dfc741161a0fbdb9a8eba8fe39d
2018-09-14 13:40:21 -07:00
224e62bbec respect USE_CUDA_STATIC_LINK in build_libtorch.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11713

Differential Revision: D9835972

Pulled By: anderspapitto

fbshipit-source-id: 046363b132e5487c05ef7e6e6d88b508196386a1
2018-09-14 12:25:08 -07:00
0c2648830f Augment emit_nvtx to help connect backward-pass Function apply calls with their corresponding forward pass ops (#10881)
Summary:
Often, we find ourselves looking at some long-running kernel or emit_nvtx range on an nvvp profile and trying to connect it to the offending line in a training script.  If the op is in the forward pass, that's easy:  ops are enqueued explicitly from the Python side, so tracking it down with manual nvtx ranges supplemented by the built-in emit_nvtx ranges is straightforward.  If the op is in the backward pass, it's much more difficult.  From the Python side, all you can do is wrap loss.backward() in an nvtx range, and if you also use emit_nvtx, the automatic ranges provide only local information.  Right now, the only consistent way to connect backward-pass kernels to their associated forward-pass lines of Python is to understand your script line by line and know exactly where in the backward pass you are.

This PR augments the existing nvtx machinery to bridge the gap between forward and backward, allowing connection of backward-pass Function apply calls to the forward-pass operations that required/created those Functions.

The method is simple and surgical.  During the forward pass, when running with emit_nvtx, the nvtx range for each function in VariableType is tagged with the current sequence number.  During the backward pass, the nvtx range associated with each Function's operator() is tagged with that Function's stashed sequence number, which can be compared to "current sequence numbers" from the forward pass to locate the associated op.

Double-backward is not a problem.  If a backward pass with create_graph = True is underway, the relationship between backward and double-backward is conceptually the same as the relationship between forward and backward:  The functions in VariableType still spit out current-sequence-number-tagged ranges, the Function objects they create still stash those sequence numbers, and in the eventual double-backward execution, their operator() ranges are still tagged with the stashed numbers, which can be compared to "current sequence numbers" from the backward pass.
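
A typical profiling setup, as a sketch (the model and workload are placeholders; this requires a CUDA build and running under nvprof/nvvp):

```python
import torch
from torch.autograd.profiler import emit_nvtx

model = torch.nn.Linear(10, 10).cuda()
x = torch.randn(2, 10, device='cuda')

with torch.cuda.profiler.profile():
    with emit_nvtx():
        loss = model(x).sum()   # forward ranges are tagged with current sequence numbers
        loss.backward()         # backward ranges are tagged with the stashed numbers
```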

Minor caveats:

- The sequence number is thread-local, and many VariableType functions (specifically, those without a derivative explicitly defined in derivatives.yaml) don't create an associated function object (instead delegating that to sub-functions further down the call chain, perhaps called from within at::native functions that route back through VariableType by calling at::function_name).  So the correspondence of stashed sequence numbers in Function operator() ranges with numbers in forward-pass ranges is not guaranteed to be 1 to 1.  However, it's still a vast improvement over the current situation, and I don't think this issue should be a blocker.
- Feel free to litigate my use of stringstream in profiler.cpp.  I did it because it was easy and clean.  If that's too big a hammer, let's figure out something more lightweight.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10881

Differential Revision: D9833371

Pulled By: apaszke

fbshipit-source-id: 1844f2e697117880ef5e31394e36e801d1de6088
2018-09-14 11:56:55 -07:00
b90872c00e Get rid of default arguments for TH/THC factory functions. (#11673)
Summary:
This is causing codegen problems in caffe2, when we try to remove the circular Tensor/Type declarations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11673

Differential Revision: D9819341

Pulled By: gchanan

fbshipit-source-id: f2c2cd96e8a16f6de6aa4889e71b8a78e12e9256
2018-09-14 10:55:38 -07:00
7535d98ec4 Add message tag parameter to send/recv
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11490

Reviewed By: teng-li

Differential Revision: D9828116

Pulled By: pietern

fbshipit-source-id: 98be1ae84b6763ffb329e63c030c5e3ec0e748b7
2018-09-14 10:55:37 -07:00
3258fc11a7 Delete torch/csrc/api/README.md (#11703)
Summary:
We'll have separate docs for the C++ frontend, right now this file is just misleading
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11703

Differential Revision: D9832847

Pulled By: goldsborough

fbshipit-source-id: 2e8b30ccf6b5cba9d0526e6261160f7c6211a35c
2018-09-14 10:55:35 -07:00
278e304c18 Implement elif in string frontend (#11667)
Summary:
Closes #11625
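
A sketch of what the string frontend now accepts (the function body is illustrative):

```python
import torch

cu = torch.jit.CompilationUnit('''
def sign(x):
    if bool(x > 0):
        return 1
    elif bool(x < 0):
        return -1
    else:
        return 0
''')
print(cu.sign(torch.tensor(-2.0)))   # -1
```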
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11667

Differential Revision: D9828145

Pulled By: jamesr66a

fbshipit-source-id: c72dc41cb310a4211b4e4c6b33f7e2c1fb3581a0
2018-09-14 10:09:46 -07:00
115b13ffab clean up some old Half stuff
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11687

Differential Revision: D9829027

Pulled By: li-roy

fbshipit-source-id: f35dcdf93ea57ba4fa775e36e9d6378bed46a710
2018-09-14 09:54:45 -07:00
eb039dc92c Add CHECKs into GetTensorInfo and ExtractDeviceOption (#11597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11597

We should always CHECK pointers which we plan to dereference
if they are inputs to the function. Nobody knows how the function will
be called in the future.

Reviewed By: yinghai

Differential Revision: D9800002

fbshipit-source-id: 7fd05f4717f2256d1b09a9e75475b12de6685b03
2018-09-14 09:40:27 -07:00
0d9b9100f9 Fix gesv and gels docs (#11699)
Summary: Closes #9935 and closes #5431.

Differential Revision: D9830448

Pulled By: soumith

fbshipit-source-id: 4e5320a1d0c1d4c8253a5b26f4842cea76530514
2018-09-14 09:24:45 -07:00
72822ee6b2 Fix #11430 (CPU only builds raise opaque error message when calling .cuda()) (#11533)
Summary:

While I was at it, I audited all the other ways I know of that we might get a CUDA
type from PyTorch, and fixed more constructors that don't work.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11533

Differential Revision: D9775786

Pulled By: ezyang

fbshipit-source-id: cd07cdd375fdf74945539ec475a48bf08cbc0c17
2018-09-14 09:10:08 -07:00
2631da0822 Move some Tensor method definitions from Type.h to TensorMethods.h. (#11650)
Summary:
There's no reason they need to be in Type.h and this moves us along the path of not having circular dependencies (so we can get rid of TensorMethods.h).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11650

Reviewed By: ezyang

Differential Revision: D9812271

Pulled By: gchanan

fbshipit-source-id: 8b70db9a5eb0a332398ab2e8998eeaf7d2eea6d7
2018-09-14 08:56:02 -07:00
6c3792b9ec Implement UndefinedType::typeMeta.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11666

Differential Revision: D9816212

Pulled By: gchanan

fbshipit-source-id: 079899590150009bc2e2a3bbdc78a98de9380e37
2018-09-14 08:40:26 -07:00
cda71e2600 Disallow scalar parameters in Dirichlet and Categorical (#11589)
Summary:
This adds a small check in `Dirichlet` and `Categorical` `__init__` methods to ensure that scalar parameters are not admissible.

**Motivation**
Currently, `Dirichlet` throws no error when provided with a scalar parameter, but if we `expand` a scalar instance, it inherits the empty event shape from the original instance and gives unexpected results.

The alternative to this check is to promote `event_shape` to be `torch.Size((1,))` if the original instance was a scalar, but that seems to add a bit more complexity (and changes the behavior of `expand` in that it would affect the `event_shape` as well as the `batch_shape` now). Does this seem reasonable? cc. alicanb, fritzo.

```python
In [4]: d = dist.Dirichlet(torch.tensor(1.))

In [5]: d.sample()
Out[5]: tensor(1.0000)

In [6]: d.log_prob(d.sample())
Out[6]: tensor(0.)

In [7]: e = d.expand([3])

In [8]: e.sample()
Out[8]: tensor([0.3953, 0.1797, 0.4250])  # interpreted as events

In [9]: e.log_prob(e.sample())
Out[9]: tensor(0.6931)  # wrongly summed out

In [10]: e.batch_shape
Out[10]: torch.Size([3])

In [11]: e.event_shape
Out[11]: torch.Size([])  # wrongly empty
```

Additionally, based on review comments, this removes the `real_vector` constraint. This was only being used in `MultivariateNormal`, but I am happy to revert this if we want to keep it around for backwards compatibility.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11589

Differential Revision: D9818271

Pulled By: soumith

fbshipit-source-id: f9bbba90ed6f04e0b5bdfa169e70ca20b280fc74
2018-09-14 07:55:35 -07:00
c391c20063 Adding .expand method for TransformedDistribution (#11607)
Summary:
This PR:
 - adds a `.expand` method for `TransformedDistribution` along the lines of #11341.
 - uses this method to simplify `.expand` in distribution classes that subclass off of `TransformedDistribution`.
 - restores testing of `TransformedDistribution` fixtures.
 - fixes some bugs wherein we were not setting certain attributes in the expanded instances, and adds tests for `.mean` and `.variance` which use these attributes.

There are many cases where users directly use `TransformedDistribution` rather than subclassing off it. In such cases, it seems rather inconvenient to have to write a separate class just to define a `.expand` method. The default implementation should suffice in these cases.
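
For instance, a log-normal built directly from `TransformedDistribution` can now be expanded without writing a subclass (a sketch; the parameters are arbitrary):

```python
import torch
import torch.distributions as dist

base = dist.Normal(torch.tensor(0.), torch.tensor(1.))
log_normal = dist.TransformedDistribution(base, [dist.transforms.ExpTransform()])

expanded = log_normal.expand(torch.Size([3]))
print(expanded.batch_shape)      # torch.Size([3])
print(expanded.sample().shape)   # torch.Size([3])
```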

cc. fritzo, vishwakftw, alicanb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11607

Differential Revision: D9818225

Pulled By: soumith

fbshipit-source-id: 2c4b3812b9a03e6985278cfce0f9a127ce536f23
2018-09-14 07:55:33 -07:00
74197c7115 Restore support for dim=None on WeightNorm. (#11661)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11661

Reviewed By: veenix

Differential Revision: D9826799

Pulled By: ezyang

fbshipit-source-id: 9eec57bb27a365406669e412f6eb88741b22ed3d
2018-09-14 07:39:43 -07:00
19065f91fc Centralize TypeExtendedInterface casts. (#11576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11576

Previously, they were spattered throughout the codebase.
We now follow this convention:

- LegacyTypeDispatch gives you Type
- Context gives you TypeExtendedInterface
- Tensor::type() gives you Type
- at::getType() gives you TypeExtendedInterface

I change some sites to use getType() over type().

Reviewed By: SsnL

Differential Revision: D9790187

fbshipit-source-id: 5e2577cb590a5bbf5df530f3763d3b3c0b4625ca
2018-09-14 07:39:41 -07:00
c5f7da3f4a Support FP16 sparse lookup (#11674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11674

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11658

Reviewed By: hyuen

Differential Revision: D9676950

fbshipit-source-id: 89a115b9664b84e4e4436b7da033e5a428c2246d
2018-09-14 02:40:08 -07:00
1637729620 Fix ci by skipping some tests (#11668)
Summary:
scalar_tensor_test skipped
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11668

Differential Revision: D9825819

Pulled By: zrphercule

fbshipit-source-id: 6e62a001bcde49be8f7af1501b303bd93d09d005
2018-09-13 20:25:14 -07:00
e6fe8d9cf5 Try to delete codeowners for ATen/core (#10693)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10693

Reviewed By: soumith

Differential Revision: D9772210

Pulled By: ezyang

fbshipit-source-id: 14560eaf77441980e9784536acd0ffe20b15c5b8
2018-09-13 20:25:11 -07:00
2431eac7c0 Ensure most Distribution methods are jittable (#11560)
Summary:
This adds tests in tests/test_distributions.py to ensure that all methods of `Distribution` objects are jittable.

I've replaced a few samplers with jittable versions:
- `.uniform_()` -> `torch.rand()`
- `.exponential_()` -> `-(-torch.rand()).log1p()`
- `.normal_()` -> `torch.normal(torch.zeros(...), torch.ones(...), ...)`
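
As a quick sanity check (a sketch), the exponential replacement is exactly the inverse-CDF transform of a uniform sample:

```python
import torch

u = torch.rand(5)
# -(-u).log1p() == -log(1 - u), i.e. an Exponential(1) sample via inverse CDF
assert torch.allclose(-(-u).log1p(), -torch.log(1 - u))
```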

Some jit failures remain, and are marked in test_distributions.py
- `Cauchy` and `HalfCauchy` do not support sampling due to missing `.cauchy_()`
- `Binomial` does not support `.enumerate_support()` due to `arange` ignoring its first arg.
- `MultivariateNormal`, `LowRankMultivariateNormal` do not support `.mean`, `.entropy`

- [x] Currently some tests fail (I've skipped those) due to unavailability of `aten::uniform` and `aten::cauchy` in the jit. Can someone suggest how to add these? I tried to add declarations to `torch/csrc/ir.cpp` and `torch/csrc/passes/shape_analysis.cpp`, but that resulted in "Couldn't find operator" errors.
- [x] There are still lots of `TracerWarning`s that something doesn't match something. I'm not sure whether these are real.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11560

Differential Revision: D9816327

Pulled By: apaszke

fbshipit-source-id: 72ec998ea13fc4c76d1ed003d9502e0fbaf728b8
2018-09-13 19:55:01 -07:00
99c0b96f68 optimize norm on ATen CPU backend (#11565)
Summary:
Currently torch.norm() runs sequentially on CPU. This PR parallelizes and vectorizes torch.norm() on the ATen CPU path, providing roughly a two-order-of-magnitude performance boost.

Performance is benchmarked on a Xeon Skylake 8180 (2*28 cores @ 2.5GHz) using the following script:
```python
import torch
from time import time

count = 1000
size = 1000*1000

def test_norm(p=2):
    a = torch.randn(size)
    tstart = time()
    for i in range(count):
        torch.norm(a, p)
    tend = time()
    print("norm on size %d tensor p = %d: %f s" % (size, p, (tend-tstart)))

for p in range(4):
    test_norm(p)
```

without this optimization,
```
(intel-pytorch) [mingfeim@mlt-skx065 unit_tests]$ python test_norm.py
norm on size 1000000 tensor p = 0: 1.071235 s
norm on size 1000000 tensor p = 1: 1.069149 s
norm on size 1000000 tensor p = 2: 1.068212 s
norm on size 1000000 tensor p = 3: 69.735312 s
```

and with this optimization,
```
(pytorch-tf) [mingfeim@mlt-skx053 unit_tests]$ python test_norm.py
norm on size 1000000 tensor p = 0: 0.127507 s
norm on size 1000000 tensor p = 1: 0.011867 s
norm on size 1000000 tensor p = 2: 0.011907 s
norm on size 1000000 tensor p = 3: 0.014470 s
```
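
For reference, the quantity being optimized is just the vector p-norm; a minimal sketch of the equivalent composite expression (the PR computes this in a single parallel, vectorized pass rather than sequentially):

```python
import torch

a = torch.randn(1000000)
p = 3

# torch.norm(a, p) == (sum_i |a_i|^p)^(1/p)
composite = a.abs().pow(p).sum().pow(1.0 / p)
fused = torch.norm(a, p)
print(torch.allclose(composite, fused))  # True (up to floating-point noise)
```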
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11565

Differential Revision: D9804484

Pulled By: ezyang

fbshipit-source-id: 52899f30ac26139d00684d07edfb47cb9b25d871
2018-09-13 19:40:43 -07:00
98e04db955 Implement requires_grad propagation in the JIT (#11586)
Summary:
Previously, we would pretty much assume that all floating point tensors do require grad, which might result in some unnecessary compute.

I don't really like the fact that `TensorType` uses `tensor.is_variable() && tensor.requires_grad()` to infer the value of `requires_grad`, but changing constants to keep variables turns out to be pretty hard. I got halfway there, but it would still need some more work.
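
A toy illustration of what the propagated information enables; this is a hedged sketch of the user-visible semantics, not of the executor internals:

```python
import torch

def f(x, w):
    return (x * w).sum()

x = torch.randn(3)                       # requires_grad=False
w = torch.randn(3, requires_grad=True)

traced = torch.jit.trace(f, (x, w))
traced(x, w).backward()

# Only w participates in autograd; the executor need not build a grad
# path for x, which is the unnecessary compute this PR avoids.
print(x.grad, w.grad)  # None, tensor([...])
```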
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11586

Reviewed By: ezyang

Differential Revision: D9813648

Pulled By: apaszke

fbshipit-source-id: 77f77756d18ff7632fca3aa68ce855e1d7f3bdb8
2018-09-13 19:25:26 -07:00
513fd3dd36 Improve doc of torch.nn.functional.pad (#11623)
Summary:
I'm reading the doc of `torch.nn.functional.pad` and it looks a bit confusing to me. Hopefully this PR makes it clearer.
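
For reference, a minimal example of the padding-tuple convention the doc describes (shapes here are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.ones(1, 3, 4, 4)  # e.g. an NCHW batch

# The pad tuple is consumed from the last dimension backwards:
# (left, right) pads dim -1; (left, right, top, bottom) pads dims -1 and -2.
y = F.pad(x, (1, 1, 2, 2), mode='constant', value=0)
print(y.shape)  # torch.Size([1, 3, 8, 6])
```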
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11623

Differential Revision: D9818255

Pulled By: soumith

fbshipit-source-id: 4f6b17b0211c6927007f44bfdf42df5f84d47536
2018-09-13 19:25:24 -07:00
760679352e Move Pixel Shuffle to ATen (#9721)
Summary:
<del>#9692 </del>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9721

Differential Revision: D8955829

Pulled By: SsnL

fbshipit-source-id: 4f4d1c7720b6f757fbef9a10f70209ae76f61399
2018-09-13 18:25:48 -07:00
e1cd220b90 Reimplement swap() using default move constructor. (#11659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11659

This is less error-prone and less code.

Reviewed By: smessmer

Differential Revision: D9814536

fbshipit-source-id: 028510e31e2fa7a9fa11c1398b0743c5cd085dd5
2018-09-13 16:32:55 -07:00
02980d7f8c Refactor Tensor/TensorImpl constructors. (#11657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11657

Previously, we had a constructor in TensorImpl for every constructor in Tensor.
This was unnecessary and wordy: Tensor is the user-visible class, so it deserves
the constructors, but TensorImpl is internal and doesn't need it.  So
I replaced TensorImpl with a single, Storage accepting constructor, and then
rewrote Tensor to use that constructor.

Reviewed By: jerryzh168

Differential Revision: D9813742

fbshipit-source-id: 7501b54fe5f39180f1bc07573fd7c1640b0f4e89
2018-09-13 16:32:53 -07:00
7607b49538 s/GetDevicetype/device_type/ (#11656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11656

The mis-capitalization really sticks in my craw. I know why (we
already have a static function named GetDeviceType), but let's
name it differently.

```
codemod -d . --extensions cc,cpp,cu,cuh,h,py,hpp,TARGETS GetDevicetype device_type
```

Reviewed By: jerryzh168

Differential Revision: D9813544

fbshipit-source-id: fe462f4bc40b03e74921f8cf5ebd9cfc52e7e636
2018-09-13 16:32:51 -07:00
c18510463b Reduce includes in tensor_impl.h (#11643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11643

- Reduce the tensor_impl.h includes to the bare
  minimum necessary
- Explicitly namespace std::

Reviewed By: jerryzh168

Differential Revision: D9811028

fbshipit-source-id: 44e32720962b35c12a7b2c93605721b9f6c5b254
2018-09-13 16:32:49 -07:00
8402fde279 Revert D9778043: Pass Storage by value
Differential Revision:
D9778043

Original commit changeset: b1381cd60a82

fbshipit-source-id: 40f1de67e939cb41605978d632105a48a91e7629
2018-09-13 16:32:48 -07:00
85ff72348d Only involve tensor device in CUDA -> CPU copy, not current device. (#11592)
Summary:
This also unifies the device usage between the async and sync case.

Fixes https://github.com/pytorch/pytorch/issues/10832.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11592

Differential Revision: D9797355

Pulled By: gchanan

fbshipit-source-id: e496cd371111cfaf9a6c664167967b395e3d72e9
2018-09-13 16:32:46 -07:00
4672280b55 Pass Storage by value (#11546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11546

-

Reviewed By: ezyang

Differential Revision: D9778043

fbshipit-source-id: b1381cd60a826055ce8771d6c67eac4cc375b3b4
2018-09-13 15:26:05 -07:00
05e06f7de2 migrating deprecated calls without abc module for containers (#11515)
Summary:
Implementing #10540.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11515

Reviewed By: apaszke

Differential Revision: D9771045

Pulled By: jeffreyksmithjr

fbshipit-source-id: 85ea39abaa9b465805a969f122b626b11fc85ef6
2018-09-13 15:09:22 -07:00
29e29ca6ee Use MPI_Isend/MPI_Irecv to back send/recv (#11630)
Summary:
The isCompleted function is changed to be non-const to accommodate
setting some internal status on the work object in the case of
completion. Previously, it was only checking a member field, but for the
MPI backend it calls MPI_Test to poll for completion of an asynchronous
request.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11630

Reviewed By: SsnL

Differential Revision: D9808008

Pulled By: pietern

fbshipit-source-id: 18b70825b1fb4d561a552fa75e9475a522852cd4
2018-09-13 15:01:24 -07:00
f129da1a47 Add max to the ValueError for EmbeddingBag mode check (#11655)
Summary:
Related to #11624
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11655

Differential Revision: D9815454

Pulled By: SsnL

fbshipit-source-id: 8dd82e0c0aa68362e12b301e095a85af7d7fd71a
2018-09-13 14:39:40 -07:00
90537289a0 Constexpr std::move / std::forward for C++11 (#11396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11396

std::move and std::forward in C++11 aren't constexpr (they are in C++14).
This caused a build issue orionr was working on.
It should be fixed by this diff

Reviewed By: orionr

Differential Revision: D9724805

fbshipit-source-id: 0d9047dce611385d659cc71a6c04cc7a6a40a5ae
2018-09-13 12:56:17 -07:00
0f1ca569ce End-to-end dynamic slicing with ONNX DynamicSlice experimental operator (#11255)
Summary:
Requires https://github.com/onnx/onnx/pull/1377

This PR makes it so that slices with dynamic boundary values can be exported from pytorch and run in caffe2 via ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11255

Differential Revision: D9790216

Pulled By: jamesr66a

fbshipit-source-id: 6adfcddc5788df4d34d7ca98341077140402a3e2
2018-09-13 12:39:52 -07:00
acb6f18bab fix generate_code.py caching (#11644)
Summary:
Currently, because of some setup.py logic, `ninja` caching of the `generate_code.py` build step was broken. This resulted in `generate_code.py` running every single time builds were happening, regardless of whether inputs changed.

This updated logic fixes the input caching
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11644

Reviewed By: orionr

Differential Revision: D9814348

Pulled By: soumith

fbshipit-source-id: 2012960908d0f600488d410094095cfd72adc34f
2018-09-13 12:39:48 -07:00
75f49befeb move instance_norm to aten (#10792)
Summary:
This also removes the usage of torch.onnx.symbolic_override in instance_norm. Fixes #8439.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10792

Differential Revision: D9800643

Pulled By: li-roy

fbshipit-source-id: fa13a57de5a31fbfa2d4d02639d214c867b9e1f1
2018-09-13 12:26:22 -07:00
912d3626c8 Split tensor.h into tensor_impl.h and tensor.h (#11642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11642

This is just a preparatory change to help with future
refactoring:

- I want to reduce the number of includes that tensor_impl.h
  depends on, but
- I need to keep tensor.h providing all Caffe2 headers, because
  users may be relying on tensor.h transitively providing those
  headers.

Introducing a level of indirection lets me do both at the same time.

Reviewed By: jerryzh168

Differential Revision: D9810823

fbshipit-source-id: 8dfaac4b8768051a22898be8fcaf787ecc57eb13
2018-09-13 12:26:20 -07:00
45e9ee096e Fix test_mnist_training_leaks_no_memory_cuda warning (#11639)
Summary:
Before this PR it would warn that "dropout is non deterministic and can
cause problems when checking trace", so I disabled the trace checking.

cc zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11639

Differential Revision: D9812493

Pulled By: zou3519

fbshipit-source-id: fab86928a5fba8b218b47543533aaf7c82a10b4a
2018-09-13 12:09:20 -07:00
9abc666745 stop allowing extra positional args in arg parser (#10499)
Summary:
Arg parser allowed additional positional args to be parsed into keyword-only params.

Fixes a couple cases:
- The positional argument happens to be of the right type, and it just works silently. Now, we fail as expected.
- The positional argument fails later down the line. Now, we fail at the appropriate time and get a better error message.

Pre-fix:
```
>>> torch.cuda.LongTensor((6, 0), 1, 1, 0)
tensor([6, 0], device='cuda:1')
```
Post-fix:
```
>>> torch.cuda.LongTensor((6, 0), 1, 1, 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: new() received an invalid combination of arguments - got (tuple, int, int, int), but expected one of:
 * (torch.device device)
 * (torch.Storage storage)
 * (Tensor other)
 * (tuple of ints size, torch.device device)
 * (object data, torch.device device)
```

Pre-fix:
```
>>> a = torch.tensor(5)
>>> a.new_zeros((5,5), 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: new_zeros(): argument 'dtype' (position 2) must be torch.dtype, not int
```

Post-fix:
```
>>> a = torch.tensor(5)
>>> a.new_zeros((5,5), 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: new_zeros() takes 1 positional argument but 2 were given
```

fixes #8351
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10499

Differential Revision: D9811093

Pulled By: li-roy

fbshipit-source-id: ce946270fd11b264ff1b09765db3300879491f76
2018-09-13 11:56:12 -07:00
6f53b4efea Remove implicit bool casts (#11503)
Summary:
In order to comply with Python's rules on implicit casting of
non-booleans to booleans, this PR removes implicit casting in favor of
explicit casts via `bool()`
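
A minimal sketch of the new explicit-cast style in script code (the function itself is a toy example):

```python
import torch

@torch.jit.script
def flip_negative(x):
    # `if x.sum() > 0:` would rely on an implicit tensor-to-bool
    # conversion; the cast now has to be spelled out with bool().
    if bool(x.sum() > 0):
        return x
    return -x

print(flip_negative(torch.tensor([1.0, -3.0])))  # tensor([-1., 3.])
```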

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11503

Differential Revision: D9780869

Pulled By: driazati

fbshipit-source-id: c753acaca27f4e79dddf424c6b04674f44a6aad9
2018-09-13 11:26:45 -07:00
ab3a2d25fb Improve error messages when trying to use nested lists.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11606

Differential Revision: D9806949

Pulled By: zdevito

fbshipit-source-id: c38abc4ce745a63d26a64f6aa1b41350e4b1acd5
2018-09-13 11:10:38 -07:00
5bc90b8554 support conversion and dispatch of complex numbers (#11603)
Summary:
- Just a simple fix to support `fill_`
- And a fix for indexing in `pytorch-complex`

Differential Revision: D9804061

Pulled By: ezyang

fbshipit-source-id: 631129b3fa220a9670770b3766f14a8e03633bdf
2018-09-13 11:10:37 -07:00
a861573e36 fix tensor export bug in IR export (#11613)
Differential Revision: D9811094

Pulled By: li-roy

fbshipit-source-id: 012792dbedc70bd3fa242fdf2e39da0b21ce158d
2018-09-13 11:10:35 -07:00
d278344e36 Automatic update of fbcode/onnx to 39dd0d4fec5913aa517b71bcfcbf638a427894eb (#11622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11622

Previous import was bff0b8835870c7df7762ef43498d000d2d8ffb52

Included changes:
- **[39dd0d4](https://github.com/onnx/onnx/commit/39dd0d4)**: [build] Add ONNX_API for protos in all cases (#1407) <Orion Reblitz-Richardson>
- **[944db4f](https://github.com/onnx/onnx/commit/944db4f)**: cmake (#1401) <zrphercule>
- **[8ccc8dd](https://github.com/onnx/onnx/commit/8ccc8dd)**: Remove ONNXIFI_CHECK_RESULT from onnxRelease* functions (#1397) <Marat Dukhan>
- **[df14e74](https://github.com/onnx/onnx/commit/df14e74)**: Change onnxifi test driver classname (#1396) <zrphercule>
- **[0c885cc](https://github.com/onnx/onnx/commit/0c885cc)**: ONNXIFI cpp test driver (#1290) <zrphercule>
- **[a557848](https://github.com/onnx/onnx/commit/a557848)**: Coverage Report Tools for Backend Scoreboard (#1301) <Akshay Chalana>
- **[31fd87f](https://github.com/onnx/onnx/commit/31fd87f)**: fix AvgPool doc. add default value for count_include_pad (#1391) <Wenhao Hu>
- **[8ff08c2](https://github.com/onnx/onnx/commit/8ff08c2)**: Do not export onnx symbols in the python extension (#1388) <bddppq>

Reviewed By: orionr

Differential Revision: D9806635

fbshipit-source-id: f61c052b6bd14e0c80ace19c1a5f0ba659030c6f
2018-09-13 10:40:48 -07:00
1f49b879d1 Add missing include for __half (#11638)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11638

Differential Revision: D9811063

Pulled By: ezyang

fbshipit-source-id: dd103bb152485bcdbb0108b4d3de2443c30d5572
2018-09-13 10:33:09 -07:00
d4d72b87e3 Sphinx is case sensitive
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11646

Differential Revision: D9811355

Pulled By: SsnL

fbshipit-source-id: d484561baa2ac5b3113870b4ee06fa3560b686e4
2018-09-13 10:33:06 -07:00
57f149a861 Only join pin_memory_thread after it started (#11599)
Summary:
Same reason as in #11432 .

Example error:
```
Exception ignored in: <function _DataLoaderIter.__del__ at 0x7fa06963cf28>
Traceback (most recent call last):
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 405, in __del__
    self._shutdown_workers()
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 401, in _shutdown_workers
    self.pin_memory_thread.join()
AttributeError: '_DataLoaderIter' object has no attribute 'pin_memory_thread'
```
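
The shape of the fix, as a minimal sketch; the attribute names follow the traceback above, and the real `_shutdown_workers` does considerably more:

```python
class _DataLoaderIterSketch(object):
    def _shutdown_workers(self):
        # pin_memory_thread is only assigned once the pinning thread has
        # started; if __init__ fails before that, __del__ still runs, so
        # guard the join with hasattr.
        if hasattr(self, 'pin_memory_thread'):
            self.pin_memory_thread.join()

    def __del__(self):
        self._shutdown_workers()
```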
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11599

Differential Revision: D9801143

Pulled By: SsnL

fbshipit-source-id: 520590a21f56fa381fcac621457a7544d3fba47e
2018-09-13 09:40:49 -07:00
36fc1a0a58 Merge caffe2::/at::Storage
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11637

Reviewed By: gchanan

Differential Revision: D9806425

Pulled By: ezyang

fbshipit-source-id: e20ec93bff6dc7fb22ca9b7e7348d060b3876b67
2018-09-13 09:40:48 -07:00
77f6998e54 Guard against inputting or returning sparse tensors (#11550)
Summary:
Add guards against using sparse tensors by checking the conversion from IValue -> PyObject & PyObject -> IValue.

This diff also changes the behavior in constant propagation to not run python ops even if all ops are constant because of possible mutation to global state. This came up in trying to run get_sparse(), and I'm including it here to make it easier to land.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11550

Differential Revision: D9804712

Pulled By: eellison

fbshipit-source-id: 9fe7daf721c6d6e48df4925c0f9c775873bcdc77
2018-09-13 08:58:29 -07:00
cac11a4ac3 Merge caffe2::/at::StorageImpl (#11543)
Summary:
Merges caffe2::StorageImpl methods with at::StorageImpl methods and defines caffe2::StorageImpl as at::StorageImpl.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11543

Differential Revision: D9795228

Pulled By: cpuhrsch

fbshipit-source-id: fbd6fa3cbf6c9099a4803337286c30e00652f95c
2018-09-13 01:25:50 -07:00
44b2b6b150 clean up jit generated tests (#11403)
Summary:
Clean up some generated tests now that we have nice new features like var args.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11403

Differential Revision: D9800545

Pulled By: wanchaol

fbshipit-source-id: e9973b113f78dc38cf99a81b6ede3fa3485f1cfa
2018-09-12 22:55:03 -07:00
e998038bc0 Use TypeMeta instead of TypeIdentifier within at::StorageImpl (#11236)
Summary:
Further aligns at::StorageImpl with caffe2::StorageImpl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11236

Differential Revision: D9776286

Pulled By: cpuhrsch

fbshipit-source-id: f2c53995fcece013b77b3a1f709ab0f9df8ab23e
2018-09-12 22:26:00 -07:00
6f05b5ee54 Pin Sphinx to 1.7.9 (#11620)
Summary:
Sphinx 1.8.0 breaks us.  Upgrading is tracked in #11618.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11620

Differential Revision: D9806440

Pulled By: ezyang

fbshipit-source-id: 7a8d849c78e697a8775d00cd3a463a7bdbcddabe
2018-09-12 21:55:21 -07:00
17637f2b03 enable_mkl support for resnet18+lstm model
Summary:
* Many ops in the LSTM part of the model don't have implementations in ideep/MKL, and it doesn't make sense to copy back and forth for the few available ops, because the majority of the RNN will run on CPU
* Thus the strategy is to enable MKL only for the resnet18 part of the model, then switch to the default CPU engine for the LSTM part

* The net may contain some external_inputs falsely added during the ONNX->Caffe2 conversion. A canary in the service shows their existence could lead to a service crash (presumably because these blobs somehow get shared between threads). They're now manually removed, which seems to be enough to avoid the crash.

Reviewed By: viswanathgs

Differential Revision: D8888763

fbshipit-source-id: da7761bcb7d876ff7bbb6640ae4b24712c0b1de6
2018-09-12 18:56:46 -07:00
0a6931cfee Only reference ONNX through onnx_pb.h (#11609)
Summary:
I think this is needed to land https://github.com/onnx/onnx/pull/1407 without CI errors.

cc mingzhe09088 houseroad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11609

Reviewed By: houseroad

Differential Revision: D9803490

Pulled By: orionr

fbshipit-source-id: 26193f38ab0a2eef9ad7d0da9a0310dc40ef0f2d
2018-09-12 18:25:58 -07:00
5da0b31bee More native docs on TensorOptions. (#11558)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11558

Differential Revision: D9783655

Pulled By: ezyang

fbshipit-source-id: 17c749c9ef99fd9dfd0ff365ebfe22102fb891d7
2018-09-12 17:39:39 -07:00
f00f99ebcc use at::Half in THC (#11322)
Summary:
- use Half instead of half in THC
- clean up TH_float2half, TH_half2float, etc. conversions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11322

Differential Revision: D9799553

Pulled By: li-roy

fbshipit-source-id: 9aa3e003bff73d9df6224a393f3ec0624b1f44ed
2018-09-12 17:39:37 -07:00
daa379ffd7 Disable flaky test ObserverTest.TestMultipleNetBase (#11596)
Summary:
Tracked in https://github.com/pytorch/pytorch/issues/9137

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11596

Differential Revision: D9803256

Pulled By: ezyang

fbshipit-source-id: 973393203ed8343a3a0feef36d34e561d9f653c4
2018-09-12 17:39:36 -07:00
e2cd627cce Temporarily disable docs build. (#11608)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11608

Differential Revision: D9803369

Pulled By: ezyang

fbshipit-source-id: a206d6137e8e729f702189c926ec898444d1dc53
2018-09-12 17:39:34 -07:00
7f7cda99cd Optimize order_swich_ops on GPU (#11404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11404

Optimize order_swich_ops on GPU

Reviewed By: houseroad

Differential Revision: D9728642

fbshipit-source-id: 74ff62268856fb1613fa61eb214bed6ec6716632
2018-09-12 16:56:15 -07:00
776a9992e1 topk test fix, hgemm integration (#11593)
Summary:
After discussions in #11584 , new PR for just the test skip and hgemm integration.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11593

Differential Revision: D9798527

Pulled By: ezyang

fbshipit-source-id: e2ef5609676571caef2f8e6844909fe3a11d8b3e
2018-09-12 16:56:13 -07:00
def44c96fd Revert D9779866: [pytorch][PR] Move function deletion from the stack to the heap.
Differential Revision:
D9779866

Original commit changeset: 96753eead790

fbshipit-source-id: 959deeb63318d48f4c563e10e70ef6ec7fabd3b4
2018-09-12 16:56:11 -07:00
5b2efcf425 Document the Conv module (#11566)
Summary:
Document the C++ API conv module. No code changes.

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11566

Differential Revision: D9793665

Pulled By: goldsborough

fbshipit-source-id: 5f7f0605f952fadc62ffbcb8eca4183d4142c451
2018-09-12 16:56:09 -07:00
130d55a5f4 Allow building the C++ API without cereal (#11498)
Summary:
I am working on unifying the C++ extensions and C++ API, and one constraint for this is that we will want to be able to build the C++ API without cereal, since we won't want to ship it with the Python `torch` package.

For this I introduce a `TORCH_WITH_CEREAL` option to CMake. If on, the C++ API will be built with cereal and thus serialization support. If off, serialization functions will throw exceptions, but the library will otherwise still compile the same. __This option is on by default, so for regular C++ API users nothing will change__. However, from C++ extensions, we'll be able to turn it off. This effectively means we won't be searching for any cereal headers from C++ API headers, which wouldn't be installed in the Python package.

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11498

Differential Revision: D9784803

Pulled By: goldsborough

fbshipit-source-id: 5d0a1f2501993012d28cf3d730f45932b483abc4
2018-09-12 16:56:07 -07:00
12efef166a Split out copy_op from utility_ops (#11470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11470

In order to reduce build sizes, we are identifying files that can be split up into smaller units, allowing us to only include the ops we need.

Reviewed By: orionr, ajtulloch

Differential Revision: D9725819

fbshipit-source-id: def1074a33dffe99bd6a7e6e48aa9e5be3d04a6a
2018-09-12 16:25:48 -07:00
316c167940 Add checking of nullptrs in GetTensorInfo (#11587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11587

To help debug the issue in T33295362, we add some checks in the function.

Possible crashing sites in `GetTensorInfo`:
1. tc is nullptr; this is checked.
2. tc->capacity_nbytes() hits a nullptr. This is unlikely because storage is not a pointer and the computation of capacity_nbytes doesn't involve pointers; it's numel * itemsize().
3. tc->ExtractDeviceOption hits a nullptr. One possibility is that raw_data() is nullptr, since tc->ExtractDeviceOption will use it. This is checked.
4. The Tensor itself, which is not a reference. This is also checked.

Reviewed By: salexspb

Differential Revision: D9793484

fbshipit-source-id: 3fc72746fc310a23ae45553bbe0d269a4b9edb72
2018-09-12 16:25:46 -07:00
eb7a298489 Add resnext model to OSS (#11468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11468

Add the resnext model to the OSS Caffe2 repo.

Reviewed By: orionr, kuttas

Differential Revision: D9506000

fbshipit-source-id: 236005d5d7dbeb8c2864014b1eea03810618d8e8
2018-09-12 15:59:20 -07:00
c81406c514 Document Any (#11580)
Summary:
Documents the `AnyModule` class in the C++ API.

Also changed the API to be friendlier by default. Calling `AnyModule::forward` used to return an `AnyModule::Value` which you had to call `.get<T>()` on to cast to a concrete type. I changed the name of that `forward` method to `any_forward` and instead made `forward` templated on a `ReturnType` template parameter which you can supply to do the `.get<T>` cast for you automatically. I default this parameter to `torch::Tensor` so that it can often be omitted. So where you used to have to write

```cpp
any_module.forward(...).get<int>();
any_module.forward(...).get<torch::Tensor>();
```

you now write

```cpp
any_module.forward<int>(...);
any_module.forward(...);
```

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11580

Differential Revision: D9798626

Pulled By: goldsborough

fbshipit-source-id: 060b4ea28facaffc417f53b80b846a9dff9acb73
2018-09-12 15:59:19 -07:00
ac94889939 Add jit doc entry to sidebar (#11598)
Summary:
cc zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11598

Differential Revision: D9801230

Pulled By: SsnL

fbshipit-source-id: f0c8d2468b64a50c3c437667d462722dcd2682d1
2018-09-12 15:29:23 -07:00
b663b7ce7e Update ROCm Docker image with latest AMD debians (#11507)
Summary:
Building at https://ci.pytorch.org/jenkins/job/caffe2-docker-trigger/194/

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11507

Differential Revision: D9772474

Pulled By: ezyang

fbshipit-source-id: ab00f05744547dc7ec9f97511e2c8495ac282fac
2018-09-12 15:29:21 -07:00
02c4cd3c8a Skip flaky distributed tests (#11594)
Summary:
context: https://github.com/pytorch/pytorch/issues/11582

cc pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11594

Differential Revision: D9798871

Pulled By: SsnL

fbshipit-source-id: 9f9e1871c7fd9505ca898865eb8068fab4d3416d
2018-09-12 14:57:57 -07:00
d4e05f4e1e Move function deletion from the stack to the heap. (#11534)
Summary:
This eliminates the need for any heuristics regarding stack size limits.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11534

Differential Revision: D9779866

Pulled By: resistor

fbshipit-source-id: 96753eead7904bbdc2869fb01f7bd42141032347
2018-09-12 14:39:59 -07:00
958ba4e913 Aibench for asr decoder
Summary: as title

Reviewed By: sf-wind

Differential Revision: D9738021

fbshipit-source-id: 98f570484bca6486ad99207732efd534ec7e3251
2018-09-12 14:25:19 -07:00
f0a440007e Explicitly set locale on docs build. (#11595)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11595

Differential Revision: D9798567

Pulled By: ezyang

fbshipit-source-id: ac05458347e181960a07cacae1dfc68d2837451f
2018-09-12 14:11:24 -07:00
504126e705 Documentation for debugging JIT
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11540

Differential Revision: D9798647

Pulled By: jamesr66a

fbshipit-source-id: 968a4af22c735a848fa27cbadaed9b7023ba8276
2018-09-12 14:11:22 -07:00
a3036b3bb3 Fused weightnorm for ATen (#10842)
Summary:
This PR contains a C++ implementation of weight norm.  The user-side exposure of weight norm through torch.nn.utils.weight_norm is unchanged.
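
For context, the unchanged user-side API; a minimal usage sketch with `dim=0`, one of the first/last-dimension cases that can hit the fused GPU kernel described below:

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

# Reparametrize `weight` as g * v / ||v||, with the norm taken over dim 0.
layer = weight_norm(nn.Linear(20, 40), name='weight', dim=0)
out = layer(torch.randn(8, 20))
print(hasattr(layer, 'weight_g'), hasattr(layer, 'weight_v'))  # True True
```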

If running on the GPU, and the norm is requested over the first or last dimension of the weight tensor, the forward pass is carried out using the fused kernels I wrote for our Fairseq GTC hero run, which offer superior performance to primitive ops and superior numerical stability when running in FP16.  In the common case that the backward pass is not itself constructing a graph (ie not attempting to set up double backward) the backward pass will be carried out using another fused kernel.  If the backward pass is constructing a graph, an alternate code path is taken, which does the math using differentiable primitive ops. In this way, the implementation allows double backward, even if the fused kernel was used in forward (although in this case, you don't benefit from the performance and stability of the fused backward kernel).

If running on the CPU, or if norming over an interior dim, the forward pass is carried out using double-differentiable primitive ops.

Figuring out how to generate all the right plumbing for this was tricky, but it was a fun experience learning how the autogenerator works and how the graph is constructed.  Thanks to colesbury for useful guidance on this front.

I do have a few lingering questions:

- Should I unify my return statements (ie by default-constructing Tensors outside if blocks and using operator= within)?
- What is the significance of `non_blocking` when calling e.g. `auto norms = saved_norms.to(saved_g.type().scalarType(), non_blocking=True/False);`?  I am currently omitting `non_blocking`, so it defaults to False, but I didn't see any associated synchronizes on the timeline, so I'm wondering what it means.
- Is there an "official" mapping from at::ScalarTypes to corresponding accumulate types, as there are for the PODs + Half in [AccumulateType.h](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/AccumulateType.h)?  I looked for an equivalent mapping for ScalarTypes, didn't find one, and ended up rigging it myself (`  at::ScalarType AccType = g.type().scalarType() == at::ScalarType::Half ? at::ScalarType::Float : g.type().scalarType();`).
- Are sparse tensors a concern?  Should I include another check for sparse tensors in the `_weight_norm` entry point, and send those along the fallback CPU path as well?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10842

Differential Revision: D9735531

Pulled By: ezyang

fbshipit-source-id: 24431d46532cf5503876b3bd450d5ca775b3eaee
2018-09-12 13:55:27 -07:00
9a7c196040 Move Type, Tensor, TensorMethods to core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11519

Reviewed By: yf225

Differential Revision: D9771684

Pulled By: gchanan

fbshipit-source-id: a57ee2072af99ce856f895c688b09d750a8606e0
2018-09-12 13:10:54 -07:00
739e6af869 Add remainder % to the jit
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11557

Reviewed By: apaszke

Differential Revision: D9784642

Pulled By: wanchaol

fbshipit-source-id: b7c60c3e9534555c9d7db83769965b3f2f277cdf
2018-09-12 12:40:38 -07:00
ad7936e108 Fix reloading modules back into python (#11552)
Summary:
This changes the way module import works so that when a module
is reloaded in python it becomes a ScriptModule and not a _C.ScriptModule
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11552

Differential Revision: D9782751

Pulled By: zdevito

fbshipit-source-id: 9576850b75494b228ce3def94c0d371a4a44b11d
2018-09-12 12:25:15 -07:00
17e76e26c8 Add trigonometry functions to docs/source/onnx.rst
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11581

Differential Revision: D9794449

Pulled By: soumith

fbshipit-source-id: 1218fcf8969a10ffbfefd3ced7fee9fe7df296f1
2018-09-12 12:10:01 -07:00
13b05c8c78 Add EndToEndHybridModel CUDA tests (#11544)
Summary:
Also adds two additional tests that check for memory leaks while the relevant graph executors are alive:
- (minimal test): Create a ScriptModule, keep it alive, and test that it does not leak memory while it is alive
- (large test) Do MNIST training with a traced MNIST module and test that no memory is leaked while the traced module (with graph executor) is alive

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11544

Reviewed By: apaszke

Differential Revision: D9778479

Pulled By: zou3519

fbshipit-source-id: 2d6cdea81dd1264f2c0396b662f70fdafecb3647
2018-09-12 11:25:18 -07:00
23d55883c0 minor formatting error log (#11528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11528

as title

Reviewed By: chocjy

Differential Revision: D9773214

fbshipit-source-id: b7dd4c19ab83a18f344de8e71ce5b3bf74d1af72
2018-09-12 11:25:17 -07:00
6398d626f4 Warn that export+import module always load onto the CPU (#11485)
Summary:
Test Plan
`cd docs && make html`
![image](https://user-images.githubusercontent.com/5652049/45325074-ed04e480-b51d-11e8-9d2d-685dbe8a08e9.png)

cc zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11485

Differential Revision: D9772119

Pulled By: zou3519

fbshipit-source-id: 3dcb16c9edc2e8deebef17accf91a1c7d4dc9063
2018-09-12 10:55:39 -07:00
12f4c46eea caffe2::StorageImpl use at::DataPtr (#11282)
Summary:
See title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11282

Reviewed By: ezyang

Differential Revision: D9658503

Pulled By: cpuhrsch

fbshipit-source-id: 42fa73c979692cb1069c0345744a85d12150745c
2018-09-12 09:39:23 -07:00
e5dd77c7ad Sync all libnccl soversions, not just libnccl.so.1 (#11575)
Summary:
Fixes:

```
/bin/ld: warning: libnccl.so.1, needed by /data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so, not found (try using -rpath or -rpath-link)
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclAllReduce'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclBcast'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclCommInitAll'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclGetErrorString'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclReduceScatter'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclAllGather'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclReduce'
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11575

Differential Revision: D9789956

Pulled By: ezyang

fbshipit-source-id: 63e48763cc233be9d137cec721b239159b511a24
2018-09-12 09:24:51 -07:00
f0a284502a Document BatchNorm and update default behavior (#11484)
Summary:
This PR:

1. Documents `BatchNorm`,
2. Makes a number of API changes after reconsidering some quirks:
    1. The default value for the `stateful` parameter used to be `false`, but the most common usage of `BatchNorm` in the wild is certainly stateful, and the default in Python is also stateful. So we change the default to stateful.
    2. The `pure_forward` function used to use the internal running mean and variance variables instead of the ones supplied to that function call when `stateful` was true, which certainly seems odd. When you call `pure_forward` you would certainly expect the values you pass explicitly to be used. This is now fixed.
3. Adds tests for `BatchNorm`, finally.

ebetica apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11484

Reviewed By: pjh5

Differential Revision: D9779618

Pulled By: goldsborough

fbshipit-source-id: 59ba760e085c01454b75644b24b22317b688e459
2018-09-12 09:09:53 -07:00
6fc18a7541 Typo fix in randomness.rst (#11571)
Summary:
"need to be" -> "need not be"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11571

Differential Revision: D9786001

Pulled By: soumith

fbshipit-source-id: 7cc408f5c8bfcc56d4b5c153646f30e1cec37539
2018-09-12 08:25:46 -07:00
efc0f6784a Move some bmm/baddbmm to ATen (#11292)
Summary:
- Incorporates MKL addition by mingfeima  Thank you! (but all errors are my own)
- Native CPU implementation: defer to matrix multiplication for
  small batches and parallelize over batch dimension for large
  batches.
- Add bmm test for CUDA just to be sure.

This is a partial fix for #10661, getting down to a factor ~5.
Considerable overhead is incurred for the setup in einsum. It might
be more efficient to eventually define optimized contraction
functions for arbitrary and several dimensions.
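
For reference, the batched semantics of the ops being moved; a minimal sketch:

```python
import torch

a = torch.randn(16, 3, 4)        # batch of 16 (3x4) matrices
b = torch.randn(16, 4, 5)        # batch of 16 (4x5) matrices
bias = torch.randn(16, 3, 5)

c = torch.bmm(a, b)              # 16 independent matmuls -> (16, 3, 5)
d = torch.baddbmm(bias, a, b)    # bias + a @ b, per batch element
print(c.shape, d.shape)          # torch.Size([16, 3, 5]) twice
```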
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11292

Differential Revision: D9784941

Pulled By: ezyang

fbshipit-source-id: f6dded2c6f5e8f0461fb38f31f9a824992a58358
2018-09-12 07:09:55 -07:00
76070fe73c Make c10d test work on CPU only build (#11567)
Summary:
Make the test work with a CPU-only build; also fixed test failures that had been around for a long time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11567

Differential Revision: D9785740

Pulled By: teng-li

fbshipit-source-id: 61c43b758c1ee53117e30de8074583e6faea863a
2018-09-12 01:39:44 -07:00
6597779847 Clean up some C++ cruftiness in the script lexer.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11408

Differential Revision: D9772843

Pulled By: resistor

fbshipit-source-id: 07f16bf7eaf4f1d8700e46e91a485de4b2d9ed83
2018-09-11 23:55:31 -07:00
3e3d8caecd Allow setting deletion constant
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11529

Differential Revision: D9775398

Pulled By: goldsborough

fbshipit-source-id: 8593d1afcf8be3150dcc4a58433f53307e3ae665
2018-09-11 23:11:46 -07:00
6dcdbd3a1d Make C10d support CPU only build (#11513)
Summary:
This makes torch.distributed work for CPU-only builds.

Also added one more CI test case to cover the MPI CPU build.
All CI tests should cover this change
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11513

Differential Revision: D9784546

Pulled By: teng-li

fbshipit-source-id: 0976a6b0fd199670926f0273e17ad7d2805e42e7
2018-09-11 22:10:34 -07:00
90e31f4896 Improve tracer warnings (#11545)
Summary:
Also, fix a performance bug in `ensureUnique`. Previously it formatted the warning string even though we weren't tracing, so all that work would *always* happen in the hot path and be for nothing.

A sample of how the new warnings look like:
```
tmp.py:4: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  int(x)
tmp.py:5: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  torch.tensor([1.])
tmp.py:6: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator add_. This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  torch.split(y, 2, dim=1)[0].add_(2)

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11545

Differential Revision: D9782975

Pulled By: apaszke

fbshipit-source-id: 5b3abd31366e59c69e0b7ff278042b5563deb5a9
2018-09-11 22:10:32 -07:00
62c9d4ac96 Make .to() methods native functions (to fix JIT tracing)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11491

Differential Revision: D9771121

Pulled By: apaszke

fbshipit-source-id: 08d11101fb12093f8cf913b06359adddf3af9da7
2018-09-11 21:55:42 -07:00
a00fa2c614 Release GIL when calling into JIT interpreter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11541

Differential Revision: D9777909

Pulled By: apaszke

fbshipit-source-id: d0217e203721262f3f131b54ea78f898df0b54ec
2018-09-11 21:55:40 -07:00
1a246c9c7e guard spurious cudnn.h include (#11562)
Summary:
This fixes the build when CuDNN was not found on the system.

From the `git blame`, it looks like the bug has been around for 2 years :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11562

Differential Revision: D9784589

Pulled By: soumith

fbshipit-source-id: b33153436dced0a503c9833cdf52f7093f3394b4
2018-09-11 21:09:54 -07:00
a11ebfa195 Add explicit "this->" for nvcc. (#11196)
Summary:
Fix #11195
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11196

Differential Revision: D9737625

Pulled By: ezyang

fbshipit-source-id: fb62076f005bd619eba53c0ed3f07683633f6d91
2018-09-11 21:09:52 -07:00
8aa8ad8b01 WIP: Reproducibility note (#11329)
Summary:
This adds a Note on making experiments reproducible.
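
The usual recipe such a note covers; a minimal sketch, not the note's exact wording:

```python
import random
import numpy as np
import torch

# Seed every RNG a typical experiment touches.
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)

# Trade speed for determinism in cuDNN-backed ops.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```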

It also adds Instructions for building the Documentation to `README.md`. Please ping if I missed any requirements.

I'm not sure what to do about the submodule changes. Please advise.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11329

Differential Revision: D9784939

Pulled By: ezyang

fbshipit-source-id: 5c5acbe343d1fffb15bdcb84c6d8d925c2ffcc5e
2018-09-11 21:09:51 -07:00
b75c32ded9 link against TORCH_CUDA_LIBRARIES
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11475

Differential Revision: D9784616

Pulled By: anderspapitto

fbshipit-source-id: bb8b443bcb308bbbe9707d265f21e5d00d717d65
2018-09-11 20:39:53 -07:00
f4d9f39a94 Test libtorch on cuda
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11554

Differential Revision: D9784772

Pulled By: goldsborough

fbshipit-source-id: c3e071695f56c1f427984f427b1f7722722947d3
2018-09-11 20:39:51 -07:00
35348dab10 WIP: Include note on cudnn determinism in each function backed by cudnn (#11434)
Summary:
Ping ezyang
This addresses your comment in #114. Strangely, when running the doc build (`make html`), none of my changes are actually showing; could you point out what I'm doing wrong?

Once #11329 is merged it might make sense to link to the reproducibility note everywhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11434

Differential Revision: D9751208

Pulled By: ezyang

fbshipit-source-id: cc672472449564ff099323c39603e8ff2b2d35c9
2018-09-11 20:27:09 -07:00
54107ae8cf convert output_device at data_parallel from torch.device to index (#10189)
Summary:
- fixes #9984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10189

Differential Revision: D9545390

Pulled By: weiyangfb

fbshipit-source-id: 3a6a705437553ba319e9fd4b7f676ff73857a27e
2018-09-11 20:27:07 -07:00
045f862574 Use torch::nn::init::xavier_normal_
Summary: The PyTorch C++ API has `torch.nn.init` equivalents that the RNNG can use to initialize the state of its StackRNNs. This gets rid of the `fanInOut_` methods on `Parser` and tidies up `xavierInitialState` a little.

Reviewed By: wowitsmrinal

Differential Revision: D9472595

fbshipit-source-id: c202116f32383d3b4bba064c2c0d2656311e1170
2018-09-11 20:27:06 -07:00
d95fedb436 Use ATen dropout implementation in Dropout module and add FeatureDropout (#11458)
Summary:
This PR does two things:
1. Replaces the implementation of the `Dropout` module with a call to the ATen function,
2. Replaces `Dropout2d` with a new `FeatureDropout` module that shall take the place of `Dropout2d` and `Dropout3d`. I contemplated calling it `Dropout2d` and making `Dropout3d` an alias for it, but similar to our decision for `BatchNorm{1,2,3}d` (c.f. https://github.com/pytorch/pytorch/pull/9188), we can deviate from Python PyTorch in favor of the ideal-world solution, which is to have a single module, since both actually just call `feature_dropout`.

I also replaced the implementation of `dropout3d`  with a call to `dropout2d` in Python. The code is the same and it's easier for developers to parse than having to manually match the tokens to make sure it's really 100% the same code (which it is, if I matched the tokens correctly).
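
For intuition, feature dropout zeroes whole channels rather than individual elements; a minimal sketch:

```python
import torch
import torch.nn.functional as F

x = torch.ones(1, 4, 2, 2)
y = F.dropout2d(x, p=0.5, training=True)

# Each channel is either zeroed out entirely or scaled by 1/(1-p);
# elements within a surviving channel are never dropped individually.
print(y.view(4, -1))  # every row is all 0.0 or all 2.0
```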

ebetica ezyang SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11458

Differential Revision: D9756603

Pulled By: goldsborough

fbshipit-source-id: fe847cd2cda2b6da8b06779255d76e32a974807c
2018-09-11 20:16:12 -07:00
3121c8f526 Update gtest and remove the macro guide on gtest from #11321 (#11417)
Summary:
Last PR seems to have test failures, re-issuing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11417

Reviewed By: orionr

Differential Revision: D9784706

Pulled By: Yangqing

fbshipit-source-id: 9e5f347e19fa2700ff69d2cd69ea7a9e01a91609
2018-09-11 20:16:08 -07:00
92fd69f256 Split Type into TypeExtendedInterface and Type (#11520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11520

Previously, we had Type which was a catch all interface for all
functions and methods we could possibly want to do dynamic dispatch
on. However, we want to check in a non-autogenerated Tensor class
to ATen/core, and to do this, we must also check in a non-autogenerated
Type class which we can do dispatch on. In principle, we could
put the full Type interface in ATen/core, but this would be
a bad developer experience, since any time you add a new free
function, you'd have to regenerate the checked in Type header.

For a better dev experience, we split Type into a two parts,
Type, which will be checked in (though not in this diff), and
TypeExtendedInterface, which will NOT be checked in. Type contains
just enough methods to let Tensor be defined, and leaves the
rest to TypeExtendedInterface.

Some complications:

- We (very unfortunately) have overloaded virtual methods. Because
of C++'s rules, we cannot move one overload without doing some
extra work to make sure that overload in a superclass and an
overload in a subclass resolve together. I've chosen to resolve
this problem simply by moving ALL overloads of a method which
occurs in Tensor to Type.

- There are some places where we take a type() object and call
a method on it, which is not a Tensor base method. I've eliminated
some where possible, but in other cases calling the method on type
is the ONLY way to invoke it; in that case, I've just inserted
a cast. Further refactoring is necessary.

Reviewed By: gchanan

Differential Revision: D9771708

fbshipit-source-id: c59d39fe919cd6f42be6dca699d474346ea3c614
2018-09-11 20:16:04 -07:00
35d52dbb0e re-enable USE_MPI (#11416)
Summary:
The previous error was caused by mpi_test not depending on MPI_CXX_LIBRARIES. This might solve the problem.

Not tested locally - waiting for CI test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11416

Reviewed By: mingzhe09088

Differential Revision: D9771694

Pulled By: Yangqing

fbshipit-source-id: 53e7b4f64eadc88313bc4dd9b8e3f7931cda6e91
2018-09-11 18:26:12 -07:00
bbf54ea37c Ensure .enumerate_support() methods are jittable (#11542)
Summary:
This works around #11535 by avoiding `arange(n, out=x)` and `eye(n, out=x)` in `torch.distributions`. I've confirmed that the `.enumerate_support()` methods are now jittable.
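
The shape of the workaround, as a hypothetical before/after sketch (the helper shapes are illustrative, not the actual diff):

```python
import torch

n, batch_shape = 5, (5, 3)

# Before: the out= variant, which #11535 reports as untraceable.
# values = torch.arange(n, out=torch.empty(n))

# After: allocate functionally, then reshape/expand to the batch shape.
values = torch.arange(n, dtype=torch.float)
support = values.view((-1,) + (1,) * len(batch_shape)).expand((-1,) + batch_shape)
print(support.shape)  # torch.Size([5, 5, 3])
```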
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11542

Differential Revision: D9777805

Pulled By: apaszke

fbshipit-source-id: fa38f2f1acfc0a289f725fd8c92478573cfdbefb
2018-09-11 18:26:09 -07:00
cda74ac476 fix nested no_grad decorator and with-statement (#11479)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10858
- allow `no_grad` decorator to apply `with torch.no_grad()` at the correct context
- current behavior:
```
import torch

@torch.no_grad()
def nothing(x):
    return x

testin = torch.Tensor([0])
with torch.no_grad():
    print(torch.is_grad_enabled()) # False
    testout = nothing(testin)
    print(torch.is_grad_enabled()) # False
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11479

Differential Revision: D9758691

Pulled By: weiyangfb

fbshipit-source-id: 87de2219c6c45f65a2c0406ae152c3ad760be8f2
2018-09-11 17:56:40 -07:00
8b196d671b Allow tracing random functions (only when using default generators) (#11539)
Summary:
Fixes #11504.

zdevito, neerajprad, fritzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11539

Differential Revision: D9777897

Pulled By: apaszke

fbshipit-source-id: 56983260f5b93da7d5540a6242769ea7bd50eb06
2018-09-11 17:56:39 -07:00
b6b0b5222d fix missing libnccl.so.1 error (#11553)
Summary:
what it says on the tin.

I broke the build in https://github.com/pytorch/pytorch/pull/11487 but contbuild didn't end up catching it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11553

Differential Revision: D9781557

Pulled By: soumith

fbshipit-source-id: 2a1fa314af4b85b5491d74110bfee3d80599aa95
2018-09-11 17:25:58 -07:00
3a39006d38 Fix some more doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11531

Differential Revision: D9776541

Pulled By: SsnL

fbshipit-source-id: 8725485639ea6e9479b6ea95a49f5b75a9457db7
2018-09-11 16:26:55 -07:00
3a8e39b215 Support load and store between Py_complex and std::complex (#11493)
Summary: Printing for complex numbers requires loading and storing between `Py_complex` and `std::complex`. This patch aims to support this for the plugin.

Differential Revision: D9771808

Pulled By: ezyang

fbshipit-source-id: 024865f1945d63ddb5efc775a35438c8ea06408e
2018-09-11 15:55:11 -07:00
289a8c9b7d Allow train/eval, and non-Tensor arguments to python functions (#11505)
Summary:
This whitelists train/eval functions in script modules, and tests that nested nn.Modules still work.

This also changes the code for calling python functions from script to allow non-tensor inputs/outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11505

Differential Revision: D9765466

Pulled By: zdevito

fbshipit-source-id: 1177bff931324422b69e18fa0bbaa82e3c98ec69
2018-09-11 15:05:09 -07:00
17776db2ee Add gtest dependency on aten tests. (#11429)
Summary:
ezyang delivering my promise to you :)

Basically, now aten tests can use gtest as part of our test harness unification effort. I also converted one test (atest.cpp) to show how one can do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11429

Reviewed By: ezyang

Differential Revision: D9762934

Pulled By: Yangqing

fbshipit-source-id: 68ec3a748403c6bd88399b1e756200985a4e07e3
2018-09-11 13:39:51 -07:00
4db21a1d8e Optimize LengthsTileOp on GPU to run a kernel instead of a sequence of memcopies (#11413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11413

LengthsTileOp was implemented using a sequence of device memcopies initiated on the CPU. This was very slow. I changed it to use a kernel. TUM benchmark QPS improved from 13k QPS to 20k QPS as a result.

Reviewed By: manojkris, xianjiec

Differential Revision: D9724988

fbshipit-source-id: 2f98c697730982734d7c6a26d0b6967310d49900
2018-09-11 13:25:35 -07:00
c1dce21fd5 Cuda TensorAccessor (#11373)
Summary:
Provide a TensorAccessor-like interface for CUDA as discussed in #8366.

Compared to TensorAccessor:
- the CUDATensorAccessor copies the sizes and strides while on the host (I didn't implement a host indexing function, though) to enable transfer to the device; on the device, `[]` works as it does for TensorAccessors,
- instantiation is from TensorAccessors in order to allow using `.accessor<..>`. The drawback is that you cannot use `auto` for the variable declaration, but the alternative would be a CUDA-specific `.accessor`-like function,
- there is a PtrTraits argument to enable `__restrict__`,

Example for the intended use:
```
...
template <typename scalar_t>
__global__ void
apply_homography_2d_kernel(cuda::CUDATensorAccessor<scalar_t, 4> dest_a,
			   cuda::CUDATensorAccessor<scalar_t, 4> src_a,
			   cuda::CUDATensorAccessor<float, 2> transform) {
...
}

template <typename scalar_t>
Tensor apply_homography_2d_template(Tensor& res, const Tensor& image, const Tensor& transform) {
  ...
  cuda::CUDATensorAccessor<scalar_t, 4> image_a(image.accessor<scalar_t, 4>());
  cuda::CUDATensorAccessor<scalar_t, 4> res_a(res.accessor<scalar_t, 4>());
  cuda::CUDATensorAccessor<float, 2> transform_a(transform.accessor<float, 2>());
  auto stream = at::cuda::getCurrentCUDAStream();

  apply_homography_2d_kernel<scalar_t>
    <<<grid, block, 0, stream>>>(res_a, image_a, transform_a);
  return res;
}

...
```

I could use a hint on where to put a test for this (e.g. doing a plain vanilla matrix multiplication with a custom kernel and comparing with the ATen mm).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11373

Differential Revision: D9735573

Pulled By: ezyang

fbshipit-source-id: 482b218a0d514e19a8b692bbc77c0e37082cfded
2018-09-11 13:09:33 -07:00
c56a7cfc37 More use of AT_CHECK and AT_ERROR (#11457)
Summary: Considering these increase the size of the message stack, I didn't touch the code outside `ATen/native`

Differential Revision: D9754283

Pulled By: soumith

fbshipit-source-id: 04198ec4fd0c4abae09eeba92c493a783408537a
2018-09-11 12:55:09 -07:00
5952acc041 Add "merge to master" step before build in CircleCI (#11443)
Summary:
This PR adds the "merge to master" step before the build step in CircleCI, so that all PR commits are built against master instead of against the PR's branch. Note that all PRs still need to rebase to master to pick up this new config, so it won't apply to old PR branches retroactively.

To check in CI: make sure it's performing the git merge to master appropriately in "Merge Onto Master" step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11443

Differential Revision: D9775628

Pulled By: yf225

fbshipit-source-id: 8083db6b098d234a44ae4481f40a486e9906f6f8
2018-09-11 12:39:37 -07:00
fbc17321fd Update pybind11 to fix Python 3.7 support for script (#11473)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11419

In particular pulling in https://github.com/pybind/pybind11/pull/1454
as well as pending bugfix in https://github.com/pybind/pybind11/pull/1517 (documenting in comment)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11473

Differential Revision: D9776003

Pulled By: jamesr66a

fbshipit-source-id: a225dcfb66c06bcae98fd2508d9e690c24be551a
2018-09-11 12:39:36 -07:00
781737f84c Remove time prefix from rsync (#11525)
Summary:
This fails with zsh saying "time: command not found".

cc soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11525

Differential Revision: D9772522

Pulled By: apaszke

fbshipit-source-id: b80d108fa6b174d68ada08a9fdbf7260ee37e08f
2018-09-11 12:10:24 -07:00
a566bc2f11 Disable all CircleCI jobs (#11523)
Summary:
Disable all CircleCI jobs until we are ready to move forward with them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11523

Differential Revision: D9774462

Pulled By: yf225

fbshipit-source-id: c5724e71eb68bac4df958b4f7bcc380050668b3c
2018-09-11 11:25:17 -07:00
d09041bd81 Add an option to statically link cuda (#10596)
Summary:
Need to link CUDA statically for benchmarking purposes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10596

Reviewed By: llyfacebook

Differential Revision: D9370738

Pulled By: sf-wind

fbshipit-source-id: 4464d62473e95fe8db65b0bd3b301f262bf269bf
2018-09-11 11:09:29 -07:00
727a4453aa New Serialization Proto
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11166

Reviewed By: mingzhe09088

Differential Revision: D9623522

Pulled By: houseroad

fbshipit-source-id: f21153034a398de7959404321d8534234cd58a40
2018-09-11 10:55:43 -07:00
f80f15866b Get rid of manual dispatch on Type. (#11486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11486

I discovered these by narrowing the interface on Type, and then
fixing call sites outside of core plumbing code which depended
on these methods being provided.

Reviewed By: cpuhrsch

Differential Revision: D9757935

fbshipit-source-id: 3abda0c98919a448a326a757671d438964f6909f
2018-09-11 10:40:22 -07:00
01c7542f43 Use -isystem for system includes in C++ extensions (#11459)
Summary:
I noticed warnings from within pybind11 being shown when building C++ extensions. This can be avoided by including non-user-supplied headers with `-isystem` instead of `-I`

I hope this works on Windows.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11459

Differential Revision: D9764444

Pulled By: goldsborough

fbshipit-source-id: b288572106078f347f0342f158f9e2b63a58c235
2018-09-11 10:40:20 -07:00
d32b41003a Copy protos on install same as develop (#11517)
Summary:
This is a potential fix for https://github.com/pytorch/pytorch/issues/11453 and https://github.com/pytorch/pytorch/issues/11074 worked through with pjh5 . Turns out we had some protos copy code that was in the .sh file that was removed. Better to have it in setup.py, though, same as for develop.

cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11517

Differential Revision: D9771911

Pulled By: orionr

fbshipit-source-id: 76975d8f71f38d951eaaed0b50dd3ec36dd177a9
2018-09-11 10:09:56 -07:00
deac304b6b Bugfix for basic slicing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11428

Differential Revision: D9753999

Pulled By: jamesr66a

fbshipit-source-id: cfc4163a5a06b41beb808a4e24650d71f5d91f4f
2018-09-11 09:39:29 -07:00
4e8d9a4a58 Introducing python setup.py rebuild develop (#11487)
Summary:
This speeds up incremental builds by making the following changes:

- Uses `rsync` instead of `cp` (when `rsync` is found), which is a bit smarter about "maybe copy"
- Introduces a `rebuild` mode which does not rerun `cmake` in `build_pytorch_libs.sh`.
   *Note: `rebuild` should only be used if you don't add or remove files from the build, as `cmake` is not rerun*

Current no-op rebuild speedup:
- 1m 15s -> 20s

There are some lingering bugs: the first two no-op rebuilds still rerun `cmake` (likely because the cmake logic depends on the install folder, which kicks off a rebuild).

So what you see

```
python setup.py rebuild develop    # first time - ~5 mins
python setup.py rebuild develop    # second time - ~3 mins
python setup.py rebuild develop    # third time - ~2 mins
python setup.py rebuild develop    # fourth time - ~20 seconds
python setup.py rebuild develop    # fifth time - ~20 seconds
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11487

Differential Revision: D9769087

Pulled By: soumith

fbshipit-source-id: 20fbecde33af6426149c13767e8734fb3be783c5
2018-09-11 08:56:25 -07:00
31850163ac Remove separate ATen build target (#11488)
Summary:
ATen has had a separate build target in the past, but with our move to a root-level CMakeLists.txt file this makes less sense and is harder to maintain. Also, as we blend code between Caffe2 and ATen this will become even less maintainable.

Talked to ezyang about this, but also cc zdevito, Yangqing, and soumith. If this is too difficult, I will revert, but want to see if we can simplify for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11488

Differential Revision: D9770266

Pulled By: orionr

fbshipit-source-id: c7ba52a1676d84e2d052dad4c042b666f49451cd
2018-09-11 08:56:23 -07:00
de460c7ad3 Improvements on conv/pool/fold/stft/ParamDict docs (#11106)
Summary:
Also fixes some incorrect formula rendering.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11106

Differential Revision: D9752433

Pulled By: SsnL

fbshipit-source-id: 535fc8498638e8b645757fc7535d8771992b7d21
2018-09-11 08:56:21 -07:00
86ab92b0a9 Move TensorImpl / UndefinedTensor(Impl) to core (#11441)
Summary:
Moves TensorImpl to core.
Renames UndefinedTensor to UndefinedTensorImpl and moves to core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11441

Differential Revision: D9736620

Pulled By: gchanan

fbshipit-source-id: 0322ae3b903e338de253b35a0d74a9d3e219204b
2018-09-11 07:45:56 -07:00
80fa8e1007 Add .expand() method to distribution classes (#11341)
Summary:
This adds a `.expand` method for distributions that is akin to the `torch.Tensor.expand` method for tensors. It returns a new distribution instance with batch dimensions expanded to the desired `batch_shape`. Since this calls `torch.Tensor.expand` on the distribution's parameters, it does not allocate new memory for the expanded distribution instance's parameters.

e.g.
```python
>>> d = dist.Normal(torch.zeros(100, 1), torch.ones(100, 1))
>>> d.sample().shape
  torch.Size([100, 1])
>>> d.expand([100, 10]).sample().shape
  torch.Size([100, 10])
```

We have already been using the `.expand` method in Pyro in our [patch](https://github.com/uber/pyro/blob/dev/pyro/distributions/torch.py#L10) of `torch.distributions`. We use this in our models to enable dynamic broadcasting. This has also been requested by a few users on the distributions slack, and we believe will be useful to the larger community.

Note that currently, there is no convenient and efficient way to expand distribution instances:
 - Many distributions use `TransformedDistribution` (or wrap another distribution instance, e.g. `OneHotCategorical` uses a `Categorical` instance) under the hood, or have lazy parameters. This makes it difficult to collect all the relevant parameters, broadcast them, and construct new instances.
 - In the few cases where this is even possible, the resulting implementation would be inefficient since we will go through a lot of broadcasting and args validation logic in `__init__.py` that can be avoided.

The `.expand` method allows for a safe and efficient way to expand distribution instances. Additionally, this bypasses `__init__.py` (using `__new__` and populating relevant attributes) since we do not need to do any broadcasting or args validation (which was already done when the instance was first created). This can result in significant savings as compared to constructing new instances via `__init__` (that said, the `sample` and `log_prob` methods will probably be the rate determining steps in many applications).

e.g.
```python
>>> a = dist.Bernoulli(torch.ones([10000, 1]), validate_args=True)

>>> %timeit a.expand([10000, 100])
15.2 µs ± 224 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> %timeit dist.Bernoulli(torch.ones([10000, 100]), validate_args=True)
11.8 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

cc. fritzo, apaszke, vishwakftw, alicanb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11341

Differential Revision: D9728485

Pulled By: soumith

fbshipit-source-id: 3b94c23bc6a43ee704389e6287aa83d1e278d52f
2018-09-11 06:56:18 -07:00
120d769432 Add support for tracing strings (#11506)
Summary:
This enables `torch.einsum` both in tracing and in script mode. It's used all over Pyro at the moment, and is needed for any use of the JIT in there.

Fixes #11157.
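
As a quick illustration (function and shapes are made up), a string-taking op can now be traced:

```python
import torch

def outer(x, y):
    # the einsum equation is a string argument, which tracing now supports
    return torch.einsum('bi,bj->bij', (x, y))

traced = torch.jit.trace(outer, (torch.randn(2, 3), torch.randn(2, 4)))
print(traced(torch.randn(2, 3), torch.randn(2, 4)).shape)  # torch.Size([2, 3, 4])
```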

zdevito fritzo neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11506

Differential Revision: D9764787

Pulled By: apaszke

fbshipit-source-id: 9b5251b9e7c5897034602bd07ff67b425d33326c
2018-09-11 06:02:41 -07:00
0ddbe668cd Improve shape analysis to cover all most commonly used ops (#11358)
Summary:
[Here's a list](https://gist.github.com/apaszke/f0821840bdcc67a977832dc58acc1b85) of ops that are in `register_aten_ops.cpp`, but aren't supported in shape prop. Everything else should work now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11358

Differential Revision: D9753693

Pulled By: apaszke

fbshipit-source-id: efeae0126ce16cb56b8797fc5246405588bcae3c
2018-09-11 06:02:39 -07:00
f84693efa9 nomnigraph - Improvements to subgraph matching APIs (#11418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11418

Several improvements that aim to make the APIs more straightforward to use

- Get rid of the helper methods subgraph and nonTerminal. Users should now create an NNMatchGraph directly via the graph's createNode and createEdge APIs

- Get rid of operatorSubgraph helper method

- invertGraphTraversal flag applies to both the match graph and the scanned graph. This allows users to create the match graph in the same direction as the scanned graph, thus reducing confusion.

- additional parameters of matchNode (count, includeInSubgraph, nonTerminal) are removed from the constructors and moved into setter methods. (We no longer enforce that MatchNode is immutable but this helps improve code clarity).

- Tests are updated to reflect the changes

Follow up changes:
- Possibly clean up the tests further. This change aims to minimally modify the unit tests.
- Add a validity check that enforces the current limitation of the match graph (single source node), and throws if the match graph does not satisfy the criteria.
- Have the single source node be detected automatically, so callers just need to pass in the matchGraph instead of the source node reference.

Differential Revision: D9732565

fbshipit-source-id: ae8320e2bc89b867f6bb4b1c1aad635f4b219fa1
2018-09-11 04:39:27 -07:00
3d5fd12488 Documentation for c10d: torch.distributed and deprecate the old distributed doc (#11450)
Summary:
This is the new documentation for c10d release, and it also deprecates the old torch.distributed document.

This PR depends on https://github.com/pytorch/pytorch/pull/11405

and should only be landed after https://github.com/pytorch/pytorch/pull/11405 is landed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11450

Differential Revision: D9765504

Pulled By: teng-li

fbshipit-source-id: 48f38b27b8c270baf389f8e478ea226b9ecc63db
2018-09-11 02:10:28 -07:00
0988bbad2d C10d release to torch.distributed for PT1 (#11405)
Summary:
The old `torch.distributed` will go to `torch.distributed.deprecated`
The old DDP will go to `torch.nn.parallel.deprecated`

Now `torch.nn.parallel.DDP` will use c10d DDP
Now `torch.distributed` will use C10d frontend API
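
In other words (import paths only, as described above):

```python
import torch.distributed as dist                        # now the c10d frontend
import torch.distributed.deprecated as old_dist         # the old THD-based API
from torch.nn.parallel import DistributedDataParallel   # now backed by c10d
```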
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11405

Reviewed By: pietern

Differential Revision: D9733733

Pulled By: teng-li

fbshipit-source-id: d6a3f3e73f8d3a7fcb1f4baef53c78063b8cbb08
2018-09-10 23:27:22 -07:00
b14a80553d Ignore functional doc error
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11508

Differential Revision: D9764380

Pulled By: goldsborough

fbshipit-source-id: 3abb9c04f46137be833ea26d67734741e14f8010
2018-09-10 20:55:48 -07:00
f9d12eeb27 Give copy an optional device argument.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11497

Differential Revision: D9762014

Pulled By: gchanan

fbshipit-source-id: 996419cc5e86d000af953d030ff361adafb921ad
2018-09-10 20:40:03 -07:00
dd8defeb3f Document the Functional module (#11460)
Summary:
Document the `Functional` module in the C++  API.

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11460

Differential Revision: D9757555

Pulled By: goldsborough

fbshipit-source-id: 15f8bf6d60bd26f3f4e69fb8e414e186e3c220ee
2018-09-10 19:58:38 -07:00
9cfdf0d677 Document the Embedding module (#11469)
Summary:
ebetica soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11469

Differential Revision: D9757547

Pulled By: goldsborough

fbshipit-source-id: a95673abe949bb81d716dbc03c5c3e2a11cc15d3
2018-09-10 18:25:08 -07:00
a175282776 Flags for LMDB, LevelDB, and Caffe2 ops (#11462)
Summary:
Add flags for LMDB and LevelDB, default `OFF`. These can be enabled with

```
USE_LMDB=1 USE_LEVELDB=1 python setup.py build_deps
```

Also add a flag to build Caffe2 ops, which is default `ON`. Disable with

```
NO_CAFFE2_OPS=1 python setup.py build_deps
```

cc Yangqing soumith pjh5 mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11462

Reviewed By: soumith

Differential Revision: D9758156

Pulled By: orionr

fbshipit-source-id: 95fd206d72fdf44df54fc5d0aeab598bff900c63
2018-09-10 17:27:50 -07:00
e1e69446f6 Lockdown NO_TEST=1 for tests even more (#11415)
Summary:
Skip torch tests as well when the NO_TEST=1 environment variable is set. Also remove the separate ATen code path for when ATen is not built with Caffe2, since it will always be built with Caffe2.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11415

Reviewed By: soumith

Differential Revision: D9758179

Pulled By: orionr

fbshipit-source-id: e3e3327364fccdc57a703aeaad8c4f30452973fb
2018-09-10 17:27:48 -07:00
3e49a69466 Resolve ambiguity when including both caffe2 and aten registries (#11411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11411

Simple fix

Reviewed By: goldsborough

Differential Revision: D9730371

fbshipit-source-id: f841327c01faa13cfb6b7fc6e279b8fc50fad1db
2018-09-10 17:27:46 -07:00
3ad67c60f0 Traceable explicit Variable instantiation (#11463)
Summary:
There's a bunch of legacy code where people are explicitly instantiating Variable, and these call-sites have thus far been untraceable (appearing as prim::Constant nodes with the tensor value at the time of tracing). This makes it so that the new variable inherits the traced Value* from the tensor it's being constructed from.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11463

Differential Revision: D9756529

Pulled By: jamesr66a

fbshipit-source-id: da99c6a7621957a305f2699ec9cb9def69b1b2d7
2018-09-10 17:03:24 -07:00
f2f43ad2da Add new LengthsSplit operator (#10974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10291

This new operator will do the following:

Given a LENGTHS vector and n_splits, output a "split" LENGTHS vector where:

1. Each length in input vector is split into n_splits values (thus output vector should have LENGTHS.size(0) * n_splits elements)
2. The new lengths in the output should be split evenly, and if the length is not divisible by n_splits, the new values are ordered in descending order. (e.g. n_splits = 3, length = 5 -> 2 2 1)
3. If n_splits > some element in the array, its split elements will contain 0s. (e.g. n_splits = 3, length = 2 -> 1 1 0) See the sketch below.
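
A minimal sketch of the splitting rule above (an illustrative helper, not the operator's actual code):

```python
# Split one length into n_splits pieces, evenly, with any remainder
# distributed to the leading pieces so the result is in descending order.
def split_length(length, n_splits):
    base, rem = divmod(length, n_splits)
    return [base + 1 if i < rem else base for i in range(n_splits)]

assert split_length(5, 3) == [2, 2, 1]
assert split_length(2, 3) == [1, 1, 0]
```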

Reviewed By: bddppq, chocjy

Differential Revision: D9013119

fbshipit-source-id: 82bf3371ec08c41fc3379177f0007afc142e0d84
2018-09-10 15:40:28 -07:00
0b78ae86c5 Cleanup byte swapping utilities to generate optimal code on the platforms we care about. (#11394)
Summary:
While the use of memcpy as part of the byte swapping sequence looks funky, all major
compilers recognize and optimize this pattern reliably, resulting in essentially
optimal code generation.

For example, decodeUInt32LE goes from this on iOS arm64:
>         ldrb    w8, [x0, #3]
>         ldrb    w9, [x0, #2]
>         bfi     w8, w9, #8, #8
>         ldrb    w9, [x0, #1]
>         bfi     w8, w9, #16, #8
>         ldrb            w9, [x0]
>         bfi     w8, w9, #24, #8
>         mov      x0, x8
>         ret

To this:
>         ldr             w8, [x0]
>         rev     w0, w8
>         ret
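
For reference, the semantics being implemented are just a little-endian read; a Python equivalent (not the C++ implementation) is:

```python
import struct

def decode_uint32_le(buf, offset=0):
    # interpret 4 bytes starting at offset as an unsigned little-endian int
    return struct.unpack_from('<I', buf, offset)[0]

assert decode_uint32_le(b'\x78\x56\x34\x12') == 0x12345678
```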
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11394

Reviewed By: SsnL

Differential Revision: D9728659

Pulled By: resistor

fbshipit-source-id: 9afbd4adfad1d1fb7b01f1179e6707ee21fa726f
2018-09-10 15:40:24 -07:00
a0d4106c07 Integrate custom op tests with CI (#10611)
Summary:
This PR is stacked on https://github.com/pytorch/pytorch/pull/10610, and only adds changes in one file `.jenkins/pytorch/test.sh`, where we now build the custom op tests and run them.

I'd also like to take this PR to discuss whether the [`TorchConfig.cmake`](https://github.com/pytorch/pytorch/blob/master/cmake/TorchConfig.cmake.in) I made is robust enough (we will also see in the CI) orionr Yangqing dzhulgakov what do you think?

Also ezyang for CI changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10611

Differential Revision: D9597627

Pulled By: goldsborough

fbshipit-source-id: f5af8164c076894f448cef7e5b356a6b3159f8b3
2018-09-10 15:40:21 -07:00
3e665cc29b Improve support for tracing sizes, add more tracer warnings (#11288)
Summary:
Many constructors like `torch.zeros` or `torch.randn` didn't support
size tracing correctly, which is fixed by this pass. The same issue has been
fixed in legacy tensor constructors.

Additionally, new tensor constructors, which do not participate in
tracing (most notably `torch.tensor`, `torch.as_tensor` and
`torch.from_numpy`) raise a warning when they are used.

Finally, entering a traceable operation disables the tracing in its body.
This is needed because

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11288

Reviewed By: ezyang

Differential Revision: D9751183

Pulled By: apaszke

fbshipit-source-id: 51444a39d76a3e164adc396c432fd5ee3c8d5f7f
2018-09-10 15:22:48 -07:00
70d93f4777 Check for maximum numel in NCCL broadcasting (#11466)
Summary:
NCCL1 uses `int` as its numerical type for fields like `count`, which makes broadcasting tensors larger than `2^31 - 1` elements impossible, and raises the opaque error `invalid arguments`. NCCL2 greatly increases the limit on many platforms by using `size_t`. This patch statically detects this type, and raises properly if the broadcast tensor exceeds the limit.

No test because I don't think our test suite should broadcast big tensors.
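
A minimal sketch (hypothetical names) of the guard being added:

```python
INT_MAX = 2**31 - 1  # NCCL1's count field is a C int

def check_broadcast_numel(tensor):
    # raise a clear error instead of NCCL1's opaque "invalid arguments"
    if tensor.numel() > INT_MAX:
        raise RuntimeError(
            "Broadcast tensor has {} elements, which exceeds the maximum "
            "supported by NCCL1 ({})".format(tensor.numel(), INT_MAX))
```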
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11466

Differential Revision: D9754753

Pulled By: SsnL

fbshipit-source-id: 73506450cae047e06b5b225b39efdb42d5d26685
2018-09-10 14:39:15 -07:00
35008e0a1a Add flags to fix half comparison and test (#11395)
Summary:
It was reported that there are some issues when using comparison operators for half types when certain THC headers are included. I was able to reproduce this and added a test. I also fixed the issue by adding the proper definitions.

Reported in https://github.com/pytorch/pytorch/pull/10301#issuecomment-416773333
Related: https://github.com/pytorch/tutorials/pull/292

soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11395

Differential Revision: D9725102

Pulled By: goldsborough

fbshipit-source-id: 630425829046bbebea3409bb792a9d62c91f41ad
2018-09-10 14:10:21 -07:00
18e5fd36c2 Normalize gradients before reduction in DistributedDataParallelC10d (#11109)
Summary:
Normalizing by the world size before the reduction is less likely to cause overflow in FP16 training.
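
A toy demonstration of the overflow argument (not the DDP code; the all-reduce sum across 64 equal ranks is simulated with a multiply):

```python
import torch

world_size = 64
grad = torch.full((4,), 2000.0, dtype=torch.float16)  # per-rank gradient

# sum-then-divide: 2000 * 64 = 128000 exceeds the fp16 max (65504) -> inf
print((grad * world_size) / world_size)  # tensor([inf, inf, inf, inf], dtype=torch.float16)

# divide-then-sum stays in range at every step and round-trips exactly
print((grad / world_size) * world_size)  # tensor([2000., 2000., 2000., 2000.], dtype=torch.float16)
```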
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11109

Differential Revision: D9594708

Pulled By: myleott

fbshipit-source-id: 93ab53cb782ee1cbe1264e529b333490a0940338
2018-09-10 13:55:09 -07:00
ea0ee77c61 Fix katex math rendering (#11472)
Summary:
I'm 80% sure that this fixes the math bug. But I can't repro locally so I don't know.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11472

Differential Revision: D9755328

Pulled By: SsnL

fbshipit-source-id: 130be664d3c6ceee3c0c166c1a86fc9ec3b79d74
2018-09-10 12:40:23 -07:00
198ade74f9 Remove manual refcounting from Tensor class (#11294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11294

The Tensor(ptr, retain) constructor is error prone and circumvents the intrusive_ptr safety.

This diff removes that and pushes the responsibility to callers.
Step by step, manual refcounting can be pushed back and possibly eliminated in the end.

Reviewed By: ezyang

Differential Revision: D9663476

fbshipit-source-id: 7f010e5e47b137a9575960201c5bf5d552c5c2f5
2018-09-10 12:40:21 -07:00
b0c1397271 Fix intrusive_ptr move/copy for different NullType's (#11260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11260

This is needed to make something like this work:

    intrusive_ptr<TensorImpl, UndefinedTensorImpl> a = make_intrusive<SparseTensorImpl>(...);

Reviewed By: ezyang

Differential Revision: D9652089

fbshipit-source-id: 19c65e98460ccb27bc69e36d7e558cb9d6e67615
2018-09-10 12:40:20 -07:00
252f93df09 Improve Tensor() constructor (#11258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11258

The two intrusive_ptr constructors in Tensor can be combined into one implementation that does both, moving and copying.

Reviewed By: ezyang

Differential Revision: D9652088

fbshipit-source-id: 5efca02654ba305c99c20bbeb83551469d17a51d
2018-09-10 12:40:19 -07:00
09292f2c03 Some improvements to IValue (#11238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11238

- when moving an IValue, free the old value instead of keeping it allocated
- making classes final
- moving std::string
- making ConstantList const

Reviewed By: ezyang

Differential Revision: D9644700

fbshipit-source-id: ab7228368e4f00f664ba54e1242b0307d91c5e7e
2018-09-10 12:40:17 -07:00
ce6906b051 Narrowing Blob (#11167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11167

Narrow the Blob API as preparation for merging Blob/IValue

- get rid of templated IsType and Operator::InputIsType / OutputIsType
- Use 'using' instead of 'typedef' for DestroyCall (just for readability)

Reviewed By: ezyang

Differential Revision: D9623916

fbshipit-source-id: 952f0b0cf5a525094b02e8d2798dd57a56a9e1d8
2018-09-10 12:40:16 -07:00
040d75d455 Add option to use CUDA memory leak testing as a context manager (#11380)
Summary:
cc SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11380

Reviewed By: ezyang

Differential Revision: D9705877

Pulled By: zou3519

fbshipit-source-id: 02470c25236f57fa02f4ac9d7ed63d38a6355db2
2018-09-10 12:40:15 -07:00
2158f4a9c8 add export import test to TestJitGenerated (#10982)
Summary:
Checking assertExportImport for all of the generated test jit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10982

Differential Revision: D9636935

Pulled By: eellison

fbshipit-source-id: f3f1ce77d454848098f2ac7e0fa18bf8564890be
2018-09-10 11:37:05 -07:00
cee743f639 Move backward/set_data to Type-based dispatch.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11440

Differential Revision: D9736565

Pulled By: gchanan

fbshipit-source-id: 1e66f54f1c87084f37c0b014030f0d6d2f8dfaee
2018-09-10 08:40:29 -07:00
87a9a8f80a Use AT_CHECK and AT_ERROR
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11444

Differential Revision: D9736992

Pulled By: SsnL

fbshipit-source-id: bf5320e878c6ef71468f3e2aa12ce304b92d45ca
2018-09-09 21:26:12 -07:00
560d6efd3a Only join started dataloader workers (#11432)
Summary:
`Process.start()` actually takes some time, as it needs to start a
process and pass the arguments over via a pipe. Therefore, we
only add a worker to the self.workers list after it has started, so
that we do not call `.join()` on it if the program dies before it starts
and `__del__` tries to join it, which would fail with:
    AssertionError: can only join a started process.

Example trace when such error happens:
```py
[unrelated]
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 500, in __iter__
    return _DataLoaderIter(self)
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 292, in __init__
    w.start()
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
    self.pid = os.fork()
KeyboardInterrupt
Exception ignored in: <function _DataLoaderIter.__del__ at 0x7fa704d5aa60>
Traceback (most recent call last):
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 398, in __del__
    self._shutdown_workers()
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 392, in _shutdown_workers
    w.join()
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 139, in join
    assert self._popen is not None, 'can only join a started process'
AssertionError: can only join a started process
```

No test because this is hard to trigger reliably.
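
A standalone sketch of the fix (not the DataLoader code itself):

```python
import multiprocessing

workers = []
for i in range(4):
    w = multiprocessing.Process(target=print, args=(i,))
    w.daemon = True
    w.start()          # may be interrupted, e.g. by KeyboardInterrupt mid-fork
    workers.append(w)  # only track workers that have actually started

# cleanup can now join unconditionally: everything in the list was started
for w in workers:
    w.join()
```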
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11432

Reviewed By: ezyang

Differential Revision: D9735430

Pulled By: SsnL

fbshipit-source-id: a8912d9bb4063f210d6236267b178173810e2351
2018-09-09 12:55:51 -07:00
87b2f05a9c Also set stdin to subprocess pipe in FindCUDNN windows popen call (#11435)
Summary:
Same issue as https://github.com/pytorch/pytorch/pull/10379, just in a different place (adding this resolves it)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11435

Differential Revision: D9736396

Pulled By: soumith

fbshipit-source-id: 220a52b8009fc2bee9313c5a091443c68f85f62f
2018-09-09 11:40:25 -07:00
581099a7b2 pybind conversion for IntList (#11425)
Summary:
as discussed with ezyang and slayton58 , this might be a nice convenience to be able to use code in extensions just as in ATen.

also split off `tracing_state.h` from `torch/jit/tracer.h` fix #11204 to bee able to use the utility functions

pytorchbot  it's not a jit patch per se.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11425

Differential Revision: D9735556

Pulled By: ezyang

fbshipit-source-id: 466c92bbdb1d7d7a970eba1c26b7583fe9756139
2018-09-09 10:39:40 -07:00
ee4309a9ac override BUILD_TEST when building gloo (#11431)
Summary:
A recent build regression is that we need a system GoogleTest for builds to pass.

This was because, when building with Gloo, gloo is trying to build its own tests, which look for system gtest [here](https://github.com/facebookincubator/gloo/blob/master/cmake/Dependencies.cmake#L72-L80) (because we're not using the full cmake build and making it aware of third_party/GoogleTest, but instead are building gloo in isolation using tools/build_pytorch_libs.sh)

Traditionally, we didn't ask Gloo to build its tests, but because we added `-DBUILD_TEST=1` by default to all builds (in refactoring variable names), we accidentally started asking Gloo to build its tests.

This PR overrides the Gloo flags and asks it not to build its tests (as it used to)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11431

Differential Revision: D9736387

Pulled By: soumith

fbshipit-source-id: 59e84edae780123b793bdaea5fd9ac46156cd0af
2018-09-09 10:11:56 -07:00
1b94f5c6e6 optimize masked_fill on CPU (#11359)
Summary:
This PR parallelizes `masked_fill` on CPU; currently it runs sequentially.

The following script is used to benchmark and verify this PR. On a Xeon Skylake 8180 (2 sockets * 28 cores),
it runs in 4.20 sec without the PR and 0.11 sec with it.

```python
import torch
import random
from time import time

size = 10 * 1000 * 1000
count = 100

def test_masked_fill():
    dst = torch.randn(size)
    dst_ = dst.clone()
    mask = torch.rand(size).mul(2).floor().byte()
    val = random.random()

    tstart = time()
    for i in range(count):
        dst.masked_fill_(mask, val)
    tend = time()
    print("masked_fill_: %f" % (tend-tstart))

    for i in range(size):
        if mask[i]:
            if dst[i] != val:
                print("fail")
        else:
            if dst[i] != dst_[i]:
                print("fail1")
    print("test_masked_fill: PASS")

test_masked_fill()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11359

Differential Revision: D9735578

Pulled By: ezyang

fbshipit-source-id: d437ad7c6dace1910d0c18d6d9ede80efb44fae4
2018-09-09 00:25:26 -07:00
b7ecf035dc Updates FindCUDA.cmake to 3.12.2 upstream version (#11406)
Summary:
This PR is just a copy-paste of the upstream FindCUDA.cmake. Since cublas_device is deprecated in CUDA >= 9.2, this change is necessary for the build.

Related: https://gitlab.kitware.com/cmake/cmake/merge_requests/2298
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11406

Differential Revision: D9735563

Pulled By: ezyang

fbshipit-source-id: c74d86ced7cc485cb2233f9066ce23e921832c30
2018-09-08 23:10:32 -07:00
6683fb56ca Add AVX optimizations for pdist (#11230)
Summary:
Added AVX optimizations for pdist using Vec256. This brings single threaded performance up to speed with scipy, but the current implementation greatly hurts performance without AVX enabled. Is there a way to special case out AVX on dispatch and call the non Vec256 code? Or is the way I used Vec256 completely wrong?

Single threaded comparison to scipy
============================

This is the time to compute the pdist of a 2048 x 2048 float matrix with only one thread for various values of p between torch and scipy. p = 3 is the code path for arbitrary p, and so is much slower than the other values.

p | torch | scipy
-----|-----------|------
0 | 6.27 s ± 393 ms | 7.23 s ± 498 ms
1 | 5.49 s ± 201 ms | 43.4 s ± 1.09 s
2 | 5.74 s ± 474 ms | 53.8 s ± 3.52 s
∞ | 5.59 s ± 292 ms | 47.4 s ± 2.03 s
3 | really slow | gave up

Result by AVX support
================

This is the time to compute the distance and gradient of a 2048 x 2048 float matrix with all threads by AVX support. `before` is the old code, `default` is no AVX support, etc. Interestingly the AVX optimizations provided a great benefit over the old unoptimized code, but drastically hurt performance when compiled without AVX optimizations. p = 3 is the code path for arbitrary p, and so is much slower than the other values.

Results for p = 0
----------------

avx | dist | grad
----|------|-----
before | 514 ms ± 87.5 ms | 191 µs ± 35 µs
default | 3.47 s ± 183 ms | 201 µs ± 24.6 µs
avx | 123 ms ± 18.2 ms | 281 µs ± 130 µs
avx2 | 103 ms ± 11.4 ms | 216 µs ± 74.4 µs

Results for p = 1
----------------

avx | dist | grad
----|------|-----
before | 426 ms ± 35 ms | 6.21 s ± 187 ms
default | 2.6 s ± 123 ms | 5.62 s ± 273 ms
avx | 104 ms ± 6.37 ms | 833 ms ± 44.3 ms
avx2 | 106 ms ± 3.59 ms | 924 ms ± 86.2 ms

Results for p = 2
-----------------

avx | dist | grad
----|------|-----
before | 425 ms ± 45.4 ms | 6.31 s ± 125 ms
default | 3.04 s ± 187 ms | 3.55 s ± 242 ms
avx | 110 ms ± 3.66 ms | 896 ms ± 21.8 ms
avx2 | 113 ms ± 4.68 ms | 934 ms ± 25.2 ms

Results for p = ∞
------------------

avx | dist | grad
----|------|-----
before | 501 ms ± 39.5 ms | 6.64 s ± 321 ms
default | 2.15 s ± 92.9 ms | 8.43 s ± 355 ms
avx | 104 ms ± 5.52 ms | 835 ms ± 36.7 ms
avx2 | 100 ms ± 3.41 ms | 864 ms ± 67 ms

Results for p = 3
-----------------

avx | dist | grad
----|------|-----
before | 22.6 s ± 413 ms | 11.1 s ± 242 ms
default | 24.9 s ± 1 s | 11.2 s ± 293 ms
avx | 2.69 s ± 148 ms | 5.63 s ± 88.4 ms
avx2 | 2.48 s ± 31.8 ms | 5.61 s ± 114 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11230

Differential Revision: D9735503

Pulled By: erikbrinkman

fbshipit-source-id: a9da619249e4ca2625b39ca1ca7f5543c3086bfb
2018-09-08 22:55:02 -07:00
538ea67437 Search for CMake config files for pybind11. (#11423)
Summary:
If pybind11 is built with cmake and installed, we should use its config file instead of the Findpybind11 shipped with caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11423

Differential Revision: D9735557

Pulled By: ezyang

fbshipit-source-id: 28a39e579fa045060aa1a716e5fd7dbcf7b89569
2018-09-08 22:44:03 -07:00
02114e877f fix #10838 incorrect bidirectional output format (#11368)
Summary:
Fixes the issue discussed in #10838. `hidden_size` should be the last dimension regardless of whether we're in ONNX or PyTorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11368

Differential Revision: D9734814

Pulled By: soumith

fbshipit-source-id: 7f69947a029964e092c7b88d1d79b188a417bf5f
2018-09-08 17:09:57 -07:00
ac9268f25d Conversions to and from complex numbers. (#11420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11420

Surprisingly tricky!  Here are the major pieces:

- We grow an even yet more ludicrous macro
  AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF
  which does what it says on the tin.  This is because I was
  too lazy to figure out how to define the necessary conversions
  in and out of ComplexHalf without triggering ambiguity problems.
  It doesn't seem to be as simple as just Half.  Leave it for
  when someone actually wants this.

- Scalar now can hold std::complex<double>.  Internally, it is
  stored as double[2] because nvcc chokes on a non-POD type
  inside a union.

- overflow() checking is generalized to work with complex.
  When converting *to* std::complex<T>, all we need to do is check
  for overflow against T.  When converting *from* complex, we
  must check (1) if To is not complex, that imag() == 0
  and (2) for overflow componentwise.

- convert() is generalized to work with complex<->real conversions.
  Complex to real drops the imaginary component; we rely on
  overflow checking to tell if this actually loses fidelity. To get
  the specializations and overloads to work out, we introduce
  a new Converter class that actually is specializable.

- Complex scalars convert into Python complex numbers

- This probably fixes complex tensor printing, but there is no way
  to test this right now.
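
An illustrative Python analogue (not the C++ Converter) of the two complex-to-real checks above:

```python
def complex_to_real(z, lo, hi):
    if z.imag != 0:               # check (1): would silently lose information
        raise OverflowError("non-zero imaginary component")
    if not (lo <= z.real <= hi):  # check (2): componentwise overflow
        raise OverflowError("real component out of range")
    return z.real

assert complex_to_real(3 + 0j, -128, 127) == 3.0
```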

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: cpuhrsch

Differential Revision: D9697878

Pulled By: ezyang

fbshipit-source-id: 181519e56bbab67ed1e5b49c691b873e124d7946
2018-09-08 16:39:43 -07:00
d3f98b5ffc Add matrix power (#11421)
Summary:
vishwakftw Your patch needed some updates because the default native function dispatches changed from `[function, method]` to `[function]`. The CI was run before that change happened so it still shows green, but the internal test caught it.

I did some changes when rebasing and updating so I didn't just force push to your branch. Let's see if this passes CI and internal test. If it does, let me know if you want me to force push to your branch or use this PR instead.

Note to reviewers: patch was already approved at #10068 .

cc yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11421

Differential Revision: D9733407

Pulled By: SsnL

fbshipit-source-id: cf2ed293bb9942dcc5158934ff4def2f63252599
2018-09-08 15:25:56 -07:00
802380ac93 Improve LegacyTypeDispatch to handle initialization correctly. (#11331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11331

In the previous commit, we added a bare-bones LegacyTypeDispatch in ATen/core.
This is not sufficient for the use cases we need: we not only need to be able to
get a Type, but we also need to be able to *initialize* the Types if its the first time
we have retrieved a CPU/CUDA/Complex type. I hemmed and hawed about how
to do this; the strategy this PR takes is to introduce a new "hooks" interface
specifically for initializing CPU/CUDA/Complex (which still lives in Context). We then
move all "user-friendly" functions to LegacyTypeDispatch.

Here were some other options which I considered, but don't work:
- Assume that Type is already initialized, because we only intend to call Type
  from Tensor methods, where we already have a Tensor. This does not work
  because Caffe2 created tensors will not have gone through the standard
  Type codepath, and will have skipped initialization.
- Move CUDAHooks and ComplexHooks to ATen/core. Besides being sucky,
  this isn't even a complete fix, because I still need to initialize CPU hooks
  (so you *still* need another hooks interface).

Reviewed By: cpuhrsch

Differential Revision: D9666612

fbshipit-source-id: ac7004b230044b67d13caa81fdfaf3c6ab915e3f
2018-09-08 10:10:17 -07:00
9687a72794 Move the type registry out of Context, into LegacyTypeDispatch. (#11274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11274

We don't want to put all of Context into ATen/core, but one
particular part cannot be avoided: the type registry, because
implementations of TensorMethods will need to get a Type,
and then do a virtual call on it.

I needed to do a little bit of (temporary) footwork to get this
in without also moving Type, because unique_ptr<Type> expects
to be able to see the destructor of Type (but it's forward declared
right now).  So instead I put the destructor as an explicit functor.  We
can get rid of this once Type actually moves in ATen/core

Reviewed By: cpuhrsch

Differential Revision: D9657449

fbshipit-source-id: 940931493bf4f1f6a8dad03f34633cacdd63dd0b
2018-09-08 10:10:11 -07:00
b9b9ae935b Make torch.randint have default dtype int64 (#11040)
Summary:
cc gchanan apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11040

Differential Revision: D9565728

Pulled By: SsnL

fbshipit-source-id: eb5be9609f30c88f52746fa7e13ad71e2856648e
2018-09-08 07:55:06 -07:00
505ecab88d bumping up the default store timeout (#11409)
Summary:
to 300 seconds to be safe. There used to be no timeout in THD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11409

Differential Revision: D9731709

Pulled By: teng-li

fbshipit-source-id: 0ce011dcca507cbf063176ad4995405c77dd0cdd
2018-09-07 23:55:23 -07:00
3d2862526b Support send/recv for the gloo process group (#11387)
Summary:
This change removes the skips for the existing send/recv tests in the backwards compatibility layer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11387

Reviewed By: teng-li

Differential Revision: D9729330

Pulled By: pietern

fbshipit-source-id: f8899219a94d806386d03e9ef53bff622d8658a3
2018-09-07 20:25:18 -07:00
47c1de25e8 Test exporting batch norm, dropout, RNN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11126

Differential Revision: D9727689

Pulled By: jamesr66a

fbshipit-source-id: f142257a2fba27d86844bf33084174f1f68a8ca5
2018-09-07 19:41:39 -07:00
b7a2c91eed remove unnecessary clone() when .grad is None (#11165)
Summary:
Currently the gradient is copied into .grad if .grad is None. This PR aims to remove the copy when it is not absolutely needed.

It is generally an improvement in speed and memory usage, and here is a case where it may help a lot:
Normally, people do optimizer.zero_grad() every minibatch before backward. This translates into a memset, and later a point-wise add.
When there is some large weight in the network, one optimization people can always do is to set parameter.grad to None instead of calling zero_grad. This removes the memset and changes the point-wise add to a memcpy.
Here are the results of running the attached script on a V100 GPU. It is 100 iterations of forward/backward/zero_grad on a single embedding of the size used in the 1-billion word benchmark.
`Zero grad: 2.123847723007202`
`None grad: 1.3342866897583008`

With the backend change of this PR, the unnecessary memcpy is removed, and a further speedup is achieved.
`Zero grad: 2.124978542327881`
`None grad: 0.4396955966949463`

[benchmark.txt](https://github.com/pytorch/pytorch/files/2341800/benchmark.txt)
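
For reference, a minimal sketch (toy module, not the attached benchmark) of the two patterns being compared:

```python
import torch

model = torch.nn.Linear(1024, 1024)
x = torch.randn(16, 1024)

# pattern 1: zero_grad -> memset now, point-wise add during backward
for p in model.parameters():
    if p.grad is not None:
        p.grad.zero_()
model(x).sum().backward()

# pattern 2: grad = None -> backward re-populates .grad; with this PR the
# gradient buffer can be grabbed directly, avoiding the copy entirely
for p in model.parameters():
    p.grad = None
model(x).sum().backward()
```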

Some details on the code change:
.detach() is used because we need to get rid of new_grad being a view without copying data. This should be safe in first-order-only mode.
The data needs to be contiguous, otherwise `grad_variable.data() += new_grad.data();` below will fail.
Only the last variable that holds a reference to the temp gradient will grab its buffer.

ngimel, mcarilli  and mruberry helped on finalizing this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11165

Differential Revision: D9728874

Pulled By: soumith

fbshipit-source-id: b8fb822a2dff6e812bbddd215d8e384534b2fd78
2018-09-07 19:41:37 -07:00
c49b01a8a0 Change default variants to 'function'. (#11247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11247

Previously, the default for a declaration in native_functions.yaml
was ['function', 'method'], i.e., generate both a method and
function for every binding.  We now believe this is inappropriate:
the majority of new kernels added to PyTorch should live as
free functions, NOT methods.  Thus, we change the default accordingly.

I also took the opportunity to de-method some "internal" functions
that had a leading underscore.  While, strictly speaking, this is a
BC breaking change, I believe it is highly unlikely anyone was using
these directly.

Reviewed By: yf225

Differential Revision: D9648570

fbshipit-source-id: 8b94647b824e0899d6d18aa5585aaedc9d9957d2
2018-09-07 17:56:08 -07:00
fa522d1aed Revert D9720931: [pytorch][PR] [third-party] Update googletest to release-1.8.1
Differential Revision:
D9720931

Original commit changeset: 18a60d0409e7

fbshipit-source-id: a05dcba71277eb4f8ac38886f307d6cf6e6955a9
2018-09-07 17:42:03 -07:00
c9843bd86b Update googletest to release-1.8.1 (#11388)
Summary:
This is mainly to pick up the change 20074be19a to avoid polluting the CMAKE_DEBUG_POSTFIX variable. cc orionr .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11388

Reviewed By: orionr

Differential Revision: D9720931

Pulled By: Yangqing

fbshipit-source-id: 18a60d0409e74316f74d364f4fe16bf0d0198413
2018-09-07 16:56:16 -07:00
31d36b1d31 move complex registration test out-of-line (#11397)
Summary:
Moves the complex registration code into an out-of-line C++ extension to de-noise the test_cpp_extensions.py file. Let's keep it nice and tidy so we can point our users at it for usage examples.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11397

Differential Revision: D9725335

Pulled By: goldsborough

fbshipit-source-id: 290618f2ee711b1895cdb8f05276034dfe315c6d
2018-09-07 16:56:14 -07:00
4ae16c9ad9 Recursive descent for validation + convert expands in ATen fallback (#11356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11356

Differential Revision: D9721002

Pulled By: jamesr66a

fbshipit-source-id: eeb50b56f8a72e929860c5e459a5ab50ac624814
2018-09-07 16:39:36 -07:00
4c8cc36e34 Fix igios build (#11392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11392

Fix igios build

Reviewed By: houseroad

Differential Revision: D9720833

fbshipit-source-id: 33acc3c658c22addd4bad142433824076233e901
2018-09-07 15:55:23 -07:00
4bf5fc44c8 Fix split_size test failures (#11051)
Summary:
~~This PR fixes #8525 by renaming `split_with_sizes` to `split` so that 2 `aten::split` ops are
generated (previously `aten::split(self, int, int)` and `aten::split_with_sizes(self, int[], int)` were generated)~~

~~`split_with_sizes` was made in PR #5443, but I don't see a reason for it to have
a different name than `split` rather than just overload `split`.~~

This PR fixes #8525 by adding `register_special_ops.cpp` to mirror Python dispatching from `split` to `split` and `split_with_sizes` in [tensor.py](https://github.com/pytorch/pytorch/blob/master/torch/tensor.py#L279).

It also fixes #8520 by adding an `int[]` wherever it sees `torch.Size`

In a follow-up PR this could also be used to fix some of the other `unknown builtin op` test errors.
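
A rough Python analogue (hypothetical helper, not the actual register_special_ops.cpp code) of the dispatch being mirrored:

```python
import torch

def split(tensor, split_size_or_sections, dim=0):
    # an int selects aten::split; a list selects aten::split_with_sizes,
    # matching the dispatch done in torch/tensor.py
    if isinstance(split_size_or_sections, int):
        return torch.split(tensor, split_size_or_sections, dim)
    return tensor.split_with_sizes(split_size_or_sections, dim)

parts = split(torch.arange(6), [2, 4])  # tensors of 2 and 4 elements
```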
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11051

Differential Revision: D9582443

Pulled By: driazati

fbshipit-source-id: d27201f85937d72e45e851eaa1460dd3dd1b61a9
2018-09-07 15:39:24 -07:00
9886ebeb24 Remove hardcoded system path from CMAKE_MODULE_PATH (#11386)
Summary:
This seems to be causing different versions of OpenMPI to be picked up
by different parts of the build. It's not good practice to include absolute
paths anyway, so let's try removing it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11386

Reviewed By: teng-li

Differential Revision: D9724349

Pulled By: pietern

fbshipit-source-id: 3dfef91c81f2e97e5125284aff9e7e98f8761917
2018-09-07 15:25:38 -07:00
802d21c8f4 Remove FULL_CAFFE2 flag (#11321)
Summary:
Continuing pjh5's work to remove FULL_CAFFE2 flag completely.

With these changes you'll be able to also do something like

```
NO_TEST=1 python setup.py build_deps
```
and this will skip building tests in caffe2, aten, and c10d. By default the tests are built.

cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11321

Reviewed By: mingzhe09088

Differential Revision: D9694950

Pulled By: orionr

fbshipit-source-id: ff5c4937a23d1a263378a196a5eda0cba98af0a8
2018-09-07 15:09:44 -07:00
93da5a21c9 Update variable view note
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11393

Differential Revision: D9725444

Pulled By: SsnL

fbshipit-source-id: b1607d986ab93e64b0b0ff9e8f10d9e3f6e2160e
2018-09-07 15:09:43 -07:00
77b6d7d255 Doc improvements (#11347)
Summary:
1. Remove cudnn* symbols from C++ docs
2. Fix code examples for `nn::Module` and `jit::compile`
3. Document Dropout
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11347

Differential Revision: D9716751

Pulled By: goldsborough

fbshipit-source-id: e0566cec35848335cac3eb9196cb244bb0c8fa45
2018-09-07 14:39:36 -07:00
7de0332e10 Add initial documentation for JIT (#11357)
Summary:
In addition to documentation, this cleans up a few error message formats.
It also adds infra to find which operators are supported by the JIT automatically, which is then used in the generation of the docs.

The wording and formatting of the docs is not yet polished, but having this will allow our document writers to make faster progress.

Followup PRs will polish the docs and fix formatting issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11357

Differential Revision: D9721277

Pulled By: zdevito

fbshipit-source-id: 153a0d5be1efb314511bcfc0cec48643d78ea48b
2018-09-07 14:27:47 -07:00
69b4b45f91 enable missing nn tests with single grad check, minor refactor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11366

Differential Revision: D9723305

Pulled By: wanchaol

fbshipit-source-id: 9e7e2e7e68cb4919610bccfbf76fa33b647f6eb7
2018-09-07 14:27:46 -07:00
576807ce1a flaky test fix trial (#11391)
Summary:
Add a barrier() to wait for all process groups to be created before destroying them
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11391

Differential Revision: D9727383

Pulled By: teng-li

fbshipit-source-id: 689d62c978e642b68f4949dcf29982e34869ada4
2018-09-07 14:10:06 -07:00
e9da2dd3cc Do not use PERSISTENT cudnn mode for spatialBN (#11382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11382

We found this cudnn bug in S163230 that causes accuracy loss. We fixed this in D9601217, but due to the reimplementation of spatialBN the fix was overwritten. Let's land it again.

Reviewed By: kuttas

Differential Revision: D9702347

fbshipit-source-id: 11547e9edaf7b2ba7f4aa7263ffb4f0281bbf078
2018-09-07 13:41:18 -07:00
01930a3145 Move sync_params to C++ (#9805)
Summary:
The next function I'm moving to C++ is `sync_params`. It is stacked on top of https://github.com/pytorch/pytorch/pull/9729, so some changes will go away when it lands and I rebase.

I also split code into a `.h` and `.cpp` file for better code organization.

pietern apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9805

Differential Revision: D9688604

Pulled By: goldsborough

fbshipit-source-id: 4467104d3f9e2354425503b9e4edbd59603e20a8
2018-09-07 12:56:40 -07:00
ba6f10343b update CUDAExtension doc (#11370)
Summary:
fix typo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11370

Differential Revision: D9701777

Pulled By: soumith

fbshipit-source-id: 9f3986cf30ae0491e79ca4933c675a99d6078982
2018-09-07 12:56:38 -07:00
733402bef4 Fix issues with certain heterogeneous types in lists during tensor creation (#11377)
Summary:
Closes #9963
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11377

Differential Revision: D9701824

Pulled By: soumith

fbshipit-source-id: 89c5448fd90ece1b365dc42f775b6b0c73ce790c
2018-09-07 12:56:35 -07:00
5e400e9cae move context_base.h to ATen/core (#11336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11336

Move the `context_base.h` header to `ATen/core`; the implementations are in `caffe2/core/context_base.cc`

Reviewed By: ezyang

Differential Revision: D9670493

fbshipit-source-id: ce5bf2b3b4c80e9b62819f4332ce68af82720055
2018-09-07 12:20:25 -07:00
fb4e8088f3 Remove methods that start with an underscore from at::Tensor (#11152)
Summary:
This PR cleans up the `at::Tensor` class by removing all methods that start with an underscore in favor of functions in the `at::` namespace. This greatly cleans up the `Tensor` class and makes it clearer what is the public and non-public API.

For this I changed `native_functions.yaml` and `Declarations.cwrap` to make all underscore methods `variant: function` (or add such a statement to begin with), and then fixed all code locations using the underscore methods.

ezyang colesbury gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11152

Differential Revision: D9683607

Pulled By: goldsborough

fbshipit-source-id: 97f869f788fa56639c05a439e2a33be49f10f543
2018-09-07 11:55:11 -07:00
e80f7e1f64 Fix more warnings (#11320)
Summary:
Also fixes a missing space in an fft error message
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11320

Differential Revision: D9676012

Pulled By: SsnL

fbshipit-source-id: a636e5fce042198510c8e456fa51fde714da8348
2018-09-07 11:26:58 -07:00
91089a7e17 Add GPU implementation of pdist (#11102)
Summary:
Add the gpu kernel version.

The parallelism I went with performs poorly when there is a large number of vectors but they're all short, as I don't allocate the thread pool to wrap in that case.

Test Plan
---------
```
python -m unittest test_torch.TestTorch.test_pdist_{empty,scipy} test_nn.TestNN.test_pdist{,_zeros,_empty_row,_empty_col,_cpu_gradgrad_unimplemented,_cuda_gradgrad_unimplemented} test_jit.TestJitGenerated.test_nn_pdist
```

Current performance specs are a little underwhelming; I'm in the process of debugging.

size | torch | torch cuda | scipy
-----|-------|------------|------
16 x 16 | 9.13 µs ± 3.55 µs | 9.86 µs ± 81.5 ns | 15.8 µs ± 1.2 µs
16 x 1024 | 15 µs ± 224 ns | 9.48 µs ± 88.7 ns | 88.7 µs ± 8.83 µs
1024 x 16 | 852 µs ± 6.03 µs | 7.84 ms ± 6.22 µs | 4.7 ms ± 166 µs
1024 x 1024 | 34.1 ms ± 803 µs | 11.5 ms ± 6.24 µs | 273 ms ± 6.7 ms
2048 x 2048 | 261 ms ± 3.5 ms | 77.5 ms ± 41.5 µs | 2.5 s ± 97.6 ms
4096 x 4096 | 2.37 s ± 154 ms | 636 ms ± 2.97 µs | 25.9 s ± 394 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11102

Differential Revision: D9697305

Pulled By: erikbrinkman

fbshipit-source-id: 2b4f4b816c02b3715a85d8db3f4e77479d19bb99
2018-09-07 09:09:46 -07:00
110191e5c7 Remove detach from TensorImpl, handle via Type. (#11337)
Summary:
This is so that TensorImpl does not have to depend on Tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11337

Differential Revision: D9684421

Pulled By: gchanan

fbshipit-source-id: d2af93420ca6d493429c251cfe5a34e9289c4484
2018-09-07 08:55:59 -07:00
52b37d8b66 Move VariableHooksInterface to ATen/core (#11273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11273

This one might strike you as a bit surprising, but it's necessary
to expose this interface in ATen/core, because we need to be
able to get a true Variable type from Variable tensors, and
to do that we need to go through the hooks interface.

Reviewed By: gchanan

Differential Revision: D9656548

fbshipit-source-id: 28bb5aee6ac304e8cd5fa1e4c65452c336647161
2018-09-07 08:11:53 -07:00
396e64fff7 Move ATen/Registry.h to ATen/core/Registry.h (#11270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11270

Still need to deduplicate this with caffe2/core/registry.h,
but this will be a bit tricky because the current formulation
of the macro is namespace sensitive (i.e., the macro for classes
defined in at:: namespace won't work if you call from caffe2::
namespace).

Reviewed By: gchanan

Differential Revision: D9654871

fbshipit-source-id: 2207d1f2cc6d50bd41bf64ce0eb0b8523b05d9d9
2018-09-07 08:11:52 -07:00
b02b125d16 Rename getMaybeVariableType back to getType. (#11250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11250

```
codemod -d . --extensions cc,cpp,cu,cuh,h getMaybeVariableType getType
```

Reviewed By: gchanan

Differential Revision: D9648830

fbshipit-source-id: 6b2ac2b1c265ae47722390e6e7f106653077d851
2018-09-07 08:11:50 -07:00
68371b6d2e fast code path when partition=1 which makes LengthsPartition a simple copy (#11351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11351

When partitions == 1 (InputSize() == OutputSize()), LengthsPartition becomes just a copy.

Reviewed By: aazzolini

Differential Revision: D9693409

fbshipit-source-id: a9ea034d227af357b661477ab779a71600f58f58
2018-09-07 08:11:49 -07:00
da4ebc2971 Switch SVD on CPU from gesvd to gesdd (#11194)
Summary:
- Added a note to the doc string for `svd`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11194

Differential Revision: D9683250

Pulled By: soumith

fbshipit-source-id: 2d2c120be346122afa333629c0516a5c9dbb406f
2018-09-07 07:39:57 -07:00
f9595e756e typo/grammar fixes (#11344)
Summary:
Fixes some minor grammar issues in the code base.

PS: I was actually looking for the following one but couldn't find it via grepping in this repo:

![screen shot 2018-09-06 at 3 27 39 pm](https://user-images.githubusercontent.com/5618407/45184280-1e16a980-b1ec-11e8-9cb1-87a96738bdd1.png)

Any idea in which file this issue is raised?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11344

Differential Revision: D9696454

Pulled By: soumith

fbshipit-source-id: 8ffe494b1bf1efb0e35563381d9da2e1e8032a3c
2018-09-06 21:57:14 -07:00
a2afad2b69 Improves ATen CUDAEvent (#11293)
Summary:
After submitting PR #9726, PR #10581 created a different CUDAEvent class. The CUDAEvent proposed in #9726 was similar to the c10d::CUDAEvent class with additional testing and functionality. In particular, it was movable but not copyable. The CUDAEvent created by #10581 is refcounted and copyable. This PR retains the refcounting of the latter PR while fixing several bugs, adding tests, and extending the functionality to support testing and usage like in PR #8354. In particular, this PR:

- Adds set_device() to CUDAContext
- Adds three CUDAEvent tests to stream_test.cpp
- Fixes three bugs:
- Refcounting was broken. Destroying one of the RAIIs holding a particular CUDAEvent would destroy the event UNLESS it was the last RAII (the check was backwards).
- Moving an event would cause a segfault.
- Events were not destroyed on the device they were created on. See PR #9415 (pietern)
- Adds the happened() and recordOnce() functions
- Changes the record() functions to not be const
- Adds additional assertions to verify correctness

This PR does not:

- Make c10d use the ATen CUDAEvent (this is appropriate for a separate PR)

Whether events should be refcounted is an interesting question. It adds some atomic operations and makes event creation eager. Making events movable but not copyable (like the c10d events) avoids these costs and allows events to be lazily constructed. Lazy construction is preferable when working with containers (like std::array or std::vector) and because the event's device can be set automatically to the first stream it's recorded on. With eager construction the user is required to understand that events have a device and acquire the device of the stream the event will be recorded on upfront. This can be seen here:

542aadd9a7/aten/src/ATen/native/cudnn/RNN.cpp (L1130-L1132)

and that file is the only one which currently uses the ATen CUDAEvent.

Refcounting does allow single writer multi-reader scenarios, although these scenarios can be also be supported by providing indirect access to the underlying CUDAEvent. I believe all current and planned usage scenarios do not require refcounting, and if desired I can update this PR to remove refcounting and make the ATen event movable but not copyable like the c10d event. I think not refcounting is preferable because it can improve performance, ease usability, and simplify the code (as seen with two of the above bugs).

I have decided to separate this from PR #8354 since while it's required for PR #8354 the changes are, clearly, of independent interest. PR #8354 has a new dependency on this one, however. I am closing PR #9726 in favor of this PR.

apaszke ezyang pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11293

Differential Revision: D9665836

Pulled By: soumith

fbshipit-source-id: a1513fa4f9761e2f304d126e402f6b6950e1c1d2
2018-09-06 21:39:44 -07:00
b3b1e7624d Optional expand=True kwarg in distribution.enumerate_support (#11231)
Summary:
This adds an optional `expand=True` kwarg to the `distribution.enumerate_support()` method, to get a distribution's support without expanding the values over the distribution's `batch_shape`.
 - The default `expand=True` preserves the current behavior, whereas `expand=False` collapses the batch dimensions.

e.g.
```python
In [47]: d = dist.OneHotCategorical(torch.ones(3, 5) * 0.5)

In [48]: d.batch_shape
Out[48]: torch.Size([3])

In [49]: d.enumerate_support()
Out[49]:
tensor([[[1., 0., 0., 0., 0.],
         [1., 0., 0., 0., 0.],
         [1., 0., 0., 0., 0.]],

        [[0., 1., 0., 0., 0.],
         [0., 1., 0., 0., 0.],
         [0., 1., 0., 0., 0.]],

        [[0., 0., 1., 0., 0.],
         [0., 0., 1., 0., 0.],
         [0., 0., 1., 0., 0.]],

        [[0., 0., 0., 1., 0.],
         [0., 0., 0., 1., 0.],
         [0., 0., 0., 1., 0.]],

        [[0., 0., 0., 0., 1.],
         [0., 0., 0., 0., 1.],
         [0., 0., 0., 0., 1.]]])

In [50]: d.enumerate_support().shape
Out[50]: torch.Size([5, 3, 5])

In [51]: d.enumerate_support(expand=False)
Out[51]:
tensor([[[1., 0., 0., 0., 0.]],

        [[0., 1., 0., 0., 0.]],

        [[0., 0., 1., 0., 0.]],

        [[0., 0., 0., 1., 0.]],

        [[0., 0., 0., 0., 1.]]])

In [52]: d.enumerate_support(expand=False).shape
Out[52]: torch.Size([5, 1, 5])
```

**Motivation:**
 - Currently `enumerate_support` builds up tensors of size `support + batch_shape + event_shape`, but the values are *repeated* over the `batch_shape` (adding little in the way of information). This can lead to expensive matrix operations over large tensors when `batch_shape` is large (see the example above), often leading to OOM issues. We use `expand=False` in Pyro for message passing inference, e.g. when enumerating over the state space in a Hidden Markov Model. This creates sparse tensors that capture the Markov dependence and allows for the possibility of using optimized matrix operations over these sparse tensors. `expand=True`, on the other hand, will create tensors that scale exponentially in size with the length of the Markov chain.
 - We have been using this in our [patch](https://github.com/uber/pyro/blob/dev/pyro/distributions/torch.py) of `torch.distributions` in Pyro. The interface has been stable, and it is already being used in a few Pyro algorithms. We think that this is more broadly applicable and will be of interest to the larger distributions community.

cc. apaszke, fritzo, alicanb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11231

Differential Revision: D9696290

Pulled By: soumith

fbshipit-source-id: c556f8ff374092e8366897ebe3f3b349538d9318
2018-09-06 21:39:42 -07:00
c59c1a25b2 diagnose option: get_entry to print a whole row (#11308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11308

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11299

Reviewed By: xianjiec

Differential Revision: D9652844

fbshipit-source-id: 650d550317bfbed0c1f25ae7d74286cfc7c3ac70
2018-09-06 21:26:30 -07:00
2946b021e3 Disable flaky test, see #11360 (#11361)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11361

Reviewed By: yf225

Differential Revision: D9696524

Pulled By: ezyang

fbshipit-source-id: f6801d6f4f34090d467b16810db9cf576d5d519b
2018-09-06 20:40:00 -07:00
3149a72c63 Move TensorOptions.cpp to the correct place in ATen/core (#11244)
Summary:
This actually ended up being a lot more involved than I thought. The basic
problem is that in some of our build environments, thread local state is not
supported. The correct way to test if this is the case is using the
(undocumented) CAFFE2_FB_LIMITED_MOBILE_CAPABILITY macro.

On mobile, OptionGuard is not available, and you have to do everything
by hand. There's a static_assert to check if you accidentally use
OptionGuard in this case and give you a better error message in this case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11244

Reviewed By: gchanan

Differential Revision: D9646190

fbshipit-source-id: cf4016f79b47705a96ee9b6142eb34c95abb2bd4
2018-09-06 20:11:39 -07:00
c45607f77f Static assert GetMutable is not passed with Tensor argument (#11323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11323

If you do pass it this, you'll get a pointer to
UndefinedTensor; probably not what you want!

Reviewed By: Yangqing

Differential Revision: D9676205

fbshipit-source-id: 0bd3c22c2c40ac2958f95fc7a73b908af291cf22
2018-09-06 20:11:37 -07:00
0f419abf40 Roll nomnigraph build into caffe2 (#11303)
Summary:
We need to remove nomnigraph from the list of public libraries in order to support libtorch extensions. Easiest way to do this is to include it into the Caffe2 source like all other caffe2/core/ code.

However, because the headers are in a different place, we need to include them for linked libraries (pybind, tests, etc).

On the upside, this means that nomnigraph now has hidden visibility by default, too.

FYI peterjc123 xkszltl goldsborough bwasti Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11303

Reviewed By: pjh5

Differential Revision: D9694932

Pulled By: orionr

fbshipit-source-id: 5db3eb20bc5ddc873ce9151236b74663fbb33ed8
2018-09-06 19:38:09 -07:00
9de2085806 Use custom hcc/HIP, purge hcSPARSE (#11198)
Summary:
* purge hcSPARSE now that rocSPARSE is available
* integrate a custom hcc and HIP
* hcc brings two important compiler fixes (fixes hundreds of unit tests)
* HIP brings a smart dispatcher that allows us to avoid a lot of static_casts (we haven't yet removed the automatic static_casts but this catches some occurrences the script did not catch)
* mark 5 unit tests as skipped because they have regressed w/ the new hcc (we don't know yet what is at fault)
* optimize bitonic sort - the comparator is always an empty struct - therefore passing it by value saves at least 3 bytes. It also removes an ambiguity around passing references to `__global__` functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11198

Differential Revision: D9652340

Pulled By: ezyang

fbshipit-source-id: f5af1d891189da820e3d13b7bed91a7a43154690
2018-09-06 19:38:07 -07:00
ec5404a449 Add cuda version of SpatialBNOp also optimize SpatialBN on CPU (#10888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10888

Add cuda version of SpatialBNOp also optimize SpatialBN on CPU

Reviewed By: houseroad

Differential Revision: D9512435

fbshipit-source-id: 6f828c88d56d30dc9a2f98a297a161c35cc511b1
2018-09-06 18:26:13 -07:00
7726b36489 Full-fledged group testings and fixes for c10d frontend APIs (#11318)
Summary:
Fixed a few bugs that were not tested in the c10d frontend APIs, including
get_rank, get_world_size, and destroy_process_group of a given group.

These APIs are added to the CI tests.

Also added all the group related tests, including full-group, and partial groups (existing ones), since both will hit different code paths.

Also removed experimental APIs for c10d initially used in DDP, now we don't use it anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11318

Reviewed By: pietern

Differential Revision: D9675896

Pulled By: teng-li

fbshipit-source-id: a2eac2c57933effa2d139855f786e64919a95bfc
2018-09-06 18:26:11 -07:00
1a01c75dde support gradClipping per blob in mtml (#10776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10776

as title

Reviewed By: chocjy

Differential Revision: D9458099

fbshipit-source-id: f840d4f1542e8180f41cc0732c8468fa43805ab8
2018-09-06 18:10:52 -07:00
c39216f8c4 Automatic update of fbcode/onnx to bff0b8835870c7df7762ef43498d000d2d8ffb52 (#11346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11346

Previous import was 1b09eb14c2c781fae078fa6b1c0390ba6fc0898c

Included changes:
- **[bff0b88](https://github.com/onnx/onnx/commit/bff0b88)**: Add DynamicSlice experimental op (#1377) <James Reed>
- **[91a7b8e](https://github.com/onnx/onnx/commit/91a7b8e)**: statCoverage(model) (#1246) <Akshay Chalana>
- **[36643c6](https://github.com/onnx/onnx/commit/36643c6)**: fix the doc for softmax (#1374) <Lu Fang>
- **[8c64acd](https://github.com/onnx/onnx/commit/8c64acd)**: Silence usused result warning in ONNXIFI wrapper cleanup. Fix #1344 (#1371) <Marat Dukhan>
- **[53b20f6](https://github.com/onnx/onnx/commit/53b20f6)**: Add the ability to deprecate an OpSchema (#1317) <Ryan Hill>
- **[8aec4e2](https://github.com/onnx/onnx/commit/8aec4e2)**: [Anderspapitto patch] fix the shape inference for broadcasting (#1368) <Lu Fang>

Reviewed By: jamesr66a

Differential Revision: D9691533

fbshipit-source-id: 6aff6ce04ade37182e2ffe9bc83eb86846bc722d
2018-09-06 17:39:57 -07:00
4d678790c5 enable advanced indexing with tensors (#10862)
Summary:
On the way to #10774

This PR adds advanced indexing with tensors.
The approach is to desugar advanced indexing into an at::index op.
This is exactly how normal pytorch does it.
[(I used this code as reference)](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_variable_indexing.cpp)

Supporting sequences is a little tricky because JIT script doesn't have
an easy way to turn arbitrary n-dimensional python lists into a tensor
(it would be easy if we supported `torch.tensor`), so that'll come
in a future PR.
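
A minimal sketch of what this enables in script mode (the function and names here are illustrative, not taken from the PR):

```python
import torch

@torch.jit.script
def pick_rows(x, idx):
    # tensor advanced indexing now compiles in script;
    # it desugars to a single at::index op under the hood
    return x[idx]

x = torch.arange(12).reshape(3, 4)
print(pick_rows(x, torch.tensor([0, 2])))  # rows 0 and 2
```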

cc jamesr66a zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10862

Differential Revision: D9659449

Pulled By: zou3519

fbshipit-source-id: 56d293720d44c0fd27909e18327ab3985ddfced6
2018-09-06 16:41:45 -07:00
148f7cc47a nomnigraph - nit - fix generated code to be consistent with style (#11343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11343

make the generated classes (OpClasses.h...) consistent with fb cpp code style

Reviewed By: yinghai

Differential Revision: D9689487

fbshipit-source-id: 450e742d2462115d1bf41b9ea88d20df0a842b2b
2018-09-06 16:27:17 -07:00
49231ab0a8 Reimplement storage slicing. (#11314)
Summary:
In #9466 I got rid of storage views and eliminated all places where
they were used... OR SO I THOUGHT.  In actuality, under certain
conditions (specifically, if you trained a CUDA multiprocessing model
shared over CUDA IPC and then serialized your parameters), you could
also serialize storage slices to the saved model format.  In #9466,
I "fixed" the case when you loaded the legacy model format (really,
just unshared the storages--not strictly kosher but if you aren't
updating the parameters, shouldn't matter), but NOT the modern model format, so
such models would fail.

So, I could have applied the legacy model format fix too, but
hyperfraise remarked that he had applied a fix that was effectively
the same as unsharing the storages, but it had caused his model to
behave differently.  So I looked into it again, and realized that
using a custom deleter, I could simulate the same behavior as old
storage slices.  So back they come.

In principle, I could also reimplement storage views entirely using
our allocators, but I'm not going to do that unless someone really
really wants it.

Fixes #10120.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11314

Reviewed By: ailzhang

Differential Revision: D9671966

Pulled By: ezyang

fbshipit-source-id: fd863783d03b6a6421d6b9ae21ce2f0e44a0dcce
2018-09-06 16:11:59 -07:00
1d406c04ae fix comment on Cost params_bytes (#11190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11190

As discussed with Alexander Sidorov, params_bytes refer to the number of bytes we're reading for parameters, not the size of parameters. They only differ in sparse operators.

Reviewed By: mdschatz

Differential Revision: D9628635

fbshipit-source-id: 9e2aed0cf59388928dc69b8534cf254f0347c9c8
2018-09-06 15:12:22 -07:00
68613cf5a2 Windows DLL build with Caffe2 code (#11266)
Summary:
This is an experimental build on top of what orionr and mingzhe09088 built.

Essentially, the idea is that we will need separate *_API versions for different shared libraries. If this theory is right, I'll try to clean up the design a bit and document it properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11266

Reviewed By: orionr

Differential Revision: D9682942

Pulled By: Yangqing

fbshipit-source-id: c79653199e67a1500c9174f39f8b0357324763f3
2018-09-06 15:12:20 -07:00
34c0043aae Force third_party Eigen from setup.py (#11334)
Summary:
We shouldn't use system Eigen in any case when building with setup.py. If people want to use system Eigen (not from third_party) they can build with CMake for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11334

Reviewed By: pjh5

Differential Revision: D9689450

Pulled By: orionr

fbshipit-source-id: baf616b9f195692942151ad201611dcfe7d927ba
2018-09-06 14:56:53 -07:00
03ca7358af Add unit test for Parallel Spatial Batch Normalization (#11098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11098

Added a test for testing CPU version across multiple devices.

Reviewed By: enosair, BIT-silence

Differential Revision: D9584520

fbshipit-source-id: 0d8c85e6d402bc7b34d5f8f16ef655ff9b61b49e
2018-09-06 14:26:56 -07:00
5712fe3297 Fix out-of-boundary conversion issue (#11338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11338

The `min_` and `max_` values of the filler are stored as `double`s, but when filling a tensor of a specific type they can exceed that type's limits, resulting in a crash. This diff checks the type limits first and clips `min_`/`max_` when they fall outside them.
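
In spirit, the fix behaves like this small sketch (a hypothetical Python helper, not the actual C++ code in filler.h):

```python
import numpy as np

def clip_to_dtype_limits(value, dtype):
    # clamp a double-valued bound into the representable range of the
    # target dtype before using it to fill a tensor of that type
    info = np.iinfo(dtype) if np.issubdtype(dtype, np.integer) else np.finfo(dtype)
    return float(min(max(value, info.min), info.max))

print(clip_to_dtype_limits(1e20, np.int32))  # 2147483647.0
```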

Reviewed By: highker

Differential Revision: D9684455

fbshipit-source-id: 6da98a03c57f3296abaddc7c5cfc1c836c611eb0
2018-09-06 13:39:52 -07:00
ec195129ec Adding setTimeout option in Store (#11265)
Summary:
This will allow users to set customized timeout option for the store.

Tested by my own debug print to make sure that C++ actually used the timeout
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11265

Differential Revision: D9666164

Pulled By: teng-li

fbshipit-source-id: 4eb6441783da106a3fd59b95457e503e83e4640f
2018-09-06 12:55:50 -07:00
fef52cc1f8 Add resolver for 'torch' module (#10847)
Summary:
This lets you compile builtin functions from C++ without a dependency on Python

```cpp
auto module = torch::jit::compile(R"JIT(
def my_script_method(x, y):
    return torch.relu(x) + y
)JIT");
IValue result = module->run_method("my_script_method", 1, 2);
```

goldsborough zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10847

Differential Revision: D9543461

Pulled By: driazati

fbshipit-source-id: 6160dae094030ca144a0df93cb9f26aa78c8cf27
2018-09-06 12:42:21 -07:00
0f1ec07c57 nomnigraph - nit - rename unit test files (#11315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11315

Rename unit test files to be consistent with the fb cpp style guideline: "The unittest for MyFoo.cpp should be named MyFooTest.cpp."

Reviewed By: yinghai

Differential Revision: D9671519

fbshipit-source-id: 44ed6794f6e479d190916db8064eee692e3ad876
2018-09-06 12:28:18 -07:00
ed8849b640 Add include path to Doxygen preprocessing and add some documentation (#11313)
Summary:
1. Add documentation to Linear and improve documentation for RNNs
2. Fix preprocessing in C++ docs by adding correct include path
3. Make myself and ebetica codeowner of docs/cpp to improve development speed

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11313

Differential Revision: D9683615

Pulled By: goldsborough

fbshipit-source-id: 84ea32f9ea6b4060744aabbf5db368776a30f0b5
2018-09-06 12:28:17 -07:00
f98bd53b01 Small fix to the UniformIntFill tensor shape and type inference.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11028

Reviewed By: salexspb

Differential Revision: D7715107

Pulled By: costin-eseanu

fbshipit-source-id: a4f73d53c0192b9826451b4bba4ab0992abbb1a2
2018-09-06 12:11:32 -07:00
1ad61a18b2 Rename cuda tests to have 'cuda' in their names (#11332)
Summary:
Not a lot changed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11332

Differential Revision: D9683680

Pulled By: zou3519

fbshipit-source-id: 95f444e54049dd268fc10effe425ef2df79c6467
2018-09-06 11:57:52 -07:00
0ef2b318a2 fix empty net type (#11286)
Summary:
Turns out that a net.type of '' is not acceptable to CreateNet.

But an unset (empty) net.type is acceptable.

Fix that in this diff. Also this is related to T33613083
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11286

Reviewed By: Maratyszcza, wat3rBro

Differential Revision: D9659920

Pulled By: harouwu

fbshipit-source-id: d68f24b754e18e1121f029656d885c48ab101946
2018-09-06 11:10:01 -07:00
936bba77d1 cudnn 7 upgrade with spatialBN fix (#11291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11291

In S163230, we've found that the CuDNN 7 upgrade causes accuracy drops when training convolutional networks such as ResNeXt-101 (~0% accuracy) and video R(2+1)D (65 --> 63%).

Our current theory for this accuracy loss is the new "CUDNN_BATCHNORM_SPATIAL_PERSISTENT" mode in the spatialBN operator. In Caffe2, we've made this mode the default. According to the CuDNN manual (https://fburl.com/z996mr13), this mode may introduce limitations on the input data range and cause overflow (which outputs NaN). NaN is probably not the case, because we're seeing a few percent accuracy drop but not gradient explosion or failure. However, this "performance-optimized" code path may introduce accuracy loss (which is not caught by our unit test case because the input data range is [-0.5, 0.5]).

Reviewed By: kuttas, stephenyan1231

Differential Revision: D9601217

fbshipit-source-id: 73c2690c19cb1f02ea4e5e2200f50128df4f377b
2018-09-06 10:11:59 -07:00
4ae95738b2 Ignore FuseGraph Call on Windows (#11015)
Summary:
Fusion is not yet implemented on Windows, so ignore the FuseGraph call instead of failing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11015

Differential Revision: D9619121

Pulled By: eellison

fbshipit-source-id: ad09aeaa41b7fdeb9ca7bf5e1c166923ca405b15
2018-09-06 09:54:51 -07:00
a853a74217 defer resolution of mkl to a cmake wrapper library (#11298)
Summary:
this is a fix that's needed for building extensions with a
pre-packaged pytorch. Consider the scenario where

(1) pytorch is compiled and packaged on machine A
(2) the package is downloaded and installed on machine B
(3) an extension is compiled on machine B, using the downloaded package

Before this patch, stage (1) would embed absolute paths to the system
installation of mkl into the generated Caffe2Config.cmake, leading to
failures in stage (3) if mkl was not at the same location on B as on
A. After this patch, only a reference to the wrapper library is
embedded, which is re-resolved on machine B.

We are already using a similar approach for cuda.

Testing: built a package on jenkins, downloaded locally and compiled an extension. Works with this patch, fails without.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11298

Differential Revision: D9683150

Pulled By: anderspapitto

fbshipit-source-id: 06a80c3cd2966860ce04f76143b358de15f94aa4
2018-09-06 09:10:39 -07:00
dda8402447 Cleanup dependency of distributed flags (#11221)
Summary:
Now that we're building everything together, making all distributed flags conditional on USE_DISTRIBUTED being set.

cc pietern cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11221

Reviewed By: Yangqing

Differential Revision: D9664267

Pulled By: orionr

fbshipit-source-id: a296cda5746ad150028c97160f8beacba955ff73
2018-09-06 08:56:00 -07:00
68930c48cf Move minimal wrapdim functionality to core, remove THTensor include i… (#11283)
Summary:
…n TensorImpl.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11283

Reviewed By: ezyang

Differential Revision: D9660015

Pulled By: gchanan

fbshipit-source-id: 263cba226d9ee981d55281c94e6fda5842a46b02
2018-09-06 08:10:33 -07:00
f6568b00f5 Change includes from ATen/Storage.h to ATen/core/Storage.h (#11217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11217

```
codemod -d . --extensions cc,cpp,cu,cuh,h 'ATen/Storage.h' 'ATen/core/Storage.h'
```

Reviewed By: gchanan

Differential Revision: D9634904

fbshipit-source-id: 35a177733f3816e32d8748513c9caa4cf13a6896
2018-09-06 08:10:30 -07:00
656e81db93 Fix scalar tensor assert in fusion compiler (#10952)
Summary:
Fixes #8560.
Unblocks #10715.

The assert (nDim <= uncompressedDims) was being triggered for a scalar
tensor because we compute nDim to be 1 for a scalar tensor but
uncompressedDim = 0.

This PR changes it so that we compute nDim to be 0 for a scalar tensor. This
works because indexing in a kernel depends on nDim. If nDim = 0, then
offset is always 0, which is what we want.

Some other (small) changes were necessary to make this work:
- One cannot define a 0-length array `IndexType arr[0]` so the code
  guards against that
- Needed to change some of the maxTensorInfoSize logic to handle the
  case when uncompressedDim == 0.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10952

Differential Revision: D9544607

Pulled By: zou3519

fbshipit-source-id: 2b873f47e2377125e1f94eb1b310a95cda51476c
2018-09-06 07:54:57 -07:00
bb7d1837bc Add dead code elimination pass (#10101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10101

Simple DCE enabled by knowledge of the actual outputs (stacked beneath this diff)

Reviewed By: yinghai

Differential Revision: D9107853

fbshipit-source-id: 0c38fe5fe408be2b7fc9e1fe6a5b7160c06ce79b
2018-09-05 23:55:17 -07:00
220c9e52b9 Distributed Data Parallel CPU module for C10D (#11168)
Summary:
Distributed Data Parallel CPU module for c10d. This is basically the same code as Distributed Data Parallel CPU module for THD, since c10d now has the exact same front-end interface as torch.distributed.

We will keep both in the first release and remove the THD one once c10d is stable enough.

Test fully covered just as THD too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11168

Differential Revision: D9674963

Pulled By: teng-li

fbshipit-source-id: ecf52a7189374ca7930c2be305218167fdd822a7
2018-09-05 21:59:31 -07:00
126ac4b71f Back out "[pt1][tensor] Add strides to caffe2::Tensor"
Summary: Original commit changeset: 3643871b70f1

Differential Revision: D9665958

fbshipit-source-id: 46e22adbf39af92fb23abb66212991bd53a86317
2018-09-05 20:39:07 -07:00
fb836db4b2 Fix conv gradient conversion (#11312)
Summary:
Fix Windows build failure after https://github.com/pytorch/pytorch/pull/10744 landed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11312

Reviewed By: mingzhe09088

Differential Revision: D9669907

Pulled By: orionr

fbshipit-source-id: d717ec4f8fdf17acf334528d7838b88c5c50e9c3
2018-09-05 20:09:31 -07:00
dccd0f2de6 Bag of clang tidy fixes for torch/csrc/ and torch/csrc/autograd (#11050)
Summary:
Linting `torch/csrc/` (non-recursive) and `torch/csrc/autograd` (non-recursive).

Fixed things like:
- `typedef` vs `using`
- Use `.empty()` instead of comparing with empty string/using `.size() == 0`
- Use range for loops instead of old style loops (`modernize-`)
- Remove some `virtual` + `override`
- Replace `stdint.h` with `cstdint`
- Replace `return Type(x, y)` with `return {x, y}`
- Use boolean values (`true`/`false`)  instead of numbers (1/0)
- More ...

ezyang apaszke cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11050

Differential Revision: D9597505

Pulled By: goldsborough

fbshipit-source-id: cb0fb4793ade885a8dbf4b10484487b84c64c7f2
2018-09-05 19:55:50 -07:00
83a1ab2136 Sparse tensor printing; add NotImplemented autograd fn (#10181)
Summary:
Commits:

1. Add autograd function `NotImplemented` (subclass of `Error`) so python `grad_fn` prints nicer. Since `Error` is used in `DelayedError` to implement `once_differentiable`, I can't just change its name. cc colesbury

2. Add printing for sparse tensors. Fixes https://github.com/pytorch/pytorch/issues/9412 . cc weiyangfb.

3. Add tests for sparse printing

Examples:
```diff
  In [2]: x = torch.sparse.FloatTensor(torch.arange(4).view(2,2), torch.randn(2, 2), [10, 10, 2])

  In [3]: x
  Out[3]:
- torch.sparse.FloatTensor of size (10,10,2) with indices:
- tensor([[0, 1],
-         [2, 3]])
- and values:
- tensor([[-1.1832, -0.5927],
-         [ 0.0831,  0.2511]])
+ tensor(indices=tensor([[0, 1],
+                        [2, 3]]),
+        values=tensor([[ 1.5081,  0.3451],
+                       [-0.0392,  0.4776]]),
+        size=(10, 10, 2), nnz=2, layout=torch.sparse_coo)

  In [4]: x.requires_grad_()
  Out[4]:
- torch.sparse.FloatTensor of size (10,10,2) with indices:
- tensor([[0, 1],
-         [2, 3]], grad_fn=<Error>)
- and values:
- tensor([[-1.1832, -0.5927],
-         [ 0.0831,  0.2511]], grad_fn=<Error>)
+ tensor(indices=tensor([[0, 1],
+                        [2, 3]]),
+        values=tensor([[ 1.5081,  0.3451],
+                       [-0.0392,  0.4776]]),
+        size=(10, 10, 2), nnz=2, layout=torch.sparse_coo, requires_grad=True)

  In [5]: x + x
  Out[5]:
- torch.sparse.FloatTensor of size (10,10,2) with indices:
- tensor([[0, 1],
-         [2, 3]], grad_fn=<Error>)
- and values:
- tensor([[-2.3664, -1.1855],
-         [ 0.1662,  0.5021]], grad_fn=<Error>)
+ tensor(indices=tensor([[0, 1],
+                        [2, 3]]),
+        values=tensor([[ 3.0162,  0.6902],
+                       [-0.0785,  0.9553]]),
+        size=(10, 10, 2), nnz=2, layout=torch.sparse_coo, grad_fn=<AddBackward0>)

  In [6]: x.double()
  Out[6]:
- torch.sparse.DoubleTensor of size (10,10,2) with indices:
- tensor([[0, 1],
-         [2, 3]], grad_fn=<Error>)
- and values:
- tensor([[-1.1832, -0.5927],
-         [ 0.0831,  0.2511]], dtype=torch.float64, grad_fn=<Error>)
+ tensor(indices=tensor([[0, 1],
+                        [2, 3]]),
+        values=tensor([[ 1.5081,  0.3451],
+                       [-0.0392,  0.4776]]),
+        size=(10, 10, 2), nnz=2, dtype=torch.float64, layout=torch.sparse_coo,
+        grad_fn=<NotImplemented>)

  In [7]: x = torch.sparse.FloatTensor(torch.ones(0, 2, dtype=torch.long), torch.randn(2, 0), [0])

  In [8]: x
  Out[8]:
- torch.sparse.FloatTensor of size (0,) with indices:
- tensor([], size=(0, 2), dtype=torch.int64)
- and values:
- tensor([], size=(2, 0))
+ tensor(indices=tensor([], size=(0, 2)),
+        values=tensor([], size=(2, 0)),
+        size=(0,), nnz=2, layout=torch.sparse_coo)

  In [9]: x = torch.sparse.FloatTensor(torch.ones(0, 2, dtype=torch.long), torch.randn(2), [])

  In [10]: x
  Out[10]:
- torch.sparse.FloatTensor of size () with indices:
- tensor([], size=(0, 2), dtype=torch.int64)
- and values:
- tensor([-0.0064,  0.8518])
+ tensor(indices=tensor([], size=(0, 2)),
+        values=tensor([ 0.9800, -0.5978]),
+        size=(), nnz=2, layout=torch.sparse_coo)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10181

Differential Revision: D9139845

Pulled By: SsnL

fbshipit-source-id: 353eebd55fac4049ed9bf85f8b0ee2c1418a744e
2018-09-05 19:41:22 -07:00
fa147abda4 Add convertToCaffe2Proto to python API
Summary: Closing the gap a bit on API, allowing users to go NetDef -> nomnigraph -> NetDef in python now

Reviewed By: duc0

Differential Revision: D9670495

fbshipit-source-id: 6497518ffc05a186deb0d657e06317980d39ddd5
2018-09-05 18:40:48 -07:00
425ea6b31e fix doc for functional.dropout* (#10417)
Summary:
- fixes #4177
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10417

Differential Revision: D9542876

Pulled By: weiyangfb

fbshipit-source-id: 480ed973d1fe0364f4acb5cd596c2031895b82df
2018-09-05 17:26:00 -07:00
ad116210e5 typo fix Tranpose2D -> Transpose2D (#11281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11281

A simple typo fix

Reviewed By: BIT-silence

Differential Revision: D9658324

fbshipit-source-id: b6513c8d12d8fe75a9b18df1b443e9e66e692744
2018-09-05 17:25:58 -07:00
a9d8b021e9 Remove THFinalizer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11287

Reviewed By: ezyang

Differential Revision: D9662341

Pulled By: cpuhrsch

fbshipit-source-id: 306bea00694db1ae207167ee4bf10de01426911c
2018-09-05 16:56:27 -07:00
c0efe6f027 Forward declarations of needed curand functions (#10911)
Summary:
Needed for FULL_CAFFE2=1 with statically linked CUDA libraries. Waiting on advice from Nvidia
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10911

Reviewed By: pjh5

Differential Revision: D9636256

Pulled By: orionr

fbshipit-source-id: fcad7945910b6c8fb5f52e81cc87dad5fcfb3c65
2018-09-05 16:56:26 -07:00
57728f71e7 nomnigraph - simplify core graph API and test (#11256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11256

- in deleteNode method, remove optional deleteEdge flag as it's not used
- in deleteEdge method, remove optional removeRef flag as it's not used
- in replaceNode method, remove optional newHead_ parameter as it's not used - also simplifying the implementation by just calling replaceInEdges and replaceOutEdges
- remove importNode & importEdge as they're not in use
- add getEdgeIfExists that is like getEdge() but returns nullptr instead of throwing when the edge does not exist
- reduce verbosity in the basic graph unit test and add more test cases for ReplaceEdges

Differential Revision: D9650913

fbshipit-source-id: 6c18b37bef0d2abe1b57fb4fc47bfdbcee387694
2018-09-05 16:40:49 -07:00
c43187291c Small fixes to cppdocs for sync script (#11300)
Summary:
I'm setting up an automatic sync job for cppdocs and need two fixes to the cpp docs config:

1. Right now the cppdocs use the `torch` package to figure out the version. For C++ docs all I really need from the built package are the generated Tensor.h and Functions.h files. I can actually generate those directly via `aten/src/ATen/gen.py`, so I can skip building PyTorch altogether and save 10 minutes in the sync job! For this I need to avoid using the torch package in the docs.
2. Internal proxy issues prevent using the git link for sphinx_rtd_theme. We can just use the pip package for the cppdocs (not for the normal PyTorch docs)

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11300

Differential Revision: D9667193

Pulled By: goldsborough

fbshipit-source-id: 5567e0b3d3bdce03f5856babdb4ff76bcee91846
2018-09-05 16:40:47 -07:00
c9e66351a7 Port all PyTorch and Caffe2 jobs to CircleCI (#11264)
Summary:
This PR adds all PyTorch and Caffe2 job configs to CircleCI.

Steps for the CircleCI mini-trial:
- [ ] Make sure this PR passes Jenkins CI and fbcode internal tests
- [x] Approve this PR
- [ ] Ask CircleCI to turn up the number of build machines
- [ ] Land this PR so that the new `.circleci/config.yml` will take effect

Several Caffe2 tests are flaky on CircleCI machines and hence skipped when running on CircleCI. A proper fix for them will be worked on after a successful mini-trial.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11264

Differential Revision: D9656793

Pulled By: yf225

fbshipit-source-id: 7832e90018f3dff7651489c04a179d6742168fe1
2018-09-05 16:28:11 -07:00
9f4bcdf075 caffe2::DeviceType -> at::DeviceType (#11254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11254
Previously we used DeviceType from caffe2.proto directly, but it's an `enum` with an implicit conversion to int, which provides no type safety; e.g. we have to explicitly check that a device type is valid in event.h:
```
template <int d>
struct EventCreateFunctionRegisterer {
  explicit EventCreateFunctionRegisterer(EventCreateFunction f) {
    static_assert(d < MaxDeviceTypes, "");
    Event::event_creator_[d] = f;
  }
};
```
at::DeviceType is an `enum class`, and it does not have implicit conversion to int, and provides better type safety guarantees. In this diff we have done the following refactor(taking CPU as an example):

    1. caffe2::DeviceType → caffe2::DeviceTypeProto
    2. caffe2::CPU → caffe2::PROTO_CPU
    3. caffe2::DeviceType = at::DeviceType
    4. caffe2::CPU = at::DeviceType::CPU

codemod -d caffe2/caffe2 --extensions h,cc,cpp 'device_type\(\), ' 'device_type(), PROTO_'
+ some manual changes

In short, after this diff, in c++, caffe2::CPU refers to the at::DeviceType::CPU and the old proto caffe2::CPU will be caffe2::PROTO_CPU.
On the Python side, we have a temporary workaround that aliases `caffe2_pb2.CPU = caffe2_pb2.PROTO_CPU` to make the change easier to review; this will be removed later.

Reviewed By: ezyang

Differential Revision: D9545704

fbshipit-source-id: 461a28a4ca74e616d3ee183a607078a717fd38a7
2018-09-05 16:28:09 -07:00
ac9f0a6884 refactor preproc, support dense in TumHistory layer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11131

Reviewed By: xianjiec

Differential Revision: D9358415

fbshipit-source-id: 38bf0e597e22d540d9e985ac8da730f80971d745
2018-09-05 16:10:13 -07:00
3e85685f8f add persistent rnns with conservative criteria (#11248)
Summary:
Persistent RNNs provide much better performance on V100 with half-precision (fp16) input data for a variety of cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11248

Differential Revision: D9665687

Pulled By: ezyang

fbshipit-source-id: 2bd09a7eb1f5190aadb580977b0ba956e21a7dd5
2018-09-05 16:10:11 -07:00
68c2e014cb Handling for py2/py3 division differences (#11016)
Summary:
- In Python 2, use of `/` (regardless of int/float/Tensor) causes a compiler error if
  `from __future__ import division` is not imported in the file.
- The / operator is universally set to do "true" division for integers
- Added a `prim::FloorDiv` operator because it is used in loop unrolling.

The error for users who use '/' in Python 2 without importing from __future__ occurs when building the JIT AST.
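
A small sketch of the resulting semantics (the function is illustrative, not from the PR):

```python
from __future__ import division  # Python 2 scripts must opt in before using '/'

import torch

@torch.jit.script
def divide(a, b):
    true_q = a / b    # always true division, even for integer inputs
    floor_q = a // b  # floor division, compiled to prim::FloorDiv
    return true_q, floor_q
```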

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11016

Differential Revision: D9613527

Pulled By: zou3519

fbshipit-source-id: 0cebf44d5b8c92e203167733692ad33c4ec9dac6
2018-09-05 14:57:38 -07:00
9a0effb92c Update send/recv tests to reflect intended use (#11275)
Summary:
The existing tests had every rank run send to every other rank and only
then switch to recv mode. This only works if the send operations are
non-blocking and the passed tensors are immediately copied to some kind
of send buffer. Instead, every send must be matched with a recv on the
other side, because from the API perspective they may block.

E.g. imagine a 1GB tensor being sent to every other rank. It can only go
through if there is a recv on the other side, or it will deadlock.

This change reflects this in the send/recv unit tests.
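
A minimal sketch of the matched pattern, assuming an even world size and an already-initialized process group (names are illustrative):

```python
import torch
import torch.distributed as dist

def exchange(rank):
    # pair ranks up and order the calls so every blocking send has a
    # matching recv posted on the peer, avoiding deadlock
    t = torch.ones(1)
    peer = rank ^ 1
    if rank % 2 == 0:
        dist.send(t, dst=peer)
        dist.recv(t, src=peer)
    else:
        dist.recv(t, src=peer)
        dist.send(t, dst=peer)
```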
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11275

Differential Revision: D9658197

Pulled By: pietern

fbshipit-source-id: fb6a3fc03b42343a9dfeed0def30d94914e76974
2018-09-05 14:40:04 -07:00
8da081f7a5 Add cost inference to ConvGradient and WeightedSum operators (#10744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10744

As title

Reviewed By: jspark1105

Differential Revision: D9436387

fbshipit-source-id: 578b7a6d98843d57e3f8f4c564727e9cadbedd78
2018-09-05 13:56:05 -07:00
4fe3356ee0 Move collapse dims into a single place (#11272)
Summary:
Deduplicates implementations and reduces sources of failure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11272

Differential Revision: D9659167

Pulled By: cpuhrsch

fbshipit-source-id: 759bfba4fd90795038afe684d9829f5f41f98109
2018-09-05 12:57:00 -07:00
5e2067ce30 Fix some more warnings (#11257)
Summary:
Found these when compiling the new master with gcc 7.3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11257

Differential Revision: D9656612

Pulled By: SsnL

fbshipit-source-id: 7acb19e13204c010238dab7bc6973cc97b96f9a4
2018-09-05 11:10:27 -07:00
f866574afc Fix the batchnorm onnx exporting when affine=False
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11249

Reviewed By: Ac2zoom

Differential Revision: D9652526

Pulled By: houseroad

fbshipit-source-id: 12a9038beddd227a2f9e2178edf4e8d623488c3e
2018-09-05 11:10:25 -07:00
55212507a2 Improve error message to include return types too (#11245)
Summary:
Fixes #11057.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11245

Differential Revision: D9652698

Pulled By: apaszke

fbshipit-source-id: 4c5006e32e599c35367aa5acfae45de3ab8ac176
2018-09-05 10:56:51 -07:00
e6d6aed12e Check doxygen output in travis (#11124)
Summary:
This PR adds a .travis.yml check for our C++ documentation. The goal is to avoid any documentation/comments in our C++ code that would break the doxygen output and possibly ruin the C++ documentation site (currently https://pytorch.org/cppdocs).

For this, we:
1. Run doxygen and record any warnings,
2. Filter out some known bogus warnings,
3. Count the remaining warnings,
4. Fail the check if (3) is non-zero.

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11124

Differential Revision: D9651011

Pulled By: goldsborough

fbshipit-source-id: 30f776d23bb6d6c482c54db32828b4b99547e87b
2018-09-05 10:25:56 -07:00
267e1ec112 Accept more numpy scalars as doubles (#9659)
Summary:
Allows multiplication of e.g. numpy.float32 with tensors.
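
For instance (a quick illustration, not taken from the PR's tests):

```python
import numpy as np
import torch

t = torch.ones(3)
# numpy scalars are now accepted where a Python number is expected
print(t * np.float32(2.0))  # tensor([2., 2., 2.])
print(t + np.int64(1))      # tensor([2., 2., 2.])
```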

This came up with #9468

If you want this, I'll add tests once the other patch is done (the changes would conflict, so I prefer to wait).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9659

Differential Revision: D8948078

Pulled By: weiyangfb

fbshipit-source-id: c7dcc57b63e2f100df837f70e1299395692f1a1b
2018-09-05 10:25:55 -07:00
8bd80a6b74 Fixed log message (#10874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10874

Fixes the log message "WARNING:data_workers:Warning, data loading lagging behind: name=0", where the size of a queue was reported instead of the source name

Reviewed By: panshen1, Novitial

Differential Revision: D9506606

fbshipit-source-id: 03717cfa9b991afb335ef877378afa3b52fd8f22
2018-09-05 09:55:52 -07:00
434e943b08 Fix to distribution.__repr__ with lazy attributes (#11263)
Summary:
`__repr__` currently fails for distributions with lazy attributes in PyTorch master, throwing a `KeyError`. This fixes the issue.
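
For example (illustrative; the exact repr format may differ):

```python
import torch
import torch.distributions as dist

# probs is a lazily-computed attribute when the logit parametrization
# is used; printing such a distribution no longer raises KeyError
d = dist.Categorical(logits=torch.zeros(3))
print(d)  # e.g. Categorical(logits: tensor([0., 0., 0.]))
```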

**Additionally:**
 - Added `logits` to `arg_constraints` for distributions that accept either `probs` or `logits`. This is both to have `__repr__` display the `logits` param when available, and to be able to do validation checks (e.g. NaN checks) when the logit parametrization is used. fritzo, alicanb - I think there were reasons why we had not done so in the first place, but I am unable to recall now. It passes all the tests, but let me know if there is something that I am missing at the moment.
 - There are certain distributions, e.g. `OneHotCategorical`, which won't show any parameters because they use a `categorical` instance under the hood, and neither `logits` nor `probs` from `arg_constraints` is present in the instance's `__dict__`. This isn't addressed in this PR.

cc. vishwakftw, fritzo, nadavbh12, apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11263

Differential Revision: D9654959

Pulled By: apaszke

fbshipit-source-id: 16f5b20243fe8e2c13e9c528050d4df0b8ea6e45
2018-09-05 09:55:51 -07:00
9fc22cb772 Add import export step to end to end tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10717

Differential Revision: D9562888

Pulled By: li-roy

fbshipit-source-id: 8f5d62fd0a44aca0a41dc10438e7bb91cc2a972a
2018-09-05 09:39:47 -07:00
1808e368e4 Add complex hooks for out of tree complex implementation. (#11216)
Summary:
This PR adds a hooks interface for registering types for complex
scalar types, and a sample implementation of the hook in
test_cpp_extensions.

The hook registration is patterned off of the existing CUDA hooks.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11216

Differential Revision: D9654840

Pulled By: ezyang

fbshipit-source-id: 7b97646280d584f8ed6e14ee10a4abcd04cf2987
2018-09-05 09:25:50 -07:00
aeb6094538 Unify opt flag for cmake codegen (#11227)
Summary:
Also enables debug for non-MSVC for kernel codegen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11227

Differential Revision: D9656506

Pulled By: cpuhrsch

fbshipit-source-id: 667195cb55de1a1a9042b6b1c4436e9c6c743333
2018-09-05 08:55:49 -07:00
d612855b91 nomnigraph - fix memory error in NN subgraph matchOp (#11127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11127

It's invalid to capture `predicate` by reference as it's a local variable. Capture it by value instead.

Differential Revision: D9600115

fbshipit-source-id: 92e0130d0a74908380b75ade5c3492df49e25941
2018-09-05 07:57:40 -07:00
6d6655e6be Port PackedSequences functions to C++ (#11224)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11224

Differential Revision: D9652703

Pulled By: apaszke

fbshipit-source-id: 558e39457e590cad07516e5bb2ecb12789564950
2018-09-05 06:35:15 -07:00
b7038f7c37 Treat numerical differences as warnings instead of errors when tracing (#11246)
Summary:
Also, make `torch.isclose` work with integral tensors and refactor `_check_trace` a bit.
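
A quick illustration of the integral-tensor behavior (not from the PR's tests):

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([1, 2, 4])
# elementwise |a - b| <= atol + rtol * |b|, now defined for
# integral tensors as well as floating-point ones
print(torch.isclose(a, b))  # [1, 1, 0] mask (bool in later versions)
```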

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11246

Differential Revision: D9652701

Pulled By: apaszke

fbshipit-source-id: fb0bdbfd1952e45e153541e4d471b423a5659f25
2018-09-05 06:35:13 -07:00
b7cd4b692c add a Float16UniformFill (#11123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11123

This adds an operator that fills a tensor with values drawn from uniform(min, max). The implementation uses the fp32 generator and converts to fp16.

If performance becomes an issue, we could resort to intrinsics.
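
A usage sketch, assuming the operator mirrors UniformFill's shape/min/max arguments (hypothetical, not taken from the diff):

```python
from caffe2.python import core, workspace

# fill a 2x3 fp16 blob with values drawn from Uniform(-1, 1)
op = core.CreateOperator(
    "Float16UniformFill", [], ["out"],
    shape=[2, 3], min=-1.0, max=1.0,
)
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("out").dtype)  # float16
```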

Reviewed By: jspark1105, chocjy

Differential Revision: D9598142

fbshipit-source-id: 5aeab99acf7c3596fa6c33611d9d2c484f7c1145
2018-09-04 23:28:22 -07:00
d4060d2d0e Implement torch.tensordot (#10025)
Summary:
Fixes: #8988
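
For reference, a quick sketch of the new API:

```python
import torch

a = torch.randn(3, 4, 5)
b = torch.randn(4, 5, 6)
# contract a's dims (1, 2) against b's dims (0, 1): the result is 3x6
c = torch.tensordot(a, b, dims=([1, 2], [0, 1]))
print(c.shape)  # torch.Size([3, 6])
```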
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10025

Reviewed By: ezyang

Differential Revision: D9540967

Pulled By: yf225

fbshipit-source-id: 6ba2a7777162983977db884b693e6f4543b31aeb
2018-09-04 21:10:07 -07:00
d1b920b44f keep net type info when generating model complete net (#11032)
Summary:
Keep net type info when generating the model complete net. This preserves the performance optimization option.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11032

Reviewed By: wat3rBro

Differential Revision: D9564125

Pulled By: harouwu

fbshipit-source-id: c6546af9b1d4ff5eddf6124e24a5da1b8baf47df
2018-09-04 21:10:06 -07:00
56bdd87b40 Get rid of some uses of type() (#11215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11215

I found these by deleting the implicit conversion of Type to
TensorOptions and then fixing sites.  This isn't a complete
refactor, because I ran out of steam after fixing this many
and decided to keep the implicit conversion.  Still, why
waste a perfectly good refactor?

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9634750

fbshipit-source-id: 4d8fb778e13e6e24b888b1314a02709b2cb00b62
2018-09-04 20:26:22 -07:00
9ca63c5e63 Reorganize methods in Type, add CPUTypeDefault/CUDATypeDefault (#11205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11205

Our short term plan for supporting out of tree complex development requires an
external library to add a custom subclass of Type without access to the
code generation facilities in ATen.  This commit reorganizes Type so
as to minimize the amount of boilerplate you have to write when making
a subclass of Type.

In particular, it:
- Creates a new CPUTypeDefault/CUDATypeDefault class, which you are
  intended to inherit from, which provides default implementations
  of CPU/CUDA that is layout/dtype agnostic.
- Adds new getCPUAllocator() and getCUDAAllocator() functions, as
  a more public API to get your hands on Allocator
- Adds allocator() and getDeviceFromPtr(), abstracting the device
  specific parts of storage() methods; these methods are now
  implemented in base TypeDefault.
- Delete the static typeString() method, which is now dead.
- Move is_cuda/is_sparse/is_distributed to TypeDefault.

Reviewed By: SsnL

Differential Revision: D9631619

fbshipit-source-id: 40b600d99691230e36e03eb56434c351cbc2aa3a
2018-09-04 20:26:20 -07:00
f0d3fda064 Improve docs for torch::nn::Module (#11115)
Summary:
Added some documentation. Will rebuild docs to make sure it looks good. Can already accept approvals.

ebetica apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11115

Differential Revision: D9597880

Pulled By: goldsborough

fbshipit-source-id: 56b701da631702ba56e281a0de0f7ebe490f5c5a
2018-09-04 18:10:38 -07:00
7f74875304 Pull Context out of TensorMethods.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11241

Reviewed By: ezyang

Differential Revision: D9645514

Pulled By: gchanan

fbshipit-source-id: 43e65d1d2fa3183264ed7e4752c1512df5f69175
2018-09-04 18:10:37 -07:00
05cb40dc00 Move some includes from Tensor/Type to core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11234

Reviewed By: ezyang

Differential Revision: D9642669

Pulled By: gchanan

fbshipit-source-id: 2c131bb46b54a0803c37b444ad48d861080056f1
2018-09-04 18:10:34 -07:00
c8672f0b42 Support environments with no libprotobuf (#11161)
Summary:
Just pulling this out of https://github.com/pytorch/pytorch/pull/10611

Make sure we can support environments where libprotobuf is not installed when we link protobuf locally.

cc goldsborough Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11161

Differential Revision: D9650282

Pulled By: orionr

fbshipit-source-id: 447b5e54cd2639973b4b10f58590d1c693a988d4
2018-09-04 17:27:54 -07:00
020501b7b0 Getting rid of USE_C10D for build (#11237)
Summary:
Will use USE_DISTRIBUTED for both c10d and THD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11237

Differential Revision: D9647825

Pulled By: teng-li

fbshipit-source-id: 06e0ec9b5e2f8f38780fc88718f8499463e9e969
2018-09-04 17:27:53 -07:00
313e89d8db Fix dimension collapsing (#11226)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11226

Differential Revision: D9646638

Pulled By: cpuhrsch

fbshipit-source-id: 104f367f75a4478bb7580324ea3661de71b2c8b0
2018-09-04 17:27:52 -07:00
6219c4a28f Make Scalar::toTensor a free function, move Scalar to ATen/core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11125

Reviewed By: ezyang

Differential Revision: D9599798

Pulled By: gchanan

fbshipit-source-id: 2fec682c109013a82788dfba13f4d30b2945d3f4
2018-09-04 16:25:57 -07:00
033499cf56 Remove mention of USE_DISTRIBUTED_MW (#11240)
Summary:
This was lingering after #10731.

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11240

Differential Revision: D9645437

Pulled By: pietern

fbshipit-source-id: d02c33354b094be3bb0872cf54a45721e20c4e7d
2018-09-04 16:10:20 -07:00
3f30c296d3 Export CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_* (#11233)
Summary:
This PR resolved the following compilation errors on devgpu:
/home/mingzhe0908/pytorch/build/lib/libcaffe2_gpud.so: undefined reference to `caffe2::CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_Tan()'
/home/mingzhe0908/pytorch/build/lib/libcaffe2_gpud.so: undefined reference to `caffe2::CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_MaxPool3D()'
....

The same error had been happening with the caffe2 debug-mode build before build_caffe2 was removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11233

Reviewed By: orionr

Differential Revision: D9645527

Pulled By: mingzhe09088

fbshipit-source-id: 68a45aa7fd815cac41b7fd64cfd9838b3226345a
2018-09-04 14:56:43 -07:00
7e0a052a5d Adding synthetic data generation to the filler.h file (#11060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11060

Adding synthetic data generation to the filler.h file (the exact distribution to be replaced later on).

Reviewed By: highker

Differential Revision: D9417594

fbshipit-source-id: 5d66dfbcb254a5961c36b7d3a081332c7372dac7
2018-09-04 13:40:53 -07:00
1eed7d5f0b Report an error when trying to record a mutable operator when (#11129)
Summary:
there are multiple views of the tensor live.

Also adds recording for copy_ because this is the critical in place
op where these views will cause LHS indexing to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11129

Differential Revision: D9600195

Pulled By: zdevito

fbshipit-source-id: bfd8f5befa47377e36d704dbdb11023c608fe9a3
2018-09-04 13:40:51 -07:00
0e8088d6f6 Fix typo in data_parallel_model
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11086

Differential Revision: D9581297

fbshipit-source-id: b164177bdbb309f56ff3231c1ffc0973f6c5299b
2018-09-04 13:15:31 -07:00
ec6f0ed560 Additional Python Bindings
Summary:
Major change:
- Addition of pattern matching bindings

Minor change:
- OperatorDef instantiation
- Generic Graph API

Reviewed By: duc0

Differential Revision: D9546205

fbshipit-source-id: ab5274014be23a3e9e3fcf18ae1815c4f387b83c
2018-09-04 12:10:10 -07:00
750cd48980 update expect file for short circuiting (#11229)
Summary:
Fix failing test by updating expect file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11229

Differential Revision: D9638587

Pulled By: eellison

fbshipit-source-id: e870ef3a4fbc7e07f299cc9413703d9f77e89895
2018-09-04 11:56:09 -07:00
684b55d762 In default, use third party eigen. Added new flag USE_SYSTEM_EIGEN_INSTALL to control. (#11020)
Summary:
TSIA. apaszke pointed out that it might be better to use the third party folder by default, since system Eigen may often be out of date and not have the version we need to compile successfully.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11020

Differential Revision: D9562548

Pulled By: Yangqing

fbshipit-source-id: d8ab8a6ebe1f3d9eec638ef726cf5dc4dcf777b5
2018-09-04 10:56:22 -07:00
539579aa9a Logical short circuit (#11116)
Summary:
Adding short-circuit evaluation to AND and OR. The second expression of an AND or OR gets lifted into an if branch, which is conditionally evaluated.
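
Roughly, the lowering behaves like this sketch (illustrative Python, not the compiler code):

```python
def or_lowering(a, b):
    # evaluate the left operand; only evaluate the right one
    # inside the conditionally-taken branch
    result = bool(a())
    if not result:
        result = bool(b())
    return result

# b is never called when a is truthy
print(or_lowering(lambda: True, lambda: 1 / 0))  # True, no ZeroDivisionError
```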

BatchOps was using the expression `dims = dims1 or dims2`, where dims1 is often an empty tensor. This now throws an error, because dims1 gets cast to a boolean, and you can't convert an empty tensor to a scalar. It now matches the behavior of PyTorch in Python.

One thing that came up: in Python, if the second expression of an and/or gets returned, it is not coerced to a boolean.

`tensor == (False or tensor)`
`tensor == (True and tensor)`

We do not currently support this.

edit: wording
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11116

Differential Revision: D9618168

Pulled By: eellison

fbshipit-source-id: 93b202be2f222d41f85d38d9c95f04d1749e8343
2018-09-04 09:25:13 -07:00
b2217109ec Move TensorOptions to ATen/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11147

Reviewed By: gchanan

Differential Revision: D9614321

fbshipit-source-id: 618cb342eb7c52181425f6bb9c17b9ecdb87a394
2018-09-04 08:55:54 -07:00
0ff1bb0d8a Remove Type constructor from TensorOptions, add Type::options (#11189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11189

Replaces it with an operator TensorOptions() method on
Type, reestablishing the implicit conversion.  I originally
wanted to get rid of the implicit conversion entirely, but
there were a *lot* of use-sites, so I added it back to avoid
a huge codemod.  In this patch, I only had to fix sites that
used the optional device_index API.

Reviewed By: cpuhrsch

Differential Revision: D9628281

fbshipit-source-id: 5fe2a68eefb77a3c9bb446f03a94ad723ef90210
2018-09-04 08:10:04 -07:00
0d5e4a2c66 Allow passing through arguments to unittest (#11209)
Summary:
Example:
```sh
python run_test.py -i sparse -- TestSparse.test_factory_size_check -f
```

With this, the `--verbose` option is redundant (one can call `python run_test.py -- -v` instead of `python run_test.py -v`). But since this is (probably) a frequently used flag, I didn't remove the existing easier-to-use option.

cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11209

Differential Revision: D9632215

Pulled By: SsnL

fbshipit-source-id: ff522802da11ef0a0714578be46e4a44f6343d44
2018-09-03 20:09:08 -07:00
050aa42e09 Fix some more compile warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11208

Differential Revision: D9632216

Pulled By: SsnL

fbshipit-source-id: b181f3ce114474e171146cd2ac5de150b0e23f75
2018-09-03 19:39:33 -07:00
cd4c32691d Add complex32, complex64 and complex128 dtypes (#11173)
Summary:
We don't generate corresponding Type implementations for them,
so this doesn't do anything at the moment.

We don't plan on supporting complex32 in the near future, but
it is added to reserve the name and number in case we do at
some point in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11173

Reviewed By: SsnL

Differential Revision: D9627477

Pulled By: ezyang

fbshipit-source-id: f49a44ab1c92d8a33130c249ac7b234f210a65e6
2018-09-03 19:19:36 -07:00
c5b021cc88 State dict loading arguments were in the wrong order (#11200)
Summary:
In the state dict loading code, it would print the error message referring to the shape of the loaded parameters and the parameters in the initialised model with the formatting in the wrong order. Swapped them round to fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11200

Differential Revision: D9631160

Pulled By: SsnL

fbshipit-source-id: 03d9446303bd417fef67027b10d7a27de06486be
2018-09-03 15:42:30 -07:00
7e2136c2b5 remove allclose from test_doc skipped list
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11187

Differential Revision: D9628349

Pulled By: SsnL

fbshipit-source-id: 0ff94666542ca049a6d82091bd9fc79ec1699ac6
2018-09-03 09:39:56 -07:00
24eb5ad0c5 Fix unit tests on CI (#11191)
Summary:
Disables two unit tests in test_cuda that were introduced after test_cuda was enabled and that fail on ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11191

Differential Revision: D9628702

Pulled By: ezyang

fbshipit-source-id: 4c298c728f42bb43d39b57967aa3e44385980265
2018-09-02 21:54:47 -07:00
0a8c8c1dbe Rename real to scalar_t. (#11163)
Summary:
This is necessary to allow us to use the complex header
which defines real (and is very sad if real is macro'ed).

We should also fix accreal, ureal, Real and REAL, but
only 'real' is the real blocker.

```
codemod -d aten/src/TH --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THC --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THNN --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THCUNN --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11163

Reviewed By: SsnL

Differential Revision: D9619906

Pulled By: ezyang

fbshipit-source-id: 922cb3a763c0bffecbd81200c1cefc6b8ea70942
2018-09-02 15:26:01 -07:00
43fd6b234d Make Type a (mostly) pure virtual class; TypeDefault for impls (#11013) (#11013)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11013

Previously, the parent class Type also contained a large number
of implementations, for things like broadcasting and native
functions that didn't need dispatch.  We'd like to be able to reference this interface from Tensor even when none of these implementations are available.

To do this, we convert Type into a truly pure virtual interface,
and move all of the implementations to TypeDefault.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11181

Differential Revision: D9561478

Pulled By: ezyang

fbshipit-source-id: 13c49d80bc547551adf524b1cf1d691bfe311133
2018-09-02 15:25:59 -07:00
e1a17d5a42 Should not use CAFFE2_API when definition is already in header. (#11114)
Summary:
Remove or use CAFFE2_EXPORT.
Fix #11108
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11114

Differential Revision: D9628293

Pulled By: ezyang

fbshipit-source-id: dc3bb7dc5bc299e3b6cfd1cdd640f618c206fb5a
2018-09-02 14:39:38 -07:00
cf10efb8d4 Fixes unclear exception message for F.conv2d (#11053)
Summary:
Fixes #11033
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11053

Differential Revision: D9573606

Pulled By: soumith

fbshipit-source-id: 9729cbd6c8afcef0fd487bdd425b0d1f55189009
2018-09-02 13:39:34 -07:00
593d74061f Document torch.allclose (#11185)
Summary:
- Modify torch.autograd.gradcheck to use torch.allclose instead
- Expose doc strings

Closes #10355
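
A quick usage example (illustrative):

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([1.0, 2.0, 3.0 + 1e-6])
# true when |a - b| <= atol + rtol * |b| holds elementwise
print(torch.allclose(a, b))                  # True (within default tolerances)
print(torch.allclose(a, b, rtol=0, atol=0))  # False (exact comparison)
```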
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11185

Differential Revision: D9628016

Pulled By: soumith

fbshipit-source-id: 22a30622b9fe52e41b5b3540406137b59d8c5a75
2018-09-02 09:26:07 -07:00
33c7cc13ca improve docker packages, fix bugs, enable tests, enable FFT (#10893)
Summary:
* improve docker packages (install OpenBLAS to have at-compile-time LAPACK functionality w/ optimizations for both Intel and AMD CPUs)
* integrate rocFFT (i.e., enable Fourier functionality)
* fix bugs in ROCm caused by wrong warp size
* enable more test sets, skip the tests that don't work on ROCm yet
* don't disable asserts any longer in hipification
* small improvements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10893

Differential Revision: D9615053

Pulled By: ezyang

fbshipit-source-id: 864b4d27bf089421f7dfd8065e5017f9ea2f7b3b
2018-09-02 08:54:42 -07:00
abe8b3391d LowRankMultivariateNormal cleanup
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11179

Differential Revision: D9627502

Pulled By: soumith

fbshipit-source-id: c7a4aa8be24bd8c688a7c655ff25ca901ed19704
2018-09-02 07:54:56 -07:00
4d28b65fb8 fix serialization of nn.Parameter with dill (#10296)
Summary:
Should resolve #9981.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10296

Differential Revision: D9196353

Pulled By: soumith

fbshipit-source-id: 109b6da42b7240cdbc7a0586745c735bce5e1279
2018-09-01 23:55:40 -07:00
1350f76b62 Fix max and min with inf on CUDA (#11091)
Summary:
Fixes #10237 #11084

cc vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11091

Differential Revision: D9582859

Pulled By: SsnL

fbshipit-source-id: 3991c0a2af65ba82fa815b82f9e6b2107912fd10
2018-09-01 23:09:23 -07:00
7eba9849c1 Pool constants during script compilation. (#10231)
Summary:
This places all constants in the entry block of the graph, and de-duplicates them.
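A minimal script-mode sketch of the effect (hypothetical example, not from the PR):

```python
import torch

@torch.jit.script
def f(x):
    # Both 2.0 literals should now share a single pooled prim::Constant
    return (x + 2.0) * 2.0

# The constants are hoisted to the entry block of the graph:
print(f.graph)
```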
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10231

Differential Revision: D9601501

Pulled By: resistor

fbshipit-source-id: daa10ed8c99e9894830d6f3e5d65c8d3ab5ea899
2018-09-01 22:40:50 -07:00
7af6f9515f Move TensorAccessor to ATen/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11014

Reviewed By: cpuhrsch

Differential Revision: D9561802

fbshipit-source-id: d3dbe6d7e76e2419ead81fb448711f101daee19f
2018-09-01 21:41:26 -07:00
011f615945 Fix compile warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11177

Reviewed By: soumith

Differential Revision: D9626443

Pulled By: SsnL

fbshipit-source-id: e75d893e1e91e49d3e7b021892434489d8df7987
2018-09-01 21:41:25 -07:00
1506547771 Disable -Werror on macOS test build (#11090)
Summary:
cc goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11090

Reviewed By: soumith

Differential Revision: D9582525

Pulled By: apaszke

fbshipit-source-id: 5d2c6e930e7b09f0ed5a35fbf4fe36b8845a2580
2018-09-01 21:09:49 -07:00
f60a2b682e allow spaces in filename for jit-compiled cpp_extensions (#11146)
Summary:
Now, folders with spaces in their paths will not error out for `torch.utils.cpp_extension.load(name="xxx", sources=["xxx.cpp"], verbose=True)` calls.
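
A usage sketch (hypothetical path, assuming the current `torch.utils.cpp_extension` API):

```python
from torch.utils.cpp_extension import load

# A source path containing spaces no longer breaks the JIT build:
ext = load(
    name="my_ext",
    sources=["my project dir/my_ext.cpp"],  # hypothetical path with spaces
    verbose=True,
)
```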
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11146

Differential Revision: D9618838

Pulled By: soumith

fbshipit-source-id: 63fb49bfddc0998dccd8a33a6935543b1a6c2def
2018-09-01 20:39:51 -07:00
43e73f85ad Dont optimize slicing dispatch when we are tracing (#11156)
Summary:
Previously when we had a slicing expression like `x[0:5, 0]`, where the sliced tensor was of size `5` in dimension 0, we would skip dispatching the actual slice call as an optimization.

This caused incorrect behavior under tracing, as we would not record the slice op and thus if we encountered an input with a different shape while running the trace, we would get incorrect results.
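
A minimal sketch of the failure mode (hypothetical example, written against the current `torch.jit.trace(fn, inputs)` signature):

```python
import torch

def f(x):
    # For a size-5 dim 0, x[0:5, 0] is a no-op slice followed by a select;
    # the old optimization skipped recording the slice in the trace.
    return x[0:5, 0]

traced = torch.jit.trace(f, (torch.rand(5, 4),))
# Before the fix, rerunning the trace on a larger input behaved like
# x[:, 0] instead of x[0:5, 0], because the slice op was never recorded.
out = traced(torch.rand(8, 4))
```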
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11156

Differential Revision: D9622252

Pulled By: jamesr66a

fbshipit-source-id: 822f2e8f01504e131f53bd9ef51c171c7913a7cc
2018-09-01 17:13:03 -07:00
b3d559cdd1 Optimize WeightedSumOp for two inputs (#11049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11049

Optimize WeightedSumOp for two inputs

Reviewed By: houseroad

Differential Revision: D9566692

fbshipit-source-id: 9aab1f02251d386b6f7d0699ae11eeb2ea2b5b4f
2018-09-01 11:54:55 -07:00
b834d9107e Revert D9566744: [New Checkpoint] Kill the dummy TaskOutput when task.get_step() (#11164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11164

Revert D9566744

Reviewed By: enosair

Differential Revision: D9620272

fbshipit-source-id: 6a78c46929f66bd11969840cb6b107f734be0c02
2018-08-31 22:25:57 -07:00
1b7172a2b9 fix the slice onnx exporting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11117

Reviewed By: MisterTea

Differential Revision: D9597870

Pulled By: houseroad

fbshipit-source-id: 3a2a307ee327397939bedb9150f780682e18a89a
2018-08-31 17:40:03 -07:00
03c06ec93d Traceable detach (#11038)
Summary:
This makes it so `detach` and `detach_` are traceable and also adds a pass to erase them before ONNX export
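A small illustration (hypothetical, not from the PR):

```python
import torch

def f(x):
    # detach now shows up as an op in the trace instead of baking the
    # detached value in as a constant
    return x.detach() * 2

traced = torch.jit.trace(f, (torch.rand(3),))
print(traced.graph)  # contains an aten::detach node
```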
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11038

Differential Revision: D9588038

Pulled By: jamesr66a

fbshipit-source-id: 263dd3147e24fcb0c716743f37fdb9f84c0015e7
2018-08-31 16:40:42 -07:00
861e1c430c Move StorageImpl and Storage to core (#11154)
Summary:
Will need to be accessible by caffe2

This also removes a bunch of unnecessary includes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11154

Reviewed By: ezyang

Differential Revision: D9618681

Pulled By: cpuhrsch

fbshipit-source-id: 838a87b75d9c3959e145fd5fca13b63bc5de7bd3
2018-08-31 15:55:26 -07:00
4abddad1a0 use py::str to remove deprecation warnings (#11107)
Summary:
```
In file included from third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/cast.h:13:0,
                 from third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/attr.h:13,
                 from third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/pybind11.h:43,
                 from caffe2/torch/csrc/utils/pybind.h:6,
                 from caffe2/torch/csrc/jit/pybind.h:5,
                 from caffe2/torch/csrc/jit/script/init.h:3,
                 from caffe2/torch/csrc/jit/script/init.cpp:1:
third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/pytypes.h:118:19: note: declared here
In file included from caffe2/torch/csrc/jit/pybind.h:12:0,
                 from caffe2/torch/csrc/jit/python_ir.cpp:4:
caffe2/torch/csrc/jit/pybind_utils.h: In function 'torch::jit::IValue torch::jit::argumentToIValue(const torch::jit::FunctionSchema&, size_t, pybind11::handle)':
caffe2/torch/csrc/jit/pybind_utils.h:138:226: warning: 'pybind11::str pybind11::detail::object_api<Derived>::str() const [with Derived = pybind11::detail::accessor<pybind11::detail::accessor_policies::str_attr>]' is deprecated: Use py::str(obj) instead [-Wdeprecated-declarations]
```

apaszke zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11107

Differential Revision: D9598040

Pulled By: goldsborough

fbshipit-source-id: 4a055353ac08d54a2bbca49573ff099310de3666
2018-08-31 15:25:04 -07:00
c48bf3a77e Automatic update of fbcode/onnx to 1b09eb14c2c781fae078fa6b1c0390ba6fc0898c (#11153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11153

Previous import was bae6333e149a59a3faa9c4d9c44974373dcf5256

Included changes:
- **[1b09eb1](https://github.com/onnx/onnx/commit/1b09eb1)**: Fix the shape inference for concat (#1361) <Lu Fang>
- **[7b9b3ee](https://github.com/onnx/onnx/commit/7b9b3ee)**: ONNX v1.3.0 release (#1359) <bddppq>

Reviewed By: Ac2zoom

Differential Revision: D9615844

fbshipit-source-id: f1d4e2d6ef72a269d6ab3c1c347b272b5bdc4f2a
2018-08-31 14:55:15 -07:00
5987b44dda Remove aten doc/ folder (#11158)
Summary:
ATen's doc/ folder is manually maintained and can thus cause confusion with the generated documentation. We now have proper online documentation for ATen, which is superior to ATen doc/. Let's delete ATen/doc.

ezyang apaszke soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11158

Differential Revision: D9618782

Pulled By: goldsborough

fbshipit-source-id: 0ef14f84947601a0589aa4a41e5c8619783426fe
2018-08-31 14:55:13 -07:00
3081c8ea1d Lower trivial differentiable subgraphs (#11110)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11110

Differential Revision: D9616408

Pulled By: apaszke

fbshipit-source-id: f1ae77d698bf0ada32f2c1c3f587e46a4f57a867
2018-08-31 14:55:10 -07:00
c87d082d26 Use ->data<real>() instead of THTensor_(data) and c10::raw::intrusive_ptr::decref instead of _free (#11039)
Summary:
Codemod used for this

```
grep -rnw "THTensor_(free)" aten | grep -v Binary | cut -f 1 -d ":" | xargs -I {} sed -i "s/THTensor_(free)(\([^)]*\))/c10::raw::intrusive_ptr::decref(\1)/g" {}
```

```
grep -rnw "THTensor_(data)" aten | grep -v Binary | cut -f 1 -d ":" | xargs -I {} sed -i "s/THTensor_(data)(\([^)]*\))/\1->data<real>()/g" {}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11039

Reviewed By: ezyang

Differential Revision: D9617265

Pulled By: cpuhrsch

fbshipit-source-id: d9e7581867a335703f82f4556cead2b32b97bd83
2018-08-31 14:27:09 -07:00
adeebed549 Delete TensorImpl::toString() (#11035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11035

Instead, inline its definition into Tensor.  We need
to do this so we can avoid needing to call getType() from
TensorImpl.

Reviewed By: cpuhrsch

Differential Revision: D9564516

fbshipit-source-id: 19fdaa2b93419e21572b9916714aee4165cb3390
2018-08-31 14:27:08 -07:00
5286925d4a Add getMaybeVariableType(const TensorImpl*) (#11031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11031

The eventual plan is to get rid of TensorImpl::type()
entirely; but first we need a function to call.

Reviewed By: cpuhrsch

Differential Revision: D9564206

fbshipit-source-id: b59a9ccfaed44199f185eff392835cec89ccda8e
2018-08-31 14:27:06 -07:00
2c5ae8c4bf Get rid of type() method on TensorOptions; use at::getType instead (#11023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11023

I'd like TensorOptions to not know anything about Context, so I can
move it to ATen/core without pulling in Context.  To do this, the
type() method has to go, since it consults the context to get a Type.

Reviewed By: cpuhrsch

Differential Revision: D9562467

fbshipit-source-id: 61a18a76eb042a5e70b64b963501e9d68c25d4f0
2018-08-31 14:27:05 -07:00
fd110411b7 Don't convert TensorOptions to type before printing.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11145

Reviewed By: cpuhrsch

Differential Revision: D9613897

fbshipit-source-id: eaa28b24992e8202cecb5ab97fa541fcf49a205f
2018-08-31 14:27:03 -07:00
48c2f3cf0f Move TensorOptions Tensor methods to TensorMethods.h (#11144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11144

We can move them now that TensorMethods no longer references Tensor.

Reviewed By: cpuhrsch

Differential Revision: D9613800

fbshipit-source-id: 99ad1dd7d77eb319000769230b7016294cf1980f
2018-08-31 14:27:02 -07:00
780d2792c5 Warn about non-traceable behavior when tracing (#11088)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11088

Differential Revision: D9585527

Pulled By: apaszke

fbshipit-source-id: 29a03cb152d83b626f748fff4501ac9e139994c2
2018-08-31 14:27:00 -07:00
c31ebccd01 Clean up TupleType and SchemaParser (#11007)
Summary:
Some fixes to address your comments zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11007

Differential Revision: D9597750

Pulled By: goldsborough

fbshipit-source-id: f35f4801707dff2367e9dfc7d4e968357bc2b832
2018-08-31 14:26:59 -07:00
f4b2961af9 Simplify assignment operators (#11027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11027

Using swap() as a primitive, copy and move assignment become much easier.

Reviewed By: ezyang

Differential Revision: D9563753

fbshipit-source-id: e74faf39b596f097de758bfe038639565807040a
2018-08-31 13:43:41 -07:00
6508db7421 Remove BUILD_CAFFE2 and build everything (#8338)
Summary:
This completely removes BUILD_CAFFE2 from CMake. There is still a little bit of "full build" stuff in setup.py that enables USE_CUDNN and BUILD_PYTHON, but otherwise everything should be enabled for PyTorch as well as Caffe2. This gets us a lot closer to full unification.

cc mingzhe09088, pjh5, ezyang, smessmer, Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8338

Reviewed By: mingzhe09088

Differential Revision: D9600513

Pulled By: orionr

fbshipit-source-id: 9f6ca49df35b920d3439dcec56e7b26ad4768b7d
2018-08-31 13:10:24 -07:00
a2a584f347 Proper recompilation tracking for more files in tools/autograd (#11143)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11143

Differential Revision: D9613758

Pulled By: ezyang

fbshipit-source-id: 08ed143739438435e0e8219dff3a738ab424c3e1
2018-08-31 13:10:21 -07:00
3791bd12c8 PT1 Release Milestone No.2 MPI Group Support with all tests passed (#11128)
Summary:
Added MPI group support. This makes all previous MPI group test cases pass.

Also, relaxed the required MPI thread-level support by serializing each process group's MPI ops; this is required for correctness.

The build is fixed too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11128

Differential Revision: D9602188

Pulled By: teng-li

fbshipit-source-id: 1d618925ae5fb7b47259b23051cc181535aa7497
2018-08-31 12:39:56 -07:00
d95e68c8cc Delete Tensor constructor from TensorOptions. (#11101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11101

I'd like to invert the dependency between Tensor and TensorOptions
(such that Tensor includes TensorOptions); to do this, I'd prefer
there to not be a Tensor constructor.  Eventually, all references
to Tensor will disappear from TensorOptions.h

Reviewed By: cpuhrsch

Differential Revision: D9585627

fbshipit-source-id: dd4a28b2c06b1e55f629762915f03c2b6c34d840
2018-08-31 09:55:01 -07:00
a585158c9e Some usage examples for TensorOptions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11081

Reviewed By: goldsborough

Differential Revision: D9579371

fbshipit-source-id: 329a07fc2e58f57384c8a840bcdebc2c6d4f7bb1
2018-08-31 09:40:30 -07:00
e2bdd35cf0 fixes to device.cc (#11122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11122

These changes fix device.cc so that intra-device copies can be created for OpenCL.

Reviewed By: bwasti

Differential Revision: D9553292

fbshipit-source-id: e59f17916b5df30a504adee0718f9cecfe28f35a
2018-08-31 09:25:26 -07:00
f30fd7fb5c Get rid of the runtime type in TensorOptions (#11021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11021

We can now store a boolean saying if we want a Variable or not,
and context can use VariableHooks to get a VariableType if we
request one.

Reviewed By: cpuhrsch

Differential Revision: D9562312

fbshipit-source-id: 84653cd789622764132252406a5ea1a83eee3360
2018-08-31 09:10:52 -07:00
1db5a7d8f0 Move variable getType lookup support to Context
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11017

Reviewed By: cpuhrsch

Differential Revision: D9562197

fbshipit-source-id: dd00c79592d6c59f2e21c9d62fea3a2c093b609b
2018-08-31 09:10:51 -07:00
9fac0a5093 Rename at::getType to at::getNonVariableType (#11096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11096

To discourage willy-nilly use, and make it clearer that it
is not a Variable

Reviewed By: cpuhrsch

Differential Revision: D9583699

fbshipit-source-id: 4fbde0c01ae3deb2c7ef8c125a9028f089b203ae
2018-08-31 09:10:49 -07:00
0961c923c0 Unbreak the build

fbshipit-source-id: 861021dbe88f84d1a8bd80e04dd684527384629f
2018-08-31 08:13:12 -07:00
3073051a18 Revert D9554375: Support lr adaption for SparseAdam and RowWiseSparseAdam
Differential Revision:
D9554375

Original commit changeset: b88768f470ef

fbshipit-source-id: 2c103c616c8680684892c7d9085fd7bb8289d2f1
2018-08-31 07:54:31 -07:00
82aeebb3d9 Fix a bug in addmm fusion in the JIT (#11100)
Summary:
Fixes #10839.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11100

Differential Revision: D9585533

Pulled By: apaszke

fbshipit-source-id: 19e2710c8fc113f577faf14c080d8c89afbe23c4
2018-08-31 07:24:34 -07:00
0555768e0f Support lr adaption for SparseAdam and RowWiseSparseAdam (#10993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10993

as title

Reviewed By: chocjy

Differential Revision: D9554375

fbshipit-source-id: b88768f470ef7d023dd481c6a97b91594892f422
2018-08-31 00:55:39 -07:00
f1bfe6750f Back out "[caffe2] Update blackbox predictor with new constructor" (#11105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11105

Reverts: D9516972

See this discussion for context: https://fburl.com/w45hb1oc

Reviewed By: highker

Differential Revision: D9587931

fbshipit-source-id: 715247929d819dfa88e1d051021e51c5bf0c4835
2018-08-31 00:55:36 -07:00
9fae8fcdff framework for committed serialized tests (#10594)
Summary:
Generate serialized test inputs/outputs/backward graphs of tests inside `caffe2/python/operator_test` that call assertSerializedOperatorCheck(). Tests should be decorated with serialized_test.collect_tests.given_and_seeded to run hypothesis tests that are actually random and a single fixed seeded hypothesis tests.

To use:
1. Refactor your test to be a SerializedTestCase
1a. Decorate it with given_and_seeded
1b. Call testWithArgs in main
2. Run your test with -g to generate the output. Check it in.
3. Subsequent runs of the test without generating the output will check against the checked in test case.

Details:
Run your test with `python caffe2/python/operator_test/[your_test].py -g`
Outputs are in `caffe2/python/serialized_test/data`. The operator test outputs are in a further subdirectory, `operator_test`, to allow for other tests in the future (model zoo tests?)

Currently, we've only refactored weighted_sum_test to use this, but in the next diff, we'll refactor as many as possible. The directory structure may also change as usually there are multiple tests in a single file, so we may create more structure to account for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10594

Reviewed By: ezyang

Differential Revision: D9370359

Pulled By: ajyu

fbshipit-source-id: 2ce77389cd8bcc0255d3bccd61569833e545ede8
2018-08-30 22:41:46 -07:00
00df09b65d Change specialization rules in GraphExecutors (#10977)
Summary:
**Review last commit only.** Stacked on top of #10949.

This commit fixes a number of issues connected to caching
differentiability status of graphs inside graph executors,
and changes the rules for optimization of differentiable subgraphs.
Previously every one of those was instantiated as a separate graph
executor, but now they are simply heavier-optimized graph regions,
and graph executors are only instantiated for their backward.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10977

Differential Revision: D9600626

Pulled By: apaszke

fbshipit-source-id: dad09a0f586e396afbd5406319c1cd54fbb8a3d3
2018-08-30 22:11:01 -07:00
a320e5cbd3 Move static_context outside of class (#11097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11097

att

Reviewed By: ezyang

Differential Revision: D9549702

fbshipit-source-id: 058b942311b00be20a0b557ba97eb3451ea55e33
2018-08-30 22:10:58 -07:00
750ede7215 Rename getType to getVariableTypeFromBaseType / getVariableType (#11095)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11095

We used getType to mean a lot of things.

- getVariableTypeFromBaseType: given a base Type (non-Variable type)
  compute the Variable Type which corresponds to it.

- getVariableType: like at::getType, but return the Variable type
  rather than the plain type.

This rename makes it clearer at the use-site what things are what,
and will make a subsequent rename of at::getType easier.

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9583630

fbshipit-source-id: 2667ec98e7607bc466920c7415a8c651fd56dfca
2018-08-30 20:11:25 -07:00
c836a04dc8 Delete a bunch of uses of getType in favor of TensorOptions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11087

Reviewed By: cpuhrsch

Differential Revision: D9581560

fbshipit-source-id: ebe3c4c0956da8a7215ada287bf6526dbcb2b07d
2018-08-30 20:11:24 -07:00
34a0604d51 Eliminate use of getType from DLConvertor (#11080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11080

- Add a new TensorOptions(Device, ScalarType) constructor,
  which serves roughly the same role as getType used to.
  We shouldn't get too wild with these constructors, but
  since this particular one was widely used by getType,
  it seems worth adding.
- Change DLPack DeviceType conversion to at::DeviceType,
  rather than at::Backend.  While I'm at it, add a few more
  conversions that at::DeviceType understands.
- Add a new overload of from_blob which understands strides.

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9578734

fbshipit-source-id: 28288ec053aae8765e23925ab91023398d632d6b
2018-08-30 20:11:23 -07:00
c283acce72 Rename getTypeRaw to getNonVariableTypeRaw (#11078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11078

```
codemod -d . --extensions cc,cpp,cu,cuh,h getTypeRaw getNonVariableTypeRaw
```

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9578399

fbshipit-source-id: 00a86ae8fb00d14116762ce39d15858da9a1671e
2018-08-30 20:11:21 -07:00
66c4d7e060 Rename getTypeOpt to getNonVariableTypeOpt (#11077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11077

getType now supports retrieving variable types, so make it clearer
when a getType function does NOT give you a variable type.

```
codemod -d . --extensions cc,cpp,cu,cuh,h getTypeOpt getNonVariableTypeOpt
```

Reviewed By: gchanan

Differential Revision: D9578398

fbshipit-source-id: 3ee502ac5c714849917f11ddc71de8eacfdaa9d3
2018-08-30 20:11:20 -07:00
f3c3127c67 Don't flatten output lists in the JIT IR (#10949)
Summary:
Operators like aten::chunk used to return a number of tensors, but
now return a list. To make it easier to do shape prop through
aten::chunk and fuse it, I've also introduced prim::ConstantChunk,
which behaves like the previous implementation (has a variable length
output list).

The downside of this PR is that the introduction of more lists to the IR causes the LSTM and MiLSTM graphs to be considered as non-differentiable by the graph executor. I verified that they are still optimized correctly, and my next patch (that changes how the specializations/differentiation works) will restore those.
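
A quick illustration of the list-returning form (hypothetical example):

```python
import torch

x = torch.arange(6)
# aten::chunk now produces a tensor list in the IR; when the chunk count
# is a compile-time constant, prim::ConstantChunk restores the fixed-arity
# multi-output form that the fuser can handle.
a, b, c = x.chunk(3)
print(a, b, c)  # tensor([0, 1]) tensor([2, 3]) tensor([4, 5])
```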

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10949

Reviewed By: zdevito

Differential Revision: D9556823

Pulled By: apaszke

fbshipit-source-id: 33e63b17fc7247cac6cfc05eb7eb9bf069b499ee
2018-08-30 19:54:39 -07:00
c8c21fa2b4 Allow same flags when glog is used or not (#11034)
Summary:
Extracted from https://github.com/pytorch/pytorch/pull/8338

cc mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11034

Reviewed By: mingzhe09088

Differential Revision: D9582801

Pulled By: orionr

fbshipit-source-id: b41ca1bebf6cf62fff2a2b8caf4c94af3e43db00
2018-08-30 19:24:51 -07:00
26409a4300 Caffe2 flags needs to be used after the GlobalInit function is called
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11120

Reviewed By: llyfacebook

Differential Revision: D9598430

Pulled By: sf-wind

fbshipit-source-id: 468f0ed7880339c9c4467d1cef29f5bc9fc80a2a
2018-08-30 19:10:39 -07:00
a6cb41486d update documentation for observers
Summary:
update to the latest observer usage syntax
add an example of HistogramObservers

Reviewed By: jspark1105

Differential Revision: D6878439

fbshipit-source-id: c9521f2daecfc7f0c17de6a944dce58e568e3dbe
2018-08-30 18:11:48 -07:00
15314c7b8e GCC-7 doesn't like the original syntax. (#10665)
Summary:
Replace with "this->template f<T>()".

Fix #7881
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10665

Differential Revision: D9597187

Pulled By: ezyang

fbshipit-source-id: 8af4e7efd98edadabb97e2523a58bd21bc116d1a
2018-08-30 16:41:16 -07:00
684bd1b7bd size_ -> numel_ (#11112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11112

att

Reviewed By: ezyang

Differential Revision: D9474018

fbshipit-source-id: d9267e52e2d50dac7524a456a44f2e28b6c0b693
2018-08-30 16:41:13 -07:00
7ddc6f84c4 NULL -> nullptr (#11047)
Summary:
How did we get so many uses of `NULL` again?

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11047

Differential Revision: D9566799

Pulled By: goldsborough

fbshipit-source-id: 83469f352ac69aa65bdaf1a1a21f922d892e0db3
2018-08-30 16:25:42 -07:00
302e9cb815 Update onnx submodule to onnx/onnx@bae6333 (#10961)
Summary:
ONNX v1.3.0 release

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10961

Reviewed By: houseroad

Differential Revision: D9543998

Pulled By: bddppq

fbshipit-source-id: b7f0a0553d832d609d3b7613a608f7bf4a2582ef
2018-08-30 15:25:57 -07:00
56c737a9b7 Inject GetEmptyStringAlreadyInited once for static proto (#11045)
Summary:
I've been seeing a lot of warnings about multiple declarations of this. Hopefully this fixes it.

cc Yangqing mingzhe09088 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11045

Reviewed By: mingzhe09088

Differential Revision: D9582756

Pulled By: orionr

fbshipit-source-id: 6171485609a2f2f357d6e1c44e26b4ecfcdb4ce6
2018-08-30 14:59:54 -07:00
a136d29fd1 Use intrusive_ptr in Storage (#10907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10907

replace shared_ptr with intrusive_ptr in Storage

Reviewed By: ezyang

Differential Revision: D9414388

fbshipit-source-id: d413549ffde24959166d2dff2042b99f0c5018af
2018-08-30 14:59:52 -07:00
f0142faab0 Expose arbitrary cpp autograd functions to Python (#11082)
Summary:
This is needed because the JIT declares some custom autograd functions.

colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11082

Differential Revision: D9580456

Pulled By: apaszke

fbshipit-source-id: 6bf00c1188a20b2ee6ecf60e5a0099f8263ad55a
2018-08-30 14:25:59 -07:00
93bd291e55 Change torch.jit.trace to no longer be a decorator (#11069)
Summary:
This was done because it is surprising for a decorator to run a function
rather than wrap it, and the decorator form did not simplify the syntax for tracing modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11069

Reviewed By: jamesr66a

Differential Revision: D9583192

Pulled By: zdevito

fbshipit-source-id: b914b7ab4c73c255086465a6576eef3a22de1e13
2018-08-30 13:56:05 -07:00
ebe9d204fa Add test cases to intrusive_ptr (#11026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11026

ezyang fixed a bug with moving or copying an intrusive_ptr into itself.
This diff adds test cases for it.

Reviewed By: ezyang

Differential Revision: D9563464

fbshipit-source-id: 3a3b3f681124730d2500b276c0135c3bba7875ae
2018-08-30 13:25:33 -07:00
e85f3fccb3 Fix relying on UB in test_data_parallel_nested_output (#11092)
Summary:
We shouldn't rely on plain `dict` ordering. Example failure: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-xenial-cuda8-cudnn6-py3-test1/8417/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11092

Reviewed By: ezyang

Differential Revision: D9583274

Pulled By: SsnL

fbshipit-source-id: ba80b96648c98c24c2ec5fa6fd9aa566c095cce7
2018-08-30 13:10:25 -07:00
9d4360c060 Creates stream pool (#9938)
Summary:
This PR creates a stream pool per issue #9646. When a new stream is requested, the device it's requested on lazily creates two pools, one low priority and one high priority, of 32 streams each. Streams are returned from these pools round-robin. That is, stream 0 is returned, then stream 1... then stream 31, then stream 0... This PR also takes the opportunity to clean up the stream API, reducing its complexity and verbosity.

Change notes:

- There are now 3 sets of streams per device, the default stream, the low priority streams, and the high priority streams. These streams live in lazily initialized pools and are destroyed on shutdown.
- All stream refcounting has been removed (the pools pattern replaces it).
- Setting a stream now sets it on its device. Streams are associated with a device and the previous
requirement to specify that device was unnecessary.
- There is no exposure for setting the flags on a stream. This may also seem like a regression but the flag was always set to cudaStreamNonBlocking.
- Streams are now low or high priority, whereas previously the priority could be set with an integer. In practice, however, the range for priorities is -1 to 0 on the latest hardware: -1 is high priority, 0 is low priority (aka default priority). Low vs. high clarifies this behavior for anyone attempting finer separations. (E.g., if someone tried streams with priorities 0, 1, and 2, they would historically all have priority 0, and the intended behavior would not be respected.) See the sketch after this list.
- Unused THCStream and THCState stream-related functions were removed.
- A new test of pooling behavior was added in stream_test.
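
A minimal sketch of the resulting low/high-priority stream usage from Python (assumes a CUDA build; the pool management itself is internal):

```python
import torch

low = torch.cuda.Stream(priority=0)    # low (default) priority
high = torch.cuda.Stream(priority=-1)  # high priority

with torch.cuda.stream(high):
    # Work enqueued here runs on a stream drawn from the high-priority pool
    y = torch.randn(1024, device="cuda").sum()
torch.cuda.current_stream().wait_stream(high)
```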

fyi: colesbury, apaszke, goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9938

Reviewed By: SsnL

Differential Revision: D9569036

Pulled By: ezyang

fbshipit-source-id: 12ed673fe373170d0cf4d65cb570de016c53ee7d
2018-08-30 12:40:23 -07:00
23b0c90e71 caffe2: fix gcc8 warnings
Summary:
The warnings are erroneous as far as I can see,
so tweak things to avoid them. The (unsigned int) cast is
to avoid passing -1 to a size_t type.  This was triggered
in gcc8's lto build only, giving:

  caffe2/aten/src/TH/generic/THTensor.cpp: In function ‘THFloatTensor_squeeze1d’:
  lto1: error: ‘__builtin_memset’ specified size 18446744073709551608
  exceeds maximum object size 9223372036854775807 [-Werror=stringop-overflow=]
  In function ‘newImpl’,
    inlined from ‘operator new’ at common/memory/OperatorOverride.cpp:86:23,
    inlined from ‘allocate’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/ext/new_allocator.h:111:0,
    inlined from ‘allocate’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/alloc_traits.h:436:0,
    inlined from ‘_M_allocate’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/stl_vector.h:172:0,
    inlined from ‘_M_default_append’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/vector.tcc:571:0,
    inlined from ‘resize’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/stl_vector.h:671:0,
    inlined from ‘THTensor_resizeDim’ at caffe2/aten/src/TH/THTensor.hpp:123:0,
    inlined from ‘THFloatTensor_squeeze1d.part.198’ at caffe2/aten/src/TH/generic/THTensor.cpp:429:0,
    inlined from ‘THFloatTensor_squeeze1d’:
  common/memory/OperatorOverride.cpp:86:23: error:
  argument 1 value ‘18446744073709551608’ exceeds maximum object size 9223372036854775807 [-Werror=alloc-size-larger-than=]
   void* ptr = malloc(size);

Reviewed By: soumith

Differential Revision: D9568621

fbshipit-source-id: 4569a4be897d669caa3f283f4b84ec829e8d77ad
2018-08-30 11:55:29 -07:00
611a608517 Add ATen pdist CPU kernel (#10782)
Summary:
Also add single grad whitelist to the jit test
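
Usage sketch of the op this kernel backs (hypothetical values):

```python
import torch

x = torch.tensor([[0.0, 0.0],
                  [3.0, 4.0],
                  [0.0, 1.0]])
# Condensed pairwise distances for row pairs (0,1), (0,2), (1,2):
print(torch.pdist(x))  # tensor([5.0000, 1.0000, 4.2426])
```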
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10782

Reviewed By: ezyang

Differential Revision: D9583378

Pulled By: erikbrinkman

fbshipit-source-id: 069e5ae68ea7f3524dec39cf1d5fe9cd53941944
2018-08-30 11:55:27 -07:00
029082e87c Add entry for torch/lib/pythonX.Y in .gitignore (#11083)
Summary:
I've had `torch/lib/python3.6` show up as part of the build for some time now. It's not ignored, which means I need to be extra careful about checking in files, or I end up with a thousand of them in my index.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11083

Differential Revision: D9580453

Pulled By: apaszke

fbshipit-source-id: 369e4fe87962696532d111b24f2a4a99b9572bf2
2018-08-30 11:40:25 -07:00
40227671e9 Add strides to caffe2::Tensor (#10826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10826

Add strides, and make sure the strides are consistent with sizes, and is_contiguous, for all the Caffe2 functions.

is_contiguous means strides_[dim-1] = 1 and strides_[i] = strides_[i+1] * max(size_[i+1], 1);
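
A small Python sketch of that recurrence (illustrative only; the real logic lives in the C++ Tensor code):

```python
import torch

def contiguous_strides(sizes):
    # strides[dim-1] = 1; strides[i] = strides[i+1] * max(sizes[i+1], 1)
    strides = [1] * len(sizes)
    for i in range(len(sizes) - 2, -1, -1):
        strides[i] = strides[i + 1] * max(sizes[i + 1], 1)
    return strides

assert contiguous_strides([2, 3, 4]) == [12, 4, 1]
assert list(torch.empty(2, 3, 4).stride()) == [12, 4, 1]
```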

Reviewed By: ezyang

Differential Revision: D9354480

fbshipit-source-id: 3643871b70f1111b7ffdd9fdd9fe9bec82635963
2018-08-30 11:25:58 -07:00
535633bddc Export MPI functions (#11037)
Summary:
Potential fix for https://github.com/caffe2/caffe2/issues/2551#issuecomment-417124872

cc Yangqing mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11037

Reviewed By: mingzhe09088

Differential Revision: D9580937

Pulled By: orionr

fbshipit-source-id: 5e1fbf718728271a5b5af526d8e67cc5b48f0575
2018-08-30 10:42:02 -07:00
e7195431e0 Add benchmarking functionality to the benchmark app (#10976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10976

The app can run in Xcode with the benchmark metrics collected.
It can also run when building with buck.

Reviewed By: llyfacebook

Differential Revision: D9546755

fbshipit-source-id: 60ad0112946f8cf57138417f6838a58ed6d2c90f
2018-08-30 09:54:55 -07:00
a8af7fe46a Support import of nn.RNNCellBase in __all__
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10992

Differential Revision: D9572005

Pulled By: soumith

fbshipit-source-id: 26b546830b6a25a4f7ba6f825cd888d678233a97
2018-08-30 08:25:21 -07:00
dbc0004f99 Remove use_count() == 1 in Tensor::Extend (#11046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11046

As suggested by jerryzh168, a temporary fix for a new constraint added in D9350686 is to remove this assert. Long term, jerryzh168 is going to work out a better way of handling this.

Reviewed By: jerryzh168

Differential Revision: D9566323

fbshipit-source-id: e4630c7cbe0cc68a084974ea7048654811fae01f
2018-08-29 23:55:28 -07:00
23af7deea7 Add has_lapack flag (#11024)
Summary:
Currently our `skipIfLapack` uses a try-catch block and regex-matches the error message. It is highly unreliable. This PR adds `hasLAPACK` and `hasMAGMA` on ATen context, and exposes the flags to Python.

Also fixes a refcounting bug with `PyModule_AddObject`. The method steals a reference, but we didn't `Py_INCREF` in some places before calling it with `Py_True` or `Py_False`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11024

Differential Revision: D9564898

Pulled By: SsnL

fbshipit-source-id: f46862ec3558d7e0058ef48991cd9c720cb317e2
2018-08-29 22:41:16 -07:00
ad1670cf54 Kill the dummy TaskOutput when task.get_step() (#11048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11048

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10739

I wanted to assert that the blobs in the workspace of the new session after loading checkpoint are exactly the same as the blobs in the workspace of the old session before saving to a checkpoint.

But I found that when calling `task.get_step()`, a dummy task output blob, `task:output/ConstIntFill:0`, is added. Also a dummy net `task:output` was also added along with it. See https://fburl.com/937lf2yk

This makes it hard to assert "Equal", forcing me to assert "LessThan" or "GreaterThan".

Adding a dummy TaskOutput when the user specifies no TaskOutput is a hack.
The reason for this is that a ZMQ socket can't send an empty blob list.
As a result, if the Task on the Worker had no output,
the master would never stop waiting and would hang forever. See https://fburl.com/rd7fhy6p and imagine `socket.recv(net, 0)`.

TaskOutput is at the user layer. The hack shouldn't be exposed to the user layer, polluting user workspaces.

Instead, we should move the creation of the dummy blob to some deeper layer,
and remove the dummy blob from the workspace afterwards to avoid polluting user workspaces.
After this change, the workaround becomes totally transparent, with no side effects for users.

Reviewed By: mraway

Differential Revision: D9566744

fbshipit-source-id: 18292dd64a6d48192c34034200a7c9811d2172af
2018-08-29 20:11:29 -07:00
16b8e0a787 at::StorageImpl: Rename size_ to numel_ and elementSize() to itemsize()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11011

Reviewed By: ezyang

Differential Revision: D9561898

Pulled By: cpuhrsch

fbshipit-source-id: 0cf5cdc3e7acd397f7e2d66097856aaad0581147
2018-08-29 20:11:27 -07:00
394bdcd49a Fix the build of aten tests when FULL_CAFFE2=1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11019

Reviewed By: orionr

Differential Revision: D9562691

Pulled By: houseroad

fbshipit-source-id: 95a8dee580e5f4dc9af3a2e1f68ec6c62a0e4e04
2018-08-29 18:09:54 -07:00
e550eab3e2 Remove MetaNetDef test case in Predictor (#11052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11052

Delete the test case for Predictor with constructing by MetaNetDef since the constructor
actually has been deprecated. The broken PR is for construcing predictor from DB instance.

Reviewed By: highker

Differential Revision: D9566935

fbshipit-source-id: 5511883953a2d3f6eb0a4f1c5518a1bc4b3ffbdc
2018-08-29 17:55:21 -07:00
91ecbf8b1d Remove TensorBase (#11036)
Summary:
Not subclassed except by Tensor. Also required to align further with
caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11036

Reviewed By: ezyang

Differential Revision: D9565640

Pulled By: cpuhrsch

fbshipit-source-id: ff7203a2c95d3f3956282b4f2d8dda6c2b93f4a6
2018-08-29 17:27:19 -07:00
ae635b16f7 Record tensor factory functions in trace (#10935)
Summary:
Things like torch.zeros now appear in traces rather than constants.

To continue to support our current level of ONNX export, we run
constant prop to turn these back into constants where possible before
export.
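
A small illustration (hypothetical example):

```python
import torch

def f(x):
    # torch.zeros is now recorded as an op in the trace rather than
    # captured as a baked-in Constant tensor
    return x + torch.zeros(x.size(0))

traced = torch.jit.trace(f, (torch.rand(3),))
print(traced.graph)  # shows a zeros call driven by the traced size
```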
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10935

Differential Revision: D9527427

Pulled By: zdevito

fbshipit-source-id: 552a8bcc01b911251dab7d7026faafdd7a3c758a
2018-08-29 17:10:24 -07:00
c4e1adf29d Remove THHalf type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11010

Reviewed By: ezyang

Differential Revision: D9561325

Pulled By: li-roy

fbshipit-source-id: 053cf2925ec1fc458db31e92bd31ffd23389f3e8
2018-08-29 16:44:45 -07:00
2cc98d8df7 Adds dim argument to torch.unique (#10423)
Summary:
Initial version of `unique` supporting a `dim` argument.

As discussed in [this issue](https://github.com/pytorch/pytorch/issues/9997) I added the `dim` argument to `torch.unique` with the same behavior as [numpy](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.unique.html).

Since the implementation is based on `std/thrust::unique`, the `tensor` always needs to be sorted. The `sorted` argument in `torch.unique` has no effect, as in the CUDA version of the plain `torch.unique`.

To check the performance and equal behavior between `torch.unique` and `np.unique`, I've used [this gist](https://gist.github.com/ptrblck/ac0dc862f4e1766f0e1036c252cdb105).

Currently we achieve the following timings for an input of `x = torch.randint(2, (1000, 1000))`:
(The values are calculated by taking the average of the times for both dimensions.)

| Device | PyTorch (return_inverse=False) | Numpy (return_inverse=False) | PyTorch (return_inverse=True) | Numpy (return_inverse=True) |
| --- | --- | --- | --- | --- |
| CPU | ~0.007331s | ~0.022452s | ~0.011139s | ~0.044800s |
| GPU | ~0.006154s | - | ~0.105373s | - |

Many thanks to colesbury for the awesome mentoring and the valuable advice on the general implementation and performance issues!
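
A usage sketch of the new argument (hypothetical data):

```python
import torch

x = torch.tensor([[1, 2],
                  [3, 4],
                  [1, 2]])
# De-duplicate along dim 0 (unique rows), mirroring np.unique(x, axis=0):
rows, inverse = torch.unique(x, dim=0, return_inverse=True)
print(rows)     # tensor([[1, 2], [3, 4]])
print(inverse)  # tensor([0, 1, 0]) -- each input row's index into `rows`
```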
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10423

Differential Revision: D9517289

Pulled By: soumith

fbshipit-source-id: a4754f805223589c2847c98b8e4e39d8c3ddb7b5
2018-08-29 16:26:09 -07:00
98d85b1790 Debugging help + test
Summary: When conversion fails, dump more information to help fix up the netdef

Reviewed By: hyuen, yinghai

Differential Revision: D9558667

fbshipit-source-id: 8917cc61c9be6285697e4f8395a9dbc7135f618e
2018-08-29 16:26:07 -07:00
ef7fc2a3e1 Remove at::StorageImpl::finalizer_ (#11022)
Summary:
Unused member variable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11022

Reviewed By: ezyang

Differential Revision: D9562520

Pulled By: cpuhrsch

fbshipit-source-id: af190b3ba06d33d65fa0fabffb34a0df769f38d0
2018-08-29 16:09:47 -07:00
6b87198245 Devirtualize StorageImpl deconstructor (#11018)
Summary:
Further align at::StorageImpl with caffe2::StorageImpl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11018

Reviewed By: ezyang

Differential Revision: D9562256

Pulled By: cpuhrsch

fbshipit-source-id: d929317f6226a1e2550b78034b723afbae343aaa
2018-08-29 15:39:54 -07:00
d9b74f6540 Make it possible to disable JIT using env variables (#10867)
Summary:
zdevito
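
The summary doesn't name the variable; for reference, the knob PyTorch documents for this is `PYTORCH_JIT` (the illustration below assumes that name):

```python
# Run with the JIT disabled so scripted functions execute as plain Python:
#   PYTORCH_JIT=0 python my_script.py
import torch

@torch.jit.script
def f(x):
    return x * 2  # with PYTORCH_JIT=0, this is ordinary Python (pdb works)
```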
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10867

Differential Revision: D9556882

Pulled By: apaszke

fbshipit-source-id: 04c0ca875d15d37dd9ac05ac7b515cd899ddb7e4
2018-08-29 15:11:05 -07:00
c755616e00 Enable Detectron model inference for CPU and MKL-DNN paths (#10157)
Summary:
1. Support ops needed for inference of Faster-RCNN/Mask-RCNN in Detectron, mostly as direct fallbacks.
2. Use CPU device to hold 0-dim tensors and integer tensors in both fallback op and blob feeder, needed by Detectron models.
3. Ignore 0-dim tensor in MKL-DNN concat operator.
4. Generate dynamic library of Detectron module for CPU device.

This PR obsoletes #9164.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10157

Differential Revision: D9276837

Pulled By: yinghai

fbshipit-source-id: dc364932ae4a2e7fcefdee70b5fce3c0cee91b6f
2018-08-29 15:11:01 -07:00
89834dfe64 Add GPU version of HardSigmoid Op to Caffe2 (#10955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10955

Add GPU version of HardSigmoid Op to Caffe2. Updated test file to
include GPU tests.

Reviewed By: enosair

Differential Revision: D9499353

fbshipit-source-id: fcb51902063d0c3e4b10354533a8a42cf827c545
2018-08-29 14:55:29 -07:00
22e3b2c9c3 Revert D9413150: [New Checkpoint] Kill the dummy TaskOutput when task.get_step()
Differential Revision:
D9413150

Original commit changeset: 51aaf3201e26

fbshipit-source-id: ac7c4c0960db03f344fe3eb2ad7f0e034db2371a
2018-08-29 14:39:49 -07:00
6a8bc3804a Add flush to logging messages higher than INFO. (#10983)
Summary:
This probably fixes the logging test error that orionr is encountering - haven't tested locally but wanted to send out a PR to kick off CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10983

Reviewed By: ezyang

Differential Revision: D9552607

Pulled By: Yangqing

fbshipit-source-id: 9ac019031ffd9c03972144df04a836e5dcdafe02
2018-08-29 14:39:48 -07:00
0b1de74732 Documentation improvement in caffe2/core/tensor.h (#11006)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11006

Reviewed By: smessmer

Differential Revision: D9558383

Pulled By: ezyang

fbshipit-source-id: 7d36fb69a6e8a7d064da2c8796dc263a9fd4e094
2018-08-29 14:25:38 -07:00
e9eed8edb4 Add doc for Tensor.digamma_? (#11008)
Summary:
follow up for #10967

zou3519 vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11008

Differential Revision: D9559889

Pulled By: SsnL

fbshipit-source-id: a05d8fbad92a54bcdb93de6e62a7f94180da1d99
2018-08-29 14:11:16 -07:00
f687ff5a59 Delete unnecessary includes from TensorImpl.h (#11005)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11005

Reviewed By: smessmer

Differential Revision: D9558300

Pulled By: ezyang

fbshipit-source-id: ebebb3c6d3a1a2f7cc3da9fe9d3c56310ead46e1
2018-08-29 14:11:14 -07:00
b644d5e74a Delete context and get_context from Type.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11001

Reviewed By: cpuhrsch

Differential Revision: D9557315

fbshipit-source-id: b9862b8dda49194298bb1a4fbc214d466f3c8350
2018-08-29 13:55:45 -07:00
cd9416317d Minor copy-edit on setup.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10933

Reviewed By: cpuhrsch

Differential Revision: D9526650

fbshipit-source-id: 8ad1c989bee7009b3f95a2641189f55cf6c1979f
2018-08-29 13:41:04 -07:00
c99a143eea Update blackbox predictor with new constructor (#10920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10920

Update the black box predictor and the related code to use the
constructor with PredictorConfig.

Reviewed By: highker

Differential Revision: D9516972

fbshipit-source-id: fbd7ece934d527e17dc6bcc740b4e67e778afa1d
2018-08-29 13:31:45 -07:00
56539f5fe1 PT1 Distributed Release MileStone No.1 - Completed Distributed Package and CI tests (#10871)
Summary:
The PR includes:
(1) torch.distributed.c10d, which now includes the complete backward compatible frontend API for `torch.distributed`
(2) `env://` init method functionality
(3) Minor change to `test_distributed.py`, which is now a test for `torch.distributed.c10d`.
(4) The old `test_distributed.py` is now moved to `test_distributed_thd`
(5) Miscellaneous bug fixes.
(6) DDP CPU test is removed since c10d doesn't have this support yet, but this is a very easy test after moving DDP CPU's dependency to torch.distributed.c10d.
(7) CI config to test MPI, NCCL, and Gloo backend of c10d

**Now all the distributed test including c10d DDP can pass with the c10d frontend API**

TODO: (in a separate PR)
MPI subgroup support, once this is added, CI group test will be enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10871

Differential Revision: D9554514

Pulled By: teng-li

fbshipit-source-id: fb686ad42258526c8b4372148e82969fac4f42dd
2018-08-29 12:55:57 -07:00
fa7c81c640 nomnigraph - nit - code style update (#10987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10987

Some code style updates to make it consistent with fb C++ style.

Reviewed By: yinghai

Differential Revision: D9550130

fbshipit-source-id: 6aef9878676c08e7d384383c95e7ba8c5c9a1bce
2018-08-29 12:55:55 -07:00
ec519e8a4a Reduce number of elements within test_abs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10997

Differential Revision: D9556861

Pulled By: cpuhrsch

fbshipit-source-id: 986ef275e94fcffcc04a5c1103b8b7bfb4ae3ba5
2018-08-29 12:55:54 -07:00
dbce1c840f exposing net_transformer_fun before add grad (#11003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11003

Need an interface to rewrite the graph after the net is built and after gradient ops are added.

Reviewed By: aazzolini, harouwu

Differential Revision: D9557827

fbshipit-source-id: 2e082f0321c0776e488a29e18047d950948e7c37
2018-08-29 12:55:52 -07:00
bed9d41abd Generate Type::registerCPU as we do register_cuda_types. (#10947)
Summary:
The goal here is to separate out the base Type into core; as previously written, we needed all derived Types to be defined when compiling the base Type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10947

Reviewed By: gchanan

Differential Revision: D9540025

Pulled By: ezyang

fbshipit-source-id: 49f0b5acb3c378348ef3a55780abb73e4ae27edd
2018-08-29 12:39:47 -07:00
4e446b85fb Make profiler.build_table() O(n) rather than O(n^2) (#10969)
Summary:
Fixes #10851

Speeds up profiling results dramatically.

For the following script:
```
import torch
import time

ITER = 2000

x = torch.randn(1, 1, requires_grad=True)

with torch.autograd.profiler.profile() as prof:
    y = x
    for i in range(ITER):
        y = 3 * y - 2 * y
    y.backward()

start = time.time()
print("Done running. Preparing prof")
x = str(prof)
print("Done preparing prof results")
end = time.time()
print("Elapsed: {}".format(end - start))
```

I get 7s before / 0.13s after these changes.

cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10969

Differential Revision: D9556129

Pulled By: zou3519

fbshipit-source-id: 26b421686f8a42cdaace6382567d403e6385dc12
2018-08-29 12:25:51 -07:00
396dec0e37 s/spaerse/sparse (#10968)
Summary:
cc SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10968

Differential Revision: D9546746

Pulled By: zou3519

fbshipit-source-id: a6a4bb8bb04eccf89c3d90a90259070beb484500
2018-08-29 12:13:04 -07:00
525548fb64 Move SparseTensorRef to core, change some includes to core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10964

Differential Revision: D9545021

Pulled By: gchanan

fbshipit-source-id: 8ba7e5e3a7bdf24e5aeb4bbc91957c1a6f14d7f0
2018-08-29 11:55:29 -07:00
e0dbb91060 Windows raw string fix (#10998)
Summary:
Breaking this out of https://github.com/pytorch/pytorch/pull/8338

mingzhe09088's fix of the docstrings for Windows builds. Unfortunately some versions of Windows seem to try to parse the `#` inside the string as a preprocessor directive. We might need to change this to something else later, but want to get this landed first.

cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10998

Reviewed By: mingzhe09088

Differential Revision: D9557480

Pulled By: orionr

fbshipit-source-id: c6a6237c27b7cf35c81133fd9faefead675a9f59
2018-08-29 11:40:08 -07:00
206d52d0e3 Disable smart_tensor_printer_test without glog (#10999)
Summary:
Breaking out of https://github.com/pytorch/pytorch/pull/8338

This test fails once we start building with `-DUSE_GLOG=OFF` since the non-glog logging case doesn't support flushing or streaming to the right location. For now, we just disable this test in that case.

cc Yangqing mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10999

Reviewed By: mingzhe09088

Differential Revision: D9557488

Pulled By: orionr

fbshipit-source-id: 8b306f210411dfc8ccc404bdccf77ddcd36a4830
2018-08-29 11:10:23 -07:00
562fc7631f Add test cases for ONNX unsqueeze (#10924)
Summary:
PyTorch export tests and end-to-end cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10924

Reviewed By: Ac2zoom

Differential Revision: D9548210

Pulled By: houseroad

fbshipit-source-id: 2381d1ad92a4e07f97060eb65c9fd09f60ad3de6
2018-08-29 11:10:21 -07:00
1b0d5e60ab Get rid of some unnecessary includes of Context. (#10951)
Summary:
This is part of splitting Context from what needs to go in ATen/core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10951

Differential Revision: D9540369

Pulled By: gchanan

fbshipit-source-id: 73b0e8c4493785fbab368a989f46137c51f6ea0b
2018-08-29 11:10:20 -07:00
a9469c9c8a Fill eigenvector with zeros if not required (#10645)
Summary:
Fix #10345, which only happens in the CUDA case.

* Instead of returning some random buffer, we fill it with zeros.

* Update the torch.symeig doc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10645

Reviewed By: soumith

Differential Revision: D9395762

Pulled By: ailzhang

fbshipit-source-id: 0f3ed9bb6a919a9c1a4b8eb45188f65a68bfa9ba
2018-08-29 10:55:22 -07:00
b41988c71e Cleanup BUILD_DOCS cmake section (#11000)
Summary:
Breaking out of https://github.com/pytorch/pytorch/pull/8338

cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11000

Differential Revision: D9557474

Pulled By: orionr

fbshipit-source-id: 7d84914b67ff37bdb7738f9b7846dfeb5b975c00
2018-08-29 10:09:52 -07:00
7169906249 torch.digamma (#10967)
Summary:
Fixes #10307

cc SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10967

Differential Revision: D9546748

Pulled By: zou3519

fbshipit-source-id: 764e27b1cc8dd487270b3ffa653b806c86f717dd
2018-08-29 09:43:19 -07:00
a5d7abedae Enable fusing aten::expand on GT, LT, EQ (#10845)
Summary:
GT, LT, EQ all support numpy broadcasting, just enable the fusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10845

Reviewed By: bddppq

Differential Revision: D9494089

Pulled By: houseroad

fbshipit-source-id: 7c65ca06c54dbd476ac7d07b47a413faaed3dd5e
2018-08-28 23:56:50 -07:00
db0abe1890 Fix bugs in handling of negative slice + gather indices (#10973)
Summary:
This fixes multiple bugs in the handling of negative indices in both slicing and gather operations. These were uncovered by Elias Ellison's diff D9493614, which made it so that we actually emit negative indices when we see them in PyTorch code.
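
A sketch of the semantics involved (hypothetical example):

```python
import torch

x = torch.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
# Negative slice bounds are normalized against the dimension size:
assert torch.equal(x[:, -2:], x[:, 1:])
# aten::gather itself takes non-negative indices, so negative Python
# indices have to be normalized before being lowered:
idx = torch.tensor([[2], [0]])
print(torch.gather(x, 1, idx))     # tensor([[2], [3]])
```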
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10973

Reviewed By: jhcross

Differential Revision: D9546183

Pulled By: jamesr66a

fbshipit-source-id: 6cb0e84e8ad399e47e24a96c44025f644c17b375
2018-08-28 23:40:40 -07:00
6ca28984c7 Kill the dummy TaskOutput when task.get_step() (#10739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10739

I wanted to assert that the blobs in the workspace of the new session after loading checkpoint are exactly the same as the blobs in the workspace of the old session before saving to a checkpoint.

But I found that when calling `task.get_step()`, a dummy task output blob, `task:output/ConstIntFill:0`, is added. Also a dummy net `task:output` was also added along with it. See https://fburl.com/937lf2yk

This makes it hard to assert "Equal", forcing me to assert "LessThan" or "GreaterThan".

Adding a dummy TaskOutput when the user specifies no TaskOutput is a hack.
The reason for this is that a ZMQ socket can't send an empty blob list.
As a result, if the Task on the Worker had no output,
the master would never stop waiting and would hang forever. See https://fburl.com/rd7fhy6p and imagine `socket.recv(net, 0)`.

TaskOutput is at the user layer. The hack shouldn't be exposed to the user layer, polluting user workspaces.

Instead, we should move the creation of the dummy blob to some deeper layer,
and remove the dummy blob from the workspace afterwards to avoid polluting user workspaces.
After this change, the workaround becomes totally transparent, with no side effects for users.

Reviewed By: mraway

Differential Revision: D9413150

fbshipit-source-id: 51aaf3201e26570b4fcf5738e9b9aa17c58777ac
2018-08-28 20:41:46 -07:00
beeec47041 Sanity checks for tracing (#10841)
Summary:
TODO: integrate into torch.onnx.export -- separate PR

*Problem:* We have a facility to trace PyTorch operations on Python code, but there are several failure modes where the trace is not representative of the actual underlying computation:

* The tracer encountered dynamic control flow
* Some computation escaped the tracer, and appeared as a Constant tensor node in the graph
* Some stateful function was traced, e.g. someone did an optimization in Python by memoizing function outputs

*Objective*: In an ideal world, this whole process would be automated and the user could trust that the system magically captures the intended semantics of the program. Realistically speaking, we will likely have to settle for a human-in-the-loop error-reporting system, allowing the user to identify problems and modify the source code to allow for tracing.

*Stage 1* (this PR): Output-level checking & graph diff. torch.jit.trace gains a kwarg 'check_inputs', which is a list of tuples of input arguments. We will iterate through the list and trace the function again for each set of check inputs. We'll also interpret the original trace with these inputs and compare output values and graphs, printing a diff of the graph if there is a difference.

Examples:

```
@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 5),)])
def foo(x):
    y = torch.arange(0, x.shape[0]).float()
    return x + y.unsqueeze(1)
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		-   %1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
		?                                                              ^
		+   %1 : Dynamic = prim::Constant[value= 0  1  2  3 [ CPULongType{4} ]]()
		?                                                +++              ^
		    %2 : int = prim::Constant[value=0]()
		    %3 : Dynamic = aten::_cast_Float(%1, %2)
		    %4 : int = prim::Constant[value=1]()
		    %5 : Dynamic = aten::unsqueeze(%3, %4)
		    %6 : int = prim::Constant[value=1]()
		    %7 : Dynamic = aten::add(%0, %5, %6)
		    return (%7);
		  }
	Node diff:
		- %1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
		?                                                            ^
		+ %1 : Dynamic = prim::Constant[value= 0  1  2  3 [ CPULongType{4} ]]()
		?                                              +++              ^
	Trace source location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Check source location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
		dank.py(3): <module>
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
	Node:
		%1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
	Source Location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Comparison exception:
		Not equal to tolerance rtol=1e-07, atol=0

		(shapes (3,), (4,) mismatch)
		 x: array([0, 1, 2])
		 y: array([0, 1, 2, 3])

```
==

```
@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    y = x.data
    return x + y
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
	Node:
		%1 : Dynamic = prim::Constant[value=<Tensor>]()
	Source Location:
		dank.py(6): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Comparison exception:
		Not equal to tolerance rtol=1e-07, atol=0

		(mismatch 100.0%)
		 x: array([0.397137, 0.956105, 0.169478, 0.560292, 0.392568, 0.108441,
		       0.97645 , 0.34412 , 0.951246, 0.793061, 0.557595, 0.770245],
		      dtype=float32)
		 y: array([0.243178, 0.315964, 0.972041, 0.0215  , 0.927751, 0.457512,
		       0.951092, 0.97883 , 0.048688, 0.118066, 0.779345, 0.271272],
		      dtype=float32)
```

==

```
import torch

@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 4),)])
def foo(x):
    for _ in range(x.size(0)):
        x = torch.neg(x)
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		    %1 : Dynamic = aten::neg(%0)
		    %2 : Dynamic = aten::neg(%1)
		    %3 : Dynamic = aten::neg(%2)
		+   %4 : Dynamic = aten::neg(%3)
		-   return (%3);
		?            ^
		+   return (%4);
		?            ^
		  }
```

==

```
import torch

def foo(x):
    if not hasattr(foo, 'cache'):
        foo.cache = torch.neg(x)
    return x + foo.cache

traced = torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])(foo)
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		-   %1 : Dynamic = aten::neg(%0)
		+   %1 : Dynamic = prim::Constant[value=<Tensor>]()
		    %2 : int = prim::Constant[value=1]()
		    %3 : Dynamic = aten::add(%0, %1, %2)
		    return (%3);
		  }
	Node diff:
		- %1 : Dynamic = aten::neg(%0)
		+ %1 : Dynamic = prim::Constant[value=<Tensor>]()
	Trace source location:
		test.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		test.py(8): <module>
	Check source location:
		test.py(6): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
		test.py(8): <module>
```

The following two examples show instances where program semantics are lost in the Python -> trace transformation, and repeated invocation does not give us useful debug information. Further design is underway for catching these scenarios.

```
import torch

@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    for i in range(3):
        x[i, :] = torch.zeros(4)
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0

(mismatch 100.0%)
 x: array([0.830221, 0.915481, 0.940281, 0.555241], dtype=float32)
 y: array([0., 0., 0., 0.], dtype=float32)
```

==

```
import torch

@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(5, 6),)])
def foo(x):
    x.view(-1).add_(-x.view(-1))
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0

(mismatch 100.0%)
 x: array([0.734441, 0.445327, 0.640592, 0.30076 , 0.891674, 0.124771],
      dtype=float32)
 y: array([0., 0., 0., 0., 0., 0.], dtype=float32)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10841

Differential Revision: D9499945

Pulled By: jamesr66a

fbshipit-source-id: 1f842a32d0b0645259cc43b29700b86d99c59a45
2018-08-28 20:25:26 -07:00
fe15aedacc Store schema in serialized modules and check arguments in function call (#10872)
Summary:
This PR adds argument checking for script method invocation from C++. For this I had to:
1. The schema of a method is currently not serialized in script modules, so we now store the function schema in the `doc_string` field of the ONNX proto. Upon loading a serialized script module, we parse the schema into its structured C++ form and assign it to the loaded method.
2. Inside `Method::operator()`, we now verify the number and types of arguments.
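
Conceptually, the check in step 2 behaves like this Python sketch (the names here are illustrative, not the actual C++ API):

```
from collections import namedtuple

Argument = namedtuple("Argument", ["name", "type"])

def check_arguments(schema_args, actual_args):
    # Verify the number of arguments matches the schema.
    if len(actual_args) != len(schema_args):
        raise RuntimeError("expected {} arguments, got {}".format(
            len(schema_args), len(actual_args)))
    # Verify each argument's type against the declared schema type.
    for decl, value in zip(schema_args, actual_args):
        if not isinstance(value, decl.type):
            raise RuntimeError("argument '{}' expected {}, got {}".format(
                decl.name, decl.type.__name__, type(value).__name__))

# e.g. check_arguments([Argument("x", float)], (1.0,)) passes;
#      check_arguments([Argument("x", float)], ("a",)) raises.
```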

CC zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10872

Differential Revision: D9521219

Pulled By: goldsborough

fbshipit-source-id: 5cb3d710af6f500e7579dad176652c9b11a0487d
2018-08-28 20:11:39 -07:00
ba71547e93 Add clip op to IR
Summary: Self-explanatory.

Reviewed By: highker

Differential Revision: D9551065

fbshipit-source-id: 14b3807af5337654c360a23816cffd7dd346bad5
2018-08-28 19:25:02 -07:00
90eb0b6031 Cleanup accidental logging
Summary: cleanup

Reviewed By: duc0

Differential Revision: D9549449

fbshipit-source-id: 9154b36a39936566fc2711a6e7bd33049681d1c8
2018-08-28 18:55:29 -07:00
72a84127b1 Add Workspace methods ws.feed_blob(name, arr) ws.remove_blob(name) (#10929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10929

Workspace class methods were missing on the Python side.

This enables writing the new checkpoint framework with more control over the workspace and a cleaner implementation.

Added

- ws.feed_blob(name, arr)

- ws.remove_blob(name)
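
A minimal usage sketch of the two new methods listed above (the standalone `Workspace` handle shown here is an assumption, not part of this change):

```
import numpy as np
from caffe2.python import workspace

ws = workspace.C.Workspace()  # assumed: a standalone (non-global) workspace handle
ws.feed_blob("x", np.ones((2, 3), dtype=np.float32))  # new method
# ... run nets, inspect state ...
ws.remove_blob("x")  # new method; "x" no longer pollutes the workspace
```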

Reviewed By: mraway

Differential Revision: D9486867

fbshipit-source-id: ea02d2e3a39d716a5a3da0482f57d4ac4c893763
2018-08-28 17:54:34 -07:00
8e5b8490bf Add relevant code for adding caffe2 pybind extensions registry to rocm (#10975)
Summary:
cfa5dbadfc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10975

Differential Revision: D9546838

Pulled By: bddppq

fbshipit-source-id: 3bd6dc0a4eee582bb92fc33ed27fc40eb3ab1200
2018-08-28 15:40:37 -07:00
4cb968fb77 Default hidden visibility (#10752)
Summary:
Flipping to hidden visibility one more time. Let's see what fails.

cc mingzhe09088 pjh5 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10752

Reviewed By: ezyang

Differential Revision: D9526343

Pulled By: orionr

fbshipit-source-id: c0e9c29270e95e1b2e21c598095f720c199e1e52
2018-08-28 15:25:43 -07:00
92ff070b83 Add CPU version of hard sigmoid operator to caffe2 (#10837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10837

Add CPU version of hard sigmoid operator to caffe2. The definition of
this operator can be found here:
https://github.com/onnx/onnx/blob/master/docs/Operators.md#HardSigmoid.
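
Per the linked ONNX definition, HardSigmoid computes `y = max(0, min(1, alpha * x + beta))` with defaults `alpha = 0.2`, `beta = 0.5`; a NumPy reference sketch:

```
import numpy as np

def hard_sigmoid(x, alpha=0.2, beta=0.5):
    # Piecewise-linear approximation of the sigmoid, clipped to [0, 1].
    return np.clip(alpha * x + beta, 0.0, 1.0)

print(hard_sigmoid(np.array([-5.0, 0.0, 5.0])))  # -> [0.  0.5 1. ]
```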

Reviewed By: BIT-silence

Differential Revision: D9489536

fbshipit-source-id: 67b3171ed96d5ebcc8d500d93e7827a4a9705a81
2018-08-28 14:55:49 -07:00
efd2aeac9e Set -Wno-stringop-overflow only with GCC >=7 (#10954)
Summary:
`stringop-overflow` is added in GCC 7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10954

Differential Revision: D9546084

Pulled By: SsnL

fbshipit-source-id: e6e68f993f1dbaa879ca66dc43bbcff9c49890ff
2018-08-28 14:25:29 -07:00
b3601a0425 nomnigraph - add documentation for new ReplaceSubgraph api to README.md (#10802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10802

add documentation for new ReplaceSubgraph api to README.md

Reviewed By: yinghai

Differential Revision: D9473282

fbshipit-source-id: 144c895564af83cc8727a0370e894c2f0b7eadf5
2018-08-28 12:55:25 -07:00
cfa5dbadfc Add nomnigraph bindings
Summary: Adds basic nomnigraph python bindings for quickly playing with the graphs.

Reviewed By: duc0

Differential Revision: D9441936

fbshipit-source-id: fd70f8ea279b28c766e40f124008800acd94bddd
2018-08-28 12:40:16 -07:00
a88463cd9a Working async version of AllGather, test fix and compiler warnings, and CI (#10932)
Summary:
The previous NCCL all-gather didn't work as expected. This is a fully working async version, tested on both the C++ and Python frontends.

Multi-node:
```
tengli@learnfair042:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ TMPFILE="/private/home/tengli/temp/tengli-test" RANK=0 WORLD_SIZE=2 ./ProcessGroupNCCLTest
Multi-node world size: 2 rank: 0
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful

tengli@learnfair117:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ TMPFILE="/private/home/tengli/temp/tengli-test" RANK=1 WORLD_SIZE=2 ./ProcessGroupNCCLTest
Multi-node world size: 2 rank: 1
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful
```

CI test:
```
test_set_get (__main__.FileStoreTest) ... ok
test_set_get (__main__.PrefixFileStoreTest) ... ok
test_set_get (__main__.PrefixTCPStoreTest) ... ok
test_allreduce_ops (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_ops (__main__.ProcessGroupGlooTest) ... ok
test_allgather_ops (__main__.ProcessGroupNCCLTest) ... ok
test_allreduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_broadcast_ops (__main__.ProcessGroupNCCLTest) ... ok
test_reduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_common_errors (__main__.RendezvousFileTest) ... ok
test_nominal (__main__.RendezvousFileTest) ... ok
test_common_errors (__main__.RendezvousTCPTest) ... ok
test_nominal (__main__.RendezvousTCPTest) ... ok
test_unknown_handler (__main__.RendezvousTest) ... ok
test_set_get (__main__.TCPStoreTest) ... ok
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10932

Differential Revision: D9542067

Pulled By: teng-li

fbshipit-source-id: 25513eddcc3119fd736875d69dfb631b10f4ac86
2018-08-28 12:40:14 -07:00
579bc43a14 Future-proofing embedding.py against heuristic changes (#10959)
Summary:
- rebase of https://github.com/pytorch/pytorch/pull/9851
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10959

Differential Revision: D9542292

Pulled By: weiyangfb

fbshipit-source-id: ce51864d203c8ed89da3817f1da020a0ee932960
2018-08-28 12:40:12 -07:00
3b891d9d49 Support direct access of nn.RNNCellBase
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10944

Differential Revision: D9541085

Pulled By: soumith

fbshipit-source-id: 59077f3b226d04c68a93cd6864894e8f6c594aba
2018-08-28 12:25:12 -07:00
5c58cda8ca Add subname to console output for assertExpected (#10559)
Summary:
Running `--accept` on a test doesn't tell you explicitly which sub-test is being updated; this PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10559

Differential Revision: D9353977

Pulled By: driazati

fbshipit-source-id: a9d4014386ff0fe388a092f3dcf50f157e460f04
2018-08-28 12:13:03 -07:00
91797c0672 Replace direct include of caffe2.pb.h with an intermediary header caffe2_pb.h (#10946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10946

```
codemod -d . --extensions cc,cpp,cu,cuh,h caffe2/proto/caffe2.pb.h caffe2/proto/caffe2_pb.h
```

Reviewed By: houseroad

Differential Revision: D9539945

fbshipit-source-id: 497d04720e8e7e61c05ffe1b23733d0cb774de7e
2018-08-28 11:57:08 -07:00
5ed62ea6fa Add Upsample example for torch onnx exporting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10550

Reviewed By: orionr

Differential Revision: D9541932

Pulled By: houseroad

fbshipit-source-id: 4d179d189c176482ae919e5cc74607b9d315ed26
2018-08-28 11:39:55 -07:00
22c9bc3117 Resolve builtins using a dict rather than by name (#10927)
Summary:
Changes the approach for resolving builtin ops so that the following works

```
add = torch.add
@torch.jit.script
def foo(x):
  return add(x, x)
```

This handles cases when people alias torch and torch.nn.functional to
shorter names.

This works by building a table of id -> builtin name for the known builtin
ops in torch and torch.nn.functional, and for any user-defined
op created by accessing torch.ops.foo.bar.
This allows us to clean up many SugaredValue types in the compiler.

Notes:
* we now consider any attributes on Python modules to be constants
(e.g. math.pi and torch.double).
* fixes a bug where we incorrectly allowed attribute lookup on arbitrary
Python objects. It is now restricted to modules only.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10927

Differential Revision: D9527522

Pulled By: zdevito

fbshipit-source-id: 0280422af08b4b0f48f302766d5a9c0deee47660
2018-08-28 11:25:11 -07:00
c9d337f436 Split IsEmptyOp (#10918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10918

att

Differential Revision: D9515040

fbshipit-source-id: 53c05c160ba5dda92104aadc2e40801519a2cd28
2018-08-28 10:52:28 -07:00
7de830b879 proper sharing in ShareExternalPointer (#10804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10804

Make ShareData and ShareExternalPointer create new storage when the old one is used by multiple tensors.
When we need to modify a field of the storage, we create a new storage instead.

Reviewed By: ezyang

Differential Revision: D9350686

fbshipit-source-id: 68d2b6b886b0367b0fc4fabfd55b9a480e7388ca
2018-08-28 10:52:26 -07:00
7f9fd1cc26 allow RandomSampler to sample with replacement (#9911)
Summary:
fixes #7908
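
A minimal usage sketch of the new option:

```
from torch.utils.data import DataLoader, RandomSampler

dataset = list(range(1000))  # any sized data source works
# Draw 64 samples with replacement instead of iterating a full permutation.
sampler = RandomSampler(dataset, replacement=True, num_samples=64)
loader = DataLoader(dataset, sampler=sampler)
```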
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9911

Reviewed By: yf225

Differential Revision: D9023223

Pulled By: weiyangfb

fbshipit-source-id: 68b199bef3940b7205d0fdad75e7c46e6fe65ba7
2018-08-28 10:52:25 -07:00
504d705d0f Support for CUDNN_HOME/CUDNN_PATH in C++ extensions (#10922)
Summary:
Currently we assume cuDNN includes and libraries are found under the `CUDA_HOME` root, but this is not always true. We now support a `CUDNN_HOME`/`CUDNN_PATH` environment variable that can have its own `/include` and `/lib64` folders.

This means cudnn extensions now also get support on the FAIR cluster.

soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10922

Differential Revision: D9526856

Pulled By: goldsborough

fbshipit-source-id: 5c64a5ff7cd428eb736381c24736006b21f8b6db
2018-08-28 09:40:29 -07:00
1421a9d704 added num_directions explanation to docstrings (#10786)
Summary:
Resolving [https://github.com/pytorch/pytorch/issues/10741](https://github.com/pytorch/pytorch/issues/10741). The current docs use `num_directions` quite a bit, without any explanation of it. `num_directions` is set to 2 if the RNN is bidirectional, or 1 otherwise. This change simply adds that to the docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10786

Differential Revision: D9480235

Pulled By: zou3519

fbshipit-source-id: f61d1b0d2b943f84d5b7ff83df6fe0965a508a5e
2018-08-28 09:26:06 -07:00
bee779bc83 StorageImpl scalar_type_ to data_type_
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10915

Reviewed By: ezyang

Differential Revision: D9526416

Pulled By: cpuhrsch

fbshipit-source-id: 68e43121d72b1b951c73df5bf7b598854fb0e291
2018-08-28 09:26:04 -07:00
82bb9fbedd Remove Scalar.local(). (#10917)
Summary:
It's a no-op now that Scalars don't store tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10917

Differential Revision: D9520267

Pulled By: gchanan

fbshipit-source-id: 5388ff9a4fbb8fc9b9e1ce92208246bf6f08eb92
2018-08-28 07:41:36 -07:00
7c7a2ccb58 Update onnx.rst for v0.4 (#10810)
Summary:
Since we don't need `torch.autograd.Variable` anymore, I removed `torch.autograd.Variable` from `onnx.rst`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10810

Differential Revision: D9500960

Pulled By: zou3519

fbshipit-source-id: 1bc820734c96a8c7cb5d804e6d51a95018db8e7f
2018-08-28 07:26:01 -07:00
de099564e3 Minor copy-edit on README
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10931

Reviewed By: cpuhrsch

Differential Revision: D9526248

fbshipit-source-id: 2401a0c1cd8c5e680c6d2b885298fa067d08f2c3
2018-08-27 21:09:36 -07:00
de9cc98e66 Stop copying tensor memory when importing IR
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10487

Differential Revision: D9370084

Pulled By: li-roy

fbshipit-source-id: ecff1d5d7d006fd60e4f6238ee86c56ad168bfc8
2018-08-27 19:25:42 -07:00
2c342e50e1 Fix a bug in constant prop (#10923)
Summary:
More support for tuples has uncovered a bug in constant prop, where
it assumed it could create constant nodes for tuples, even though we
cannot easily create a single prim::Constant to represent a tuple.
This fix checks when we cannot represent an IValue as a prim::Constant
and then stops propagating the node.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10923

Reviewed By: orionr

Differential Revision: D9523417

Pulled By: zdevito

fbshipit-source-id: 745058c4388d9a5e0fc1553eaa2731e31bc03205
2018-08-27 18:10:17 -07:00
157fb46ffc Add -rdynamic only to linker flags to avoid compiler warnings (#10789)
Summary:
`clang: warning: argument unused during compilation: '-rdynamic'`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10789

Reviewed By: houseroad

Differential Revision: D9467385

Pulled By: bddppq

fbshipit-source-id: 610550a8f34cfa66b9dfa183752eb129dae21eaa
2018-08-27 17:56:21 -07:00
f7b02b3a68 Change Tensor/TensorImpl to use c10::intrusive_ptr (#10824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10824

API additions:
- Tensor(c10::intrusive_ptr<TensorImpl,UndefinedTensor>&&)
- Tensor(const c10::intrusive_ptr<TensorImpl,UndefinedTensor>&)
- Tensor::operator=(Tensor&&) && (for completeness sake)
- TensorBase::unsafeGetTensorImpl()
- TensorBase::unsafeReleaseTensorImpl()
- TensorBase::getIntrusivePtr()
- TensorImpl::type_id()
- Tensor::set_data()
- Tensor::is_same(Tensor)
- Tensor::use_count()
- Tensor::type_id()
- Tensor::scalar_type()
- WeakTensor::is_same(WeakTensor)
- intrusive_ptr::weak_use_count()
- weak_intrusive_ptr::weak_use_count()
- c10::raw::intrusive_ptr::{incref,decref,make_weak}
- c10::raw::weak_intrusive_ptr::{incref,decref,lock}

API changes:
- Tensor::pImpl is no longer public (and now named tensor_impl_)
    - Most methods accessed this way are now accessible on Tensor,
      with maybe_zero_dim() and set_wrapped_number() being prominent exceptions
      (they are now accessed through unsafeGetTensorImpl())
- Type is no longer friend of Tensor
- TensorBase::reset(TensorImpl*) is deleted
- TensorBase::reset(TensorImpl*, bool should_retain) is deleted
- TensorBase::swap(TensorBaseImpl&) is deleted; use std::swap instead
- TensorBase::get() is deleted; use unsafeGetTensorImpl() instead
- TensorBase::detach() is deleted; use unsafeReleaseTensorImpl() instead
- TensorBase::retain() is deleted; use _raw_incref() instead
- TensorBase::release() is deleted; use _raw_decref() instead
- WeakTensor lost most of its methods (it no longer inherits from
  TensorBase)
- TensorImpl::storage() is now a const method
- Tensor(TensorBase) constructor removed, instead
  we go through getIntrusivePtr().  I'm not sure about
  this change; I happened to have accidentally removed the
  TensorBase constructor and decided to fix call sites,
  but I could go the other way.
- detail::set_data() is deleted; use Tensor::set_data() instead
- c10::raw_intrusive_ptr_target removed; use the functions in c10::raw instead.
  (The reason for this change is that it is invalid to cast an intrusive_ptr_target*
  to a raw_intrusive_ptr_target* to take advantage of the methods. But there is
  no reason the incref/decref methods shouldn't also work on intrusive_ptr_target;
  it is primarily an API consideration. We can be more standards compliant by
  keeping them as functions, which are universally applicable.)
- intrusive_ptr::reclaim() and weak_intrusive_ptr::reclaim() now work on
  pointers of the NullType. (This counts as a bug fix, because the documentation
  specified that pointers produced by release() are valid to reclaim(), and
  a release() on a null intrusive_ptr produces the NullType::singleton())

Bug fixes:
- Dispatch code for mutable references incorrectly returned
  a reference to a value argument (which would immediately
  go out of scope).  They now correctly return a tensor by
  value.
- intrusive_ptr copy/move assignment did not work correctly when
  an object was assigned to itself. We now check for this case and
  no-op if so. (This bug manifested itself as a Tensor mysteriously
  becoming an UndefinedTensor after lines of code like
  'x = x.mul_(y)')

Other changes:
- The checked cast functions in Utils.h have now been
  renamed and detemplatized into checked unwrap functions.
- Added type_id() and scalar_type() methods to Tensor
- pImpl is no longer public
- Documented what the && overloads are doing
- All occurrences of 'new TensorImpl' (and similar spellings, like 'new THTensor')
  have been expunged. This is NO LONGER a valid way to create a new
  tensor, and if you do this, upon your first incref, you will catch an ASSERT
  failure saying that only tensors created by intrusive_ptr::release() are valid
  to reclaim(). Use c10::make_intrusive instead in this situation.
- IValue is adjusted to use intrusive_ptr instead of Retainable, and all
  other sub-classes of Retainable were modified to use intrusive_ptr.
  When doing this, I had to make the constructors of sub-classes like
  ConstantList public, so that c10::make_intrusive could invoke them.  Fortunately,
  if you incorrectly stack allocate a ConstantList, and then try to get an
  intrusive_ptr to it, it will fail, as stack allocated ConstantLists have refcount 0.
- IValue very narrowly sidesteps the problem of handling NullType, as it
  considers intrusive_ptr<TensorImpl> identical to intrusive_ptr<TensorImpl, UndefinedTensor>
  which is not always true. This was always the case, but there's now a comment
  explaining what's going on.

Some MSVC bugs were uncovered during the preparation of this patch.
They are documented as comments in the code.

Reviewed By: gchanan

Differential Revision: D9481140

fbshipit-source-id: 14a8ea0c231ed88b5715fb86d92730926f9f92fc
2018-08-27 16:11:01 -07:00
f2bb9f0bb5 speed up kl div loss (#10336)
Summary:
Moved kl div loss to aten.
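
For context, a minimal call of the op being benchmarked (shapes match the runs below):

```
import torch
import torch.nn.functional as F

input = F.log_softmax(torch.randn(1000, 100), dim=1)  # log-probabilities
target = F.softmax(torch.randn(1000, 100), dim=1)     # probabilities
loss = F.kl_div(input, target)
```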

Benchmarks for 5000 iterations on input size (1000, 100):

New
```
cuda:
forward [0.9736350309103727, 0.9922929517924786, 0.9694818360731006]
input requires_grad=True:
backward [0.5595634011551738, 0.558339926879853, 0.5546616851352155]
double backward [1.2445648494176567, 1.2245905152522027, 1.2349751549772918]
target requires_grad=True:
backward (new C++) [0.9489959231577814, 0.9553070571273565, 0.9556351029314101]
double backward (new C++) [1.8184774098917842, 1.8164670099504292, 1.845708406995982]

cpu:
forward (new C++) [7.892430987209082, 8.3068826389499, 7.985283812973648]
input requires_grad=True:
backward (new C++) [4.328460982069373, 4.45323242014274, 4.27946363389492]
double backward (new C++) [5.153504415880889, 4.629372010007501, 4.712803596165031]
target requires_grad=True:
backward (new C++) [3.4181493939831853, 3.3771288259886205, 3.7086612950079143]
double backward (new C++) [0.21922698011621833, 0.1858532396145165, 0.19477044604718685]
```

Old
```
cuda:
forward [3.101281268056482, 3.068499860819429, 3.0527669726870954]
input requires_grad=True:
backward [0.5650290949270129, 0.5730433077551425, 0.5588279226794839]
double backward [1.1287697306834161, 1.13834543293342, 1.1298578432761133]
target requires_grad=True:
backward [0.9470391101203859, 0.9560198178514838, 0.9750375030562282]
double backward [1.85760727385059, 1.7989214668050408, 1.788982989732176]

cpu:
forward (new C++) [12.474591840058565, 12.511441555805504, 12.666544185951352]
input requires_grad=True:
backward (new C++) [7.660991386976093, 7.449987292289734, 7.513917901087552]
double backward (new C++) [4.073225498665124, 4.264980792999268, 4.429787891916931]
target requires_grad=True:
backward (new C++) [3.448499082121998, 3.9072313378565013, 3.2433970272541046]
double backward (new C++) [2.126378359273076, 1.9045450473204255, 1.7932004742324352]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10336

Differential Revision: D9213636

Pulled By: li-roy

fbshipit-source-id: 27cc530f6276f58d35dc7a1d56dfc758a0fc4a7b
2018-08-27 16:10:59 -07:00
f5910c8a36 Add MIOPEN recurrent operator (#10840)
Summary:
The goal of this PR is to enable the MIOpen engine (for HIP devices) for the recurrent operator, and also to enable the corresponding unit test.
bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10840

Differential Revision: D9518980

Pulled By: bddppq

fbshipit-source-id: 214661e79a47c5dc6b712ef0fba986bd99db051f
2018-08-27 15:39:56 -07:00
8e33451e2e Make torch.cuda.* take device objects; Update distributed docs (#10833)
Summary:
Commits:

1. Make `torch.cuda.*` take device objects (see the sketch below)
2. Update `torch.distributed` docs to emphasize calling `torch.cuda.set_device` before `init_process_group`
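
A small sketch of the first change (assumes a CUDA build with at least two devices):

```
import torch

device = torch.device('cuda:1')
torch.cuda.set_device(device)    # previously only an int index was accepted
with torch.cuda.device(device):  # the context manager takes device objects too
    x = torch.randn(2, 2).cuda()
```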
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10833

Differential Revision: D9514241

Pulled By: SsnL

fbshipit-source-id: 2497464305fb1e63d6c495291a5744aaa7e2696e
2018-08-27 15:24:42 -07:00
58b145f515 Fix negative indices in tracer (#10560)
Summary:
Previously, when tracing slicing & select, negative indices would get normalized, fixing the index to the size of the traced tensor. This change makes the behavior the same as script, so aten::select with negative indices is emitted.
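
A minimal sketch, using the decorator-style trace API seen elsewhere in this log, of a case that now records the negative index:

```
import torch

@torch.jit.trace(torch.rand(3, 4))
def last_row(x):
    return x[-1]  # now traced as aten::select with index -1, not a fixed index 2
```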
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10560

Differential Revision: D9493614

Pulled By: eellison

fbshipit-source-id: ce7a8bae59863723247208d86b9f2948051ccc6c
2018-08-27 15:19:41 -07:00
9aa92bc261 Change the default value of DeviceOption.numa_node_id from -1 to 0 (#10877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10877

change default value of DeviceOption.numa_node_id to 0 and use has_numa_node_id() to check existence

Reviewed By: ilia-cher

Differential Revision: D9473891

fbshipit-source-id: 91ac6a152f445644691023110c93d20a3ce80d43
2018-08-27 14:55:46 -07:00
7842b6d0f7 Fix at::optional compile problems on Windows CUDA.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10909

Differential Revision: D9516837

Pulled By: gchanan

fbshipit-source-id: fad7e3284e74c599b873ebaae2dcdf5013505855
2018-08-27 14:40:41 -07:00
6ce799edd6 Tuples/Lists can now be inputs/outputs to script and other simple fixes. (#10812)
Summary:
* Fix the necessary pathways so that tuples and lists can be inputs to the script.

* prevent linear algebra functions from being run in shape prop because
they frequently will error out for nonsense data.

* favor schema-driven Python input conversion where possible.
Remaining cases where we directly create Stacks without schema are
only for debugging.

* Make the error messages when calling script/trace functions more pythonic

* Simplify FlattenTuples -- now that tuples are supported we can choose to only flatten tuples when needed. This may have to be revisited pending onnx test results, but is necessary for making tuple io work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10812

Differential Revision: D9477982

Pulled By: zdevito

fbshipit-source-id: ed06fc426e6ef6deb404602a26c435a7fc40ea0c
2018-08-27 14:40:40 -07:00
f64f6eed3a move HeatmapMaxKeypointOp unittest to oss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10859

Reviewed By: newstzpz

Differential Revision: D9498312

fbshipit-source-id: 08b8a596f774c9102286019f286ca0b74d1f5304
2018-08-27 12:56:46 -07:00
35beecfe17 fix xfails involving literals (#10905)
Summary:
I missed these in #10900

cc apaszke jamesr66a zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10905

Differential Revision: D9516748

Pulled By: zou3519

fbshipit-source-id: a5c3e3b65a33c339d5c4e9fc160462c3d35705f3
2018-08-27 12:41:06 -07:00
f940af6293 Bag of Distributions doc fixes (#10894)
Summary:
- Added `__repr__` for Constraints and Transforms.
- Arguments passed to the constructor are now rendered with :attr:

Closes https://github.com/pytorch/pytorch/issues/10884
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10894

Differential Revision: D9514161

Pulled By: apaszke

fbshipit-source-id: 4abf60335d876449f2b6477eb9655afed9d5b80b
2018-08-27 09:55:27 -07:00
67f6f930a8 Remove FIXME_zerol() from test_jit.py (#10900)
Summary:
The scalar situation has gotten a lot better and now we can
remove all instances of FIXME_zerol().

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10900

Differential Revision: D9514206

Pulled By: zou3519

fbshipit-source-id: e4e522f324126c5454cd6de14b832d2d1f6cb0ce
2018-08-27 08:55:08 -07:00
841d779598 Increase BC for PackedSequence ctor (#9864)
Summary:
PackedSequence is never supposed to be created by users, but unfortunately some community repos are already doing this (e.g., [here](7c191048ce/torchmoji/model_def.py (L218-L229))). A change we made broke the calling pattern `PackedSequence(data=x, batch_sizes=y)`. This patch adds back support for that.
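
A sketch of the keyword calling pattern this patch keeps working:

```
import torch
from torch.nn.utils.rnn import PackedSequence

data = torch.randn(5, 3)               # packed data across all time steps
batch_sizes = torch.tensor([3, 1, 1])  # batch size at each time step (sums to 5)
ps = PackedSequence(data=data, batch_sizes=batch_sizes)
```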
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9864

Differential Revision: D9011739

Pulled By: SsnL

fbshipit-source-id: 0e2012655d7f4863ec54803550df30874ec35d75
2018-08-27 08:25:23 -07:00
c3271b53e4 Remove ability of Scalars to hold Tensors.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10889

Differential Revision: D9512589

Pulled By: gchanan

fbshipit-source-id: 8b2b26c9f3a4da31a46f684793ab237e9ef9a323
2018-08-27 07:26:14 -07:00
3aaad3ecb1 Begin a bestiary of MSVC/NVCC bugs. (#10883)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10883

Differential Revision: D9513997

Pulled By: ezyang

fbshipit-source-id: 37db956e57d86471323d284869bb844f5a4753ac
2018-08-27 07:09:47 -07:00
c8b246abf3 Prevent JIT from overspecializing to every single size configuration (#10844)
Summary:
Please review the expects carefully to make sure there are no regressions. I tried to go over them one by one when they changed, but it's sometimes easy to miss finer details.

Summary of changes:

- Renamed `TensorType` to `CompleteTensorType`. Added a new `TensorType` which records only the scalar type, number of dimensions, and device of a value. The argument behind the rename is to encourage people to use `CompleteTensorType` less, as most passes will only have limited information available. To make the transition easier, `complete_type->cast<TensorType>()` works, and makes our passes work with both kinds of specialization if they don't need the extra detail.
- Renamed `ArgumentSpec` to `CompleteArgumentSpec`. Added a new `ArgumentSpec`, which matches arguments only at the level of the new `TensorType`.
- Shape analysis can process graphs with both `CompleteTensorType` and `TensorType`.
- Fuser was a part that heavily relied on full shape information being available. Now, we simply try to fuse the largest possible graphs, and have to do run-time checks to make sure they match the code we generate. If they don't, we fall back to regular interpretation. The shape checks are implemented using an optimized method exploiting algebraic properties of shapes with broadcasting, and the relations of broadcasting with pointwise ops. A full written proof of correctness of the shape checking algorithm is included in a comment in `graph_fuser.cpp`.

zdevito ezyang mruberry ngimel csarofeen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10844

Differential Revision: D9498705

Pulled By: apaszke

fbshipit-source-id: 0c53c2fcebd871cc2a29c260f8d012276479cc61
2018-08-26 09:54:48 -07:00
9679fc5fcd Handling failing test on ROCm.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10854

Reviewed By: ezyang

Differential Revision: D9498721

Pulled By: Jorghi12

fbshipit-source-id: 4018383fea5a2a6baff7183b0c0197a4b7a09f20
2018-08-26 07:55:33 -07:00
ddc37d7487 Update mobile predictor caller's interface
Summary: Update all the callers for the new interface

Reviewed By: highker

Differential Revision: D9323167

fbshipit-source-id: a39335ceb402db0719f5f2314085ba9a81380308
2018-08-24 23:40:05 -07:00
d632ccd2c1 Cache isContiguous and numel
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10696

Differential Revision: D9437963

Pulled By: cpuhrsch

fbshipit-source-id: 7217682f5e4b69c73d943411d738e4892bb465f5
2018-08-24 22:40:39 -07:00
17dac3e17f Create class constant for string literal 'blob_names'
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10827

Reviewed By: boryiingsu

Differential Revision: D9484567

fbshipit-source-id: 275eddc9406b5f427d72c0ab9b0da481b5e59ece
2018-08-24 22:11:43 -07:00
8253cfaa72 Conv BN fusion for 3D conv (#10239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10239

Make Conv + BN fusion also work for 3D convolutions
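
The folding itself is the same as in 2D; a NumPy sketch for the 3D (NCDHW) case, with hypothetical parameter names:

```
import numpy as np

def fuse_conv3d_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    # Fold BatchNorm y = gamma * (conv_out - mean) / sqrt(var + eps) + beta
    # into the convolution's weights and bias, per output channel.
    scale = gamma / np.sqrt(var + eps)           # shape (C_out,)
    W_fused = W * scale.reshape(-1, 1, 1, 1, 1)  # W shape (C_out, C_in, kD, kH, kW)
    b_fused = (b - mean) * scale + beta
    return W_fused, b_fused
```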

Reviewed By: duc0

Differential Revision: D9176314

fbshipit-source-id: 6604aa569c5c3afdb4480a5810890bc617e449c4
2018-08-24 21:24:36 -07:00
542aadd9a7 Stop using symbolic override for tracing RNNs (#10638)
Summary:
This disables the symbolic override hacks and makes tracing emit the recently added ATen ops for RNNs (`aten::lstm`, `aten::gru`, ...). I managed to reuse pretty much all of the translation code for their symbolics.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10638

Differential Revision: D9385830

Pulled By: apaszke

fbshipit-source-id: ff06ef7b1ae7c3b7774825e0991bc3887e1ff59b
2018-08-24 20:25:58 -07:00
f2f6e6c0e8 Add registry to pybind_state (#10759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10759

Adding a basic registry pattern to pybind_state so that we can have separate 'cc' files register module updates. This is substantially cleaner than using multiple pybind modules (which have been known to cause bugs).

Reviewed By: bddppq

Differential Revision: D9441878

fbshipit-source-id: af9e9e98385e92b58ca50e935678328c62684d8e
2018-08-24 17:25:02 -07:00
c172ffb632 Remove the nanopb submodule
Summary:
After making changes internally, really remove the nanopb submodule.

Finalizes https://github.com/pytorch/pytorch/pull/10772

Reviewed By: yns88

Differential Revision: D9504582

fbshipit-source-id: 4517607e5c8054a255c3984b8265f48fede2935b
2018-08-24 16:24:57 -07:00
148ea2a653 Create at::linear (#10799)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/10755 with fix for ONNX

ezyang jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10799

Differential Revision: D9482168

Pulled By: goldsborough

fbshipit-source-id: 85d4bdfcf0d451f2e7a1c83c5f5415cdd6caacdc
2018-08-24 16:02:08 -07:00
1fbabff76a Refactor THCNumerics and add common math functions for at::Half (#10301)
Summary:
**Summary**: This PR is a follow-up to mruberry's https://github.com/pytorch/pytorch/pull/9318/. It tries to achieve the following:
- Specializing std common math functions for `at::Half` type.
- Create `CUDANumerics.cuh` to contain necessary parts from `THCNumerics.cuh`.
- Update `THCNumerics.cuh` with new usage and comments to  demonstrate the best practice for developers and hence, making way for its deprecation.
- Remove legacy/redundant code path.
- Remove unused CUDA HALF macros (see separate PR https://github.com/pytorch/pytorch/pull/10147)

**Comments**: `CUDANumerics.cuh` contains mathematical functions that are either not in the std namespace or are specialized for compilation with CUDA NVCC or CUDA NVRTC. This header is derived from the legacy `THCNumerics.cuh`. The following is some of the rationale for why some functions were kept while others were removed:
- All arithmetic can now be done in ATen using binary cuda kernel  or CUDA tensor pointwise apply (check https://github.com/pytorch/pytorch/pull/8919 and `CUDAApplyUtils`). `at::Half` comparisons rely on implicit conversion to float.
- Functions that are c/c++ standard compliant, have been specialized for user defined types, for instance, the std namespace has been opened up for `at::Half`, that defines math function definitions for `at::Half`. Check `Half-inl.h`
- Some standard compliant functions are specialized here for performance reasons. For instance, `powi` is used for `pow` calculation on integral types. Moreover, `abs`, `isinf`, `isnan` are specialized to save one API call vs when used with std. Although this is subject to change, depending on if we really care about saving one API call.
- Numeric limits such as `max/min` are removed since they call standard defines. Moreover, numeric limits for
`at::Half` are present in `Half-inl.h`. I understood that HIP has some issues with `std::numeric_limits`, and this is the related GitHub issue I found: https://github.com/ROCm-Developer-Tools/HIP/issues/374. AlexVlx mentions that the issue can be avoided by launching `std::numeric_limits` in `__device__`. Since we are launching lambdas with device contexts, I don't see why `std::numeric_limits` wouldn't compile on HIP if launched with a device context within a kernel, unless I am not aware of the real reason why max/min was there in THCNumerics in the first place. (I haven't ever tried a build with HIP.)

Here are some reference PRs that was handy in refactoring TH into ATen:
- https://github.com/pytorch/pytorch/pull/6786
- https://github.com/pytorch/pytorch/pull/5475
- https://github.com/pytorch/pytorch/pull/9401
- https://github.com/pytorch/pytorch/pull/8689
- https://github.com/pytorch/pytorch/pull/8919
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10301

Differential Revision: D9204758

Pulled By: soumith

fbshipit-source-id: 09f489c1656458c02367b6cd31c3eeeca5acdc8a
2018-08-24 16:02:06 -07:00
87a7840fa6 Remove Tensor constructor of Scalar. (#10852)
Summary:
This is a step along the way to removing Tensor as a member of the tagged union in Scalar. This simplifies ordering dependencies, because currently Scalar and Tensor both depend on each other (so we introduce a TensorBase). Also, this API isn't particularly useful publicly: we can't autograd through Scalars, so you still need a Tensor overload basically everywhere anyway.

I'm undecided what the final API should be here.  We could keep a Tensor constructor on Scalar, but have it generate a local scalar; this is convenient but given this API used to be non-synchronizing, it may not be the best.

For now, I'm just using _local_scalar, which is clear, although we should get rid of the prefix _ if that's the API we intend to promote.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10852

Reviewed By: ezyang

Differential Revision: D9496766

Pulled By: gchanan

fbshipit-source-id: 16f39b57536b9707132a5a4d915650c381bb57db
2018-08-24 16:02:05 -07:00
0d5584d8d7 Revert D9492561: [pytorch][PR] Moving the operator argument to the front for kernelPointwiseApply.
Differential Revision:
D9492561

Original commit changeset: d0f0e2ab7180

fbshipit-source-id: fc822e63b11866195ff7883f360338a41e25d9e2
2018-08-24 16:02:04 -07:00
0ef5cfd28c fix ivalue printing for lists (#10777)
Summary:
Fixing the printing of IValue lists, which didn't work previously.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10777

Differential Revision: D9474264

Pulled By: eellison

fbshipit-source-id: 0c7d6e7ecaa3f7908b131ac9f1036f19ac4f8b4f
2018-08-24 16:02:03 -07:00
983e0f2413 Remove Node::invalidateSchema (#10822)
Summary:
The schema_ field is a private and internal cache for nodes, and no
methods meant to manipulate it should be publicly visible. This call
wasn't even necessary at its call site, since removeInput will reset the
schema by itself.

zdevito jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10822

Reviewed By: zdevito

Differential Revision: D9498683

Pulled By: apaszke

fbshipit-source-id: 42e1743e3737cb7d81f88e556204487d328c0e47
2018-08-24 16:02:01 -07:00
74e6a666b3 If none of the schema match, add ImplicitTensorToNum conversions where needed. (#10180)
Summary:
When matching schema, first try to match without adding TensorToNum conversions. Then make another pass where TensorToNum conversions are allowed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10180

Differential Revision: D9438153

Pulled By: eellison

fbshipit-source-id: 80541b5abd06e9d4187e89dda751f44dab6f58c5
2018-08-24 16:02:00 -07:00
474684cf03 Re-sync with internal repository (#10868) 2018-08-24 15:48:03 -07:00
8044dc4eb8 Support new Reshape semantics (#10848)
Summary:
Since ONNX opset version >5, Reshape changed semantics to take a shape tensor as input instead of relying on a `shape` attribute to decide what shape to reshape to. The ONNXIFI op has been postponing this change, as some backends such as TensorRT were not ready. Now that the backends have adopted the new semantics, we can remove the legacy mode and output opset version 7 ONNX models.

This change also flushes out some bugs and a new requirement:
- Convert shape info into an int64 tensor
- Fix a bug where we output the shape tensor in the mapped workspace instead of the original workspace
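
For reference, a sketch of the two forms (using the `onnx` helper API; blob names are illustrative):

```
from onnx import helper

# opset >= 5: the target shape arrives as a second int64 tensor input
node = helper.make_node("Reshape", inputs=["data", "shape"], outputs=["reshaped"])

# opset 1 (legacy): the target shape was a node attribute
legacy = helper.make_node("Reshape", inputs=["data"], outputs=["reshaped"], shape=[2, -1])
```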
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10848

Reviewed By: houseroad

Differential Revision: D9495121

Pulled By: yinghai

fbshipit-source-id: a6f44a89274c35b33fae9a429813ebf21d9a3d1a
2018-08-24 11:46:41 -07:00
8130b1a950 Ignore stack frames coming from python3 object file (#10627)
Summary:
goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10627

Reviewed By: ezyang

Differential Revision: D9384411

Pulled By: apaszke

fbshipit-source-id: ce4f6edb9ffbd0c7e320b9347da10399de472150
2018-08-24 11:26:21 -07:00
6e2f6dc6e6 Move Allocator and Device to ATen/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10798

Reviewed By: ezyang

Differential Revision: D9466602

fbshipit-source-id: f5bda17045076d8c81be9fa5a0749c97bf274b5f
2018-08-24 11:26:19 -07:00
f1df85d799 bug-fix in normal_( ) (#10846)
Summary:
- fixes #10642
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10846

Differential Revision: D9495014

Pulled By: weiyangfb

fbshipit-source-id: 35a9fc349f9f0c21a24141f29c62853ab6a68dae
2018-08-24 11:26:18 -07:00
313139d14e Moving the operator argument to the front for kernelPointwiseApply. (#10829)
Summary:
Currently on PyTorch AMD, memory accesses on the TensorInfo struct contained in the operators passed into the kernelPointwiseApply kernel lead to hangs in the HCC runtime. Permuting the argument order so that the operator comes first alleviates this issue, and the kernel hangs disappear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10829

Reviewed By: ezyang

Differential Revision: D9492561

Pulled By: Jorghi12

fbshipit-source-id: d0f0e2ab7180e55846db909f2744b8c8b110205e
2018-08-24 11:10:43 -07:00
e3d12d7afb Automatic update of fbcode/onnx to 6146a85d371481222c10ede4430ad5476e60de87 (#10831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10831

Previous import was 7848f1e0414ba3b2e263609d93d46fd60790b2e9

Included changes:
- **[6146a85](https://github.com/onnx/onnx/commit/6146a85)**: Check pybind version (#1315) <Changming Sun>
- **[2cbf740](https://github.com/onnx/onnx/commit/2cbf740)**: Domain exists in GraphProto but not in Node (#1310) <Ryan Hill>
- **[9b874e9](https://github.com/onnx/onnx/commit/9b874e9)**: [Title] Add optimization pass eliminating nop Pad (#1307) <Tingfan Wu>

Reviewed By: yinghai

Differential Revision: D9485475

fbshipit-source-id: 3adb4e6e182278fd2abe5068a9d4569763e0ff0c
2018-08-24 10:54:40 -07:00
3c9775fff8 Remove nanopb since we've switched to protobuf (#10772)
Summary:
We no longer use nanopb in PyTorch (or Caffe2), so we are removing it. All protobuf manipulation should go through standard protobuf, which is statically linked inside libcaffe2.so by default.

cc zdevito pjh5 ezyang Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10772

Reviewed By: pjh5

Differential Revision: D9465894

Pulled By: orionr

fbshipit-source-id: 8cdf9f1d3953b7a48478d381814d7107df447201
2018-08-24 10:54:38 -07:00
8c13971f57 Remove protobuf require and use requirements.txt (#10771)
Summary:
In preparation for making FULL_CAFFE2 the default, users shouldn't be required to have protobuf installed.

cc pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10771

Reviewed By: pjh5

Differential Revision: D9474458

Pulled By: orionr

fbshipit-source-id: 3e28f5ce64d125a0a0418ce083f9ec73aec62492
2018-08-24 10:39:40 -07:00
474bd60bad Provide a tensor overload to mul_out_sparse_scalar. (#10828)
Summary:
This is a small part of the effort to remove Tensor as a tagged member in Scalar because it is inconsistent with how we normally do overloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10828

Differential Revision: D9485049

Pulled By: gchanan

fbshipit-source-id: 103f5cc03bb7775cd2d3a0a5c0c5924838055f03
2018-08-24 09:39:26 -07:00
e146518e46 Fix AT_CUDA_CHECK and AT_CUDNN_CHECK macros (#10834)
Summary:
Previously, the macros evaluated the expression multiple times on error.

For example:

```
AT_CUDA_CHECK(cudaStreamWaitEvent(ptr->stream, event, 0));
```

would previously expand to

```
if (cudaStreamWaitEvent(ptr->stream, event, 0) != cudaSuccess) {
    AT_ERROR("CUDA error: ", cudaGetErrorString(cudaStreamWaitEvent(ptr->stream, event, 0)));
}
```
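
With the fix, the expansion evaluates the expression exactly once into a local; a sketch of the pattern (not the exact macro):

```
cudaError_t __err = cudaStreamWaitEvent(ptr->stream, event, 0);
if (__err != cudaSuccess) {
    AT_ERROR("CUDA error: ", cudaGetErrorString(__err));
}
```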
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10834

Differential Revision: D9493257

Pulled By: colesbury

fbshipit-source-id: d2473020fd83a25aa421171d19c8dfe559155a9b
2018-08-24 09:09:18 -07:00
ca567862b2 Support multidimensional indexing (#10787)
Summary:
Part of #10774.

This PR does the following:
- Support ast.ExtSlice in the frontend. This is done by returning a
  list of ast.Index and ast.Slice.
- Support multidimensional indexing with ints and slices

The general approach is to desugar multidimensional indexing into
at::slice, at::select operations. This is exactly how normal pytorch
does indexing (by desugaring it into at::slice, at::select, and other ops).

I used [this code](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_variable_indexing.cpp) as reference.
We should be able to copy the rest of this to implement the missing
indexing features in script (indexing with ellipses, tensors, sequences, etc).
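
A small example of what now compiles in script, desugared into `aten::select`/`aten::slice`:

```
import torch

@torch.jit.script
def f(x):
    return x[0, 1:3]  # select along dim 0, then slice along dim 1
```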

After I'm done implementing the missing indexing features in future PRs, I can try to
templatize python_variable_indexing.cpp so that it can work with both JIT
script and normal pytorch indexing, but right now I'm not sure if that's
a good idea or not.

cc zdevito jamesr66a apaszke wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10787

Differential Revision: D9481402

Pulled By: zou3519

fbshipit-source-id: 78c9fa42771a037d157879e23e20b87401cf1837
2018-08-24 08:10:32 -07:00
6993e4a9f7 Caffe2 Functional enforcing inplace output (#10797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10797

A few operators enforce in-place output (e.g., running mean/var for SpatialBN). Functional right now doesn't follow the inplace_enforced_ rules in OpSchema, and therefore RunNetOnce() will fail on OpSchema->Verify(). Edit the output_names in Functional following the rules to pass the check.

Reviewed By: jerryzh168

Differential Revision: D9470582

fbshipit-source-id: 168efeccecc32184bd1d02f3fefe8e61faa4e0f4
2018-08-23 22:42:47 -07:00
8da4167129 Fix performance regression (#10835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10835

The last constructor diff caused a performance regression in cold runs.
This one fixes it.

Reviewed By: highker

Differential Revision: D9489617

fbshipit-source-id: a77c2e2c903a73e2ad9806b4f9c209cdb751442f
2018-08-23 19:55:23 -07:00
df2d48b42c Added PrefixStore, pybind, test for group backward compatibility (#10762)
Summary:
Added Prefix Store support.

This makes groups backward compatible.

Test is covered too.
```
tengli@devfair033:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ ./FileStoreTest
Using temporary file: /tmp/testoglRl4
Using temporary file: /tmp/testepZIpB
Test succeeded
tengli@devfair033:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ ./TCPStoreTest
Test succeeded
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10762

Differential Revision: D9484032

Pulled By: teng-li

fbshipit-source-id: 85754af91fe3f5605087c4a2f79ae930a9fd1387
2018-08-23 18:10:37 -07:00
61b34d42e7 nomnigraph - isSubgraphMatch returns the matched Subgraph & map from MatchNodes to graph nodes (#10605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10605

Make isSubgraphMatch return the matched subgraph and a map from MatchNodes to graph nodes in the result, which makes it easier to write graph fusion logic. Also include some more helper methods for the NN subgraph matcher.

Reviewed By: bwasti

Differential Revision: D9374931

fbshipit-source-id: 3a273295eec81a43027ec3a9e835d27f00853df9
2018-08-23 16:40:19 -07:00
ee022a476a Added this-consts to all methods on SymbolicVariable (#10805)
Summary:
Self-explanatory. See https://github.com/pytorch/pytorch/issues/9109 or T32954812 for more details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10805

Reviewed By: ezyang

Differential Revision: D9477686

Pulled By: hakobyant

fbshipit-source-id: 73dd84e5295e4c749bd6416ce2f6eb7590f05cbc
2018-08-23 16:25:27 -07:00
9403e0cac0 Use ATen implementation of RNNs (#10761)
Summary:
apaszke recently ported RNNs from Python into ATen, which means we can replace our implementation in the C++ API (written by ebetica) with the ATen implementation, which cleans up a lot of code (+99, -323). Thanks apaszke!

I also added the `bidirectional` and `batch_first` options to the C++ API RNN options, just because why not.

apaszke ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10761

Differential Revision: D9443885

Pulled By: goldsborough

fbshipit-source-id: b6ef7566b9ced2b2f0b2e1f46c295b6f250c65a8
2018-08-23 16:12:14 -07:00
a4c59a9dab MIOpen integration, more tests enabled, bug fixes (#10612)
Summary:
* first integration of MIOpen for batch norm and conv on ROCm
* workaround a ROCm compiler bug exposed by elementwise_kernel through explicit capture of variables in the densest packing
* workaround a ROCm compiler bug exposed by having `extern "C" __host__` as a definition and just `__host__` in the implementation through the hipify script
* use fabs() in accordance with C++11 for double absolute, not ::abs() which is integer-only on ROCm
* enable test_sparse set on CI, skip tests that don't work currently on ROCm
* enable more tests in test_optim after the elementwise_bug got fixed
* enable more tests in test_dataloader
* improvements to hipification and ROCm build

With this, resnet18 on CIFAR data trains without hang or crash in our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10612

Reviewed By: bddppq

Differential Revision: D9423872

Pulled By: ezyang

fbshipit-source-id: 22c0c985217d65c593f35762b3eb16969ad96bdd
2018-08-23 15:24:47 -07:00
3d43a82440 Add support for vararg style functions. (#10250)
Summary:
Things like `zeros(1,2,3, dtype=torch.int)` are now supported in the script by altering tryMatchSchema to auto-construct the list `[1,2,3]` when it sees inlined members of the list as the last positional arguments.
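
For example, this now compiles in script:

```
import torch

@torch.jit.script
def f():
    # The trailing ints are auto-collected into the size list [1, 2, 3].
    return torch.zeros(1, 2, 3, dtype=torch.int)
```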

I suggest reading the commits individually, since the first two incrementally change how we do tryMatchSchema to get it ready for adding vararg list conversion, while the third actually does the modification.

closes #10632
closes #8516
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10250

Differential Revision: D9478235

Pulled By: zdevito

fbshipit-source-id: 0c48caf7a6184e463d9293d97015e9884758ef9c
2018-08-23 15:10:36 -07:00
9dbcc9cebd Move _raw_* intrusive pointer manipulations to raw_intrusive_ptr_target (#10779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10779

The idea is to let classes opt in to providing these methods
by default.

Reviewed By: jerryzh168

Differential Revision: D9466076

fbshipit-source-id: b6beee084cc71d53ce446cdc171d798eeb48dc12
2018-08-23 14:32:24 -07:00
dec3ed7b49 Increase the limit for Proto size (#10745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10745

ParseProtoFromLargeString hits the limit when using recurring v2. To unblock the warmup project, we can increase the limit temporarily. More details in this post -- https://fb.facebook.com/groups/264913123977784/permalink/463566404112454/

Differential Revision: D9436368

fbshipit-source-id: 54488f27ef941cab679843cb0c502095dd056c1b
2018-08-23 13:55:50 -07:00
432b3adffc Print blob sizes on fatal signal (#10766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10766

Added a `Workspace::ForEach(...)` API for accessing the global set of
existing Workspace instances. This is used in the signal handler to print blob
info on the thread receiving a fatal signal.

Reviewed By: mraway

Differential Revision: D9147768

fbshipit-source-id: a94d0b5e6c88390a969ef259ecb8790173af01a4
2018-08-23 13:39:55 -07:00
82ddeb7f2b Using shared implementation in Tensor (#10619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10619
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9047

Reviewed By: jerryzh168

Differential Revision: D8417101

fbshipit-source-id: 98e0a3275864283c2f06d28f4c9b859b5827ed4d
2018-08-23 13:39:53 -07:00
23a366be33 Use ATen native functions for THCTensor_cadd/cmul/cdiv/csub (#10707)
Summary:
This seems to save a few percent in binary size in libcaffe2_gpu.so, but
the effect may not be real. In fact, deleting some functions can cause
the binary size to increase (perhaps due to alignment issues).

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10707

Differential Revision: D9409009

Pulled By: colesbury

fbshipit-source-id: 282931e562e84e316a33ac6da4788c04c2984f08
2018-08-23 13:31:03 -07:00
0f5c8edfd3 Removes unused THCState code paths (#9735)
Summary:
To prepare THCState for refactoring into ATen, this PR removes unused THCState code paths. In particular, it:

- Removes the UVA Allocator
- Removes the THDefaultDeviceAllocator
- Respects the 1 BLAS and 1 sparse handle per device reality
- Removes kernel p2p access
- Removes setting p2p access
- Removes the GCHandler code path
- Removes many unused THCState_... functions
- Removes THCThreadLocal.h/.cpp

It does not change the preexisting external behavior of any remaining function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9735

Differential Revision: D9438558

Pulled By: SsnL

fbshipit-source-id: dde9acbec237a18bb6b75683e0526f7ff1c9a6ea
2018-08-23 13:10:05 -07:00
ab9e7ae23e Add CUDA implementation of LARS --caffe2 (#10509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10509

This diff enables the CUDA implementation of the LARS operator in caffe2.
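
For reference, LARS scales each layer's update by a trust ratio computed from the parameter and gradient norms; a NumPy sketch with illustrative hyperparameter names:

```
import numpy as np

def lars_lr(w, grad, base_lr=0.1, trust=0.001, weight_decay=1e-4, eps=1e-9):
    # Layer-wise Adaptive Rate Scaling: one local rate per layer.
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(grad)
    local_lr = trust * w_norm / (g_norm + weight_decay * w_norm + eps)
    return base_lr * local_lr
```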

Reviewed By: enosair

Differential Revision: D9318356

fbshipit-source-id: 365b9f01e3afd4d9d3ba49155e72e728119f40c5
2018-08-23 12:55:57 -07:00
b14f2e899c Preserve sparse tensor shape and dim invariants, and add scalar tensor support (#9279)
Summary:
When 0-sized dimension support is added, we expect an empty sparse tensor to be a 1-dimensional tensor of size `[0]`, with `sparseDims == 1` and `denseDims == 0`. Also, we expect the following invariants to be preserved at all times:

```
_sparseDims + _denseDims = len(shape)
_indices.shape: dimensionality: 2,  shape: (_sparseDims, nnz)
_values.shape:  dimensionality: 1 + _denseDims.  shape: (nnz, shape[_sparseDims:])
```

This PR fixes various places where the invariants are not strictly enforced when 0-sized dimension support is enabled.

Tested and `test_sparse.py` passes locally on both CPU and CUDA with the `USE_TH_SIZE_ZERO_DIM` flag.
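
As an editorial illustration (not part of the diff), here is a minimal sketch of the invariants using the current `torch.sparse_coo_tensor` constructor; the concrete shapes are hypothetical:

```python
import torch

# 2 sparse dims, 1 dense dim, shape (3, 4, 5), nnz = 3
i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])                  # (_sparseDims, nnz) = (2, 3)
v = torch.randn(3, 5)                          # (nnz, *shape[_sparseDims:]) = (3, 5)
s = torch.sparse_coo_tensor(i, v, (3, 4, 5)).coalesce()
assert s._indices().dim() == 2                 # indices are always 2-dimensional
assert s._values().dim() == 1 + 1              # 1 + _denseDims
assert s.dim() == 2 + 1                        # _sparseDims + _denseDims == len(shape)
```
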
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9279

Differential Revision: D8936683

Pulled By: yf225

fbshipit-source-id: 12f5cd7f52233d3b26af6edc20b4cdee045bcb5e
2018-08-23 10:10:24 -07:00
0eb2c83006 Fix link in THNN/README.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10821

Differential Revision: D9481118

Pulled By: soumith

fbshipit-source-id: 0a416202eb4db025ec7d395e70344cbbf626fec0
2018-08-23 09:25:16 -07:00
fcfb1c1979 Make more distributions jittable
Summary:
This uses zou3519's new `torch.broadcast_tensors()` #10075 to make `Categorical.log_prob()` and the `*Normal.__init__()` methods jittable. Previously `.log_prob()` was failing due to calls to `torch._C.infer_size()` with errors like
```
    def log_prob(self, value):
        if self._validate_args:
            self._validate_sample(value)
>       value_shape = torch._C._infer_size(value.size(), self.batch_shape) if self.batch_shape else value.size()
E       RuntimeError: expected int at position 0, but got: Tensor
```
After this change I'm able to jit many more of Pyro's tests.
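
For reference, a tiny standalone example of what `torch.broadcast_tensors()` does, independent of the distributions code:

```python
import torch

value = torch.randn(3, 1)
batch = torch.randn(1, 4)
# Both outputs are expanded to the common broadcast shape:
v, b = torch.broadcast_tensors(value, batch)
print(v.shape, b.shape)  # torch.Size([3, 4]) torch.Size([3, 4])
```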

Reviewed By: ezyang

Differential Revision: D9477487

Pulled By: apaszke

fbshipit-source-id: 5f39b29c6b8fa606ad30b02fefe2dfb618e883d6
2018-08-23 08:09:49 -07:00
529fc68df2 Update docs with clean (#10819)
Summary:
Add tip about cleaning if installing ninja after a build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10819

Reviewed By: soumith

Differential Revision: D9480095

Pulled By: erikbrinkman

fbshipit-source-id: 96ae1387038afe6964a1bd1e2186468f6a5ea12f
2018-08-23 07:25:19 -07:00
deda05e59f Revert D9395814: move HeatmapMaxKeypointOp unittest to oss
Differential Revision:
D9395814

Original commit changeset: 25073eb6b143

fbshipit-source-id: 56f2b7b57e3c6361e2d78e5ba7850ea3b89e98fb
2018-08-23 06:54:29 -07:00
b885dea300 parallize the dense part in event models
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10768

Reviewed By: Wakeupbuddy

Differential Revision: D9445750

fbshipit-source-id: b8c2ddfe3ccb9278506de15a5e43bada016408f7
2018-08-22 22:40:07 -07:00
5c0eece2fd Force types on values returned from if blocks to be equivalent (#10281)
Summary:
When emitting if branches, check that the types of each returned value are equivalent. As with reassignment of values, tensors are not forced to be the same shape or subtype.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10281

Differential Revision: D9466566

Pulled By: eellison

fbshipit-source-id: 746abdeb34a0f68806b8e73726ad5003b536911c
2018-08-22 19:55:38 -07:00
9a43fc5eaa move HeatmapMaxKeypointOp unittest to oss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10674

Reviewed By: newstzpz

Differential Revision: D9395814

fbshipit-source-id: 25073eb6b143fc1e7cbf5f887545d2b7df15c9a9
2018-08-22 19:11:10 -07:00
4aa5075cae update the constructor to accept the PredictorConfig only to set up the predictor (#9483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9483

The interface is updated to accept the config to construct the predictor.

Reviewed By: highker

Differential Revision: D8872999

fbshipit-source-id: 3ca54d644970823fc33c0ade9a005e12f52e2b24
2018-08-22 19:11:09 -07:00
f0ec3bfa56 Changes for Python3 compatibility (#10524)
Summary:
Review by tomdz volkhin anshulverma
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10524

Reviewed By: ezyang

Differential Revision: D9328001

Pulled By: huitseeker

fbshipit-source-id: 144721c4fd9a1ea6cf6673793416f20cb448aa93
2018-08-22 18:55:01 -07:00
44b47fd7f3 Working pybind version of MPI process group and abort() pybind (#10606)
Summary:
This will make the pybind version of the MPI process group work. The issue is that the tensor list does not stay in scope for the MPI worker thread, so we pass the vector by value instead.

Also added a recv_anysource pybind to make it work. The front-end API will wrap this function one level up with an int, so taking a tensor should be the easiest way for now.

Also added an abort pybind and fixed the flaky test.
```
tengli@devfair033:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ mpirun -np 8 ProcessGroupMPITest
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10606

Differential Revision: D9474393

Pulled By: teng-li

fbshipit-source-id: cca236c333656431e87d0d3573eeae9232c598b0
2018-08-22 18:26:04 -07:00
6c75fc0aa3 Integrating stochastic quantization to easgd to reduce communication + supporting quantization on both sides (split from D8849770) (#10644)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10644

Depends on D8493264

Reviewed By: chocjy, boryiingsu

Differential Revision: D9347706

fbshipit-source-id: 6fdcc5b61098bf47ec9391b1f009b0e6a0615842
2018-08-22 17:10:03 -07:00
f72e813c2f Allow tracing functions that take tuples of tensors as inputs (#10637)
Summary:
And return tuples.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10637

Reviewed By: eellison

Differential Revision: D9385892

Pulled By: apaszke

fbshipit-source-id: 542f4444d909fb246d7f1d88d6fb98345de2d431
2018-08-22 15:37:10 -07:00
043a2e36e5 Removing setup_caffe2.py (#10734)
Summary:
FULL_CAFFE2=1 python setup.py (install | build_deps develop) should be all anyone needs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10734

Reviewed By: orionr

Differential Revision: D9439354

Pulled By: pjh5

fbshipit-source-id: 0169afcda4f8f38c57498ba2151f7654ecce6070
2018-08-22 15:37:07 -07:00
6c84f7fea0 Relax RHS type assert for augassign (#10730)
Summary:
Augassign (i.e., `x += 1`) gets desugared to an assignment of a binop (`x = x + 1`).
Right now we assert that the RHS of the binop is a tensor,
but it really doesn't have to be because we support scalar/scalar ops and also
list-list ops (i.e., `[1, 2] + [2, 3]`).
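
A plain-Python illustration of the desugaring and the newly allowed RHS kinds (an editorial sketch, not the compiler's test cases):

```python
# `x += 1` desugars to `x = x + 1`, so the RHS of the binop
# is not necessarily a tensor:
n = 0
n += 1              # scalar/scalar
xs = [1, 2]
xs += [2, 3]        # list-list, i.e. [1, 2] + [2, 3]
```
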
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10730

Differential Revision: D9465110

Pulled By: zou3519

fbshipit-source-id: 7b118622701f09ce356aca81b8db743d9611097b
2018-08-22 15:10:33 -07:00
d40a598777 Back out "[pytorch][PR] Create at::linear" (#10785)
Summary:
Multiple failing external and internal CI signals were ignored when this commit
was landed. goldsborough please fix the test failures and resubmit this change as a
new PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10785

Reviewed By: ezyang

Differential Revision: D9466791

Pulled By: jamesr66a

fbshipit-source-id: b260e93bac95d05fd627c64e620b6aefb5045949
2018-08-22 14:39:59 -07:00
6fcac354c5 Erase ListConstruct nodes for ONNX export (#10713)
Summary:
ONNX doesn't support this. Instead, flatten the inputs to the ListConstruct op and inline it into the subsequent usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10713

Differential Revision: D9458508

Pulled By: jamesr66a

fbshipit-source-id: 0b41e69320e694bb2f304c6221864a39121e4694
2018-08-22 14:39:58 -07:00
de11a5fb28 Resubmit #8322 with scipy version check
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10775

Differential Revision: D9458207

Pulled By: SsnL

fbshipit-source-id: f2b0dbf2d236134afded9b15d8bf55ff98f50e7b
2018-08-22 13:39:49 -07:00
ee3e48d34b Move Backend, Layout, ATenGeneral, Deprecated, Generator to ATen/core. (#10740)
Summary:
I included "legacy" includes in the old spots for Backend, Generator, Layout; it seemed unlikely that the other ones had direct user includes.

This is another step on the path to move Type/Tensor to ATen/core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10740

Reviewed By: ezyang

Differential Revision: D9435888

Pulled By: gchanan

fbshipit-source-id: 89f4f0f445d4498a059d3a79069ba641b22bbcac
2018-08-22 13:39:46 -07:00
5ca2713a8b Fix performance of WeightedRandomSampler (#10636)
Summary:
Since https://github.com/pytorch/pytorch/pull/8958 was merged, the BatchSampler samples 0d tensors from WeightedRandomSampler instead of integers, which significantly reduces performance. This PR fixes it the same way https://github.com/pytorch/pytorch/pull/10361 fixed DistributedSampler.
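
A rough sketch of the pattern the fix restores (yielding plain Python ints rather than 0-d tensors; not the exact diff):

```python
import torch

weights = torch.tensor([0.1, 0.9, 0.5])
# Converting the sampled indices to plain ints in bulk keeps the
# BatchSampler hot path cheap:
indices = torch.multinomial(weights, 4, replacement=True).tolist()
assert all(isinstance(i, int) for i in indices)
```
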
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10636

Differential Revision: D9423869

Pulled By: zou3519

fbshipit-source-id: f94da2d4cccf70e63beea6cfc3d1230b5610ae44
2018-08-22 13:15:48 -07:00
0e30fa6f3c Faster random number generation in fused_rowwise_random_quantization_ops (#10634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10634

```
Trying example: test_speed_of_rand_quantization(self=<caffe2.caffe2.python.operator_test.rand_quantization_op_speed_test.TestSpeedFloatToFusedRandRowwiseQuantized testMethod=test_speed_of_rand_quantization>, bitwidth_=2, random_=True, data_shape_=array([1024, 1224]), gc=, dc=[, device_type: 1])
Sub+Scale+Sum time: 1.9944190979003908 ms
Quantizing time: 2.080512046813965 ms (1.0431669296609765X)
De-quantizing time: 0.7375001907348633 ms (0.36978195380863577X)
```

```
Trying example: test_speed_of_rand_quantization(self=<caffe2.caffe2.python.operator_test.rand_quantization_op_speed_test.TestSpeedFloatToFusedRandRowwiseQuantized testMethod=test_speed_of_rand_quantization>, bitwidth_=1, random_=True, data_shape_=array([1024, 1224]), gc=device_type: 1, dc=[, device_type: 1])
Sub+Scale+Sum time: 1.6691923141479492 ms
Quantizing time: 7.500243186950684 ms (4.493336761366071X)
De-quantizing time: 1.1209726333618164 ms (0.6715658967876477X)
```

Reviewed By: jspark1105

Differential Revision: D8849770

fbshipit-source-id: 2bb2bac7e633f647f38e419ce980b8958f3bcae2
2018-08-22 13:15:46 -07:00
754ec9e386 Reduce rocm link time with ThinLTO
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10758

Differential Revision: D9467554

Pulled By: bddppq

fbshipit-source-id: 6853ccd96ac3209e062c110913ea37d6840c8134
2018-08-22 13:15:45 -07:00
9767951ca8 Remove regex matching from undefined_tensor_test, fixes #10013 (#10702)
Summary:
Don't regex against strings that may have come from the backtrace.
Better to just not regex at all.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10702

Reviewed By: ezyang

Differential Revision: D9406154

Pulled By: jsrmath

fbshipit-source-id: 9b17abee2a6e737a32c05f1e3963aef4b6638a47
2018-08-22 12:39:57 -07:00
b0ad8105d2 Split storage from tensor (#10053)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10053

Tensor in PyTorch 1.0 will have:
Tensor -> TensorImpl -> Storage -> StorageImpl
In this diff we split Storage from Tensor in order to align with this design.
We'll have Tensor -> Storage -> StorageImpl after this diff.

Reviewed By: ezyang

Differential Revision: D9384781

fbshipit-source-id: 40ded2437715a3a2cc888ef28cbca9a25b1d5350
2018-08-22 11:55:02 -07:00
5fb9b31ed5 Add matrix_rank (#10338)
Summary:
- Similar functionality as NumPy
- Added doc string
- Added tests

Differential Revision: D9240850

Pulled By: SsnL

fbshipit-source-id: 1d04cfadb076e99e03bdf699bc41b8fac06831bf
2018-08-22 09:58:38 -07:00
fbd7189949 add explicit flag to build static libtorch (#10754)
Summary:
I've tested locally that this works to build static and non-static binaries with and without CUDA.

In terms of ongoing testing, I am working on incorporating this into the release package generation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10754

Differential Revision: D9457423

Pulled By: anderspapitto

fbshipit-source-id: aa1dcb17c67c0f0c493a9cf93aca4a6e06b21666
2018-08-22 09:26:07 -07:00
227635142f Delete THD master_worker (#10731)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10731

Differential Revision: D9423675

Pulled By: ezyang

fbshipit-source-id: 37221e11d84cc3672b944af598ea229a1d4c38cc
2018-08-22 08:54:36 -07:00
2fe5fa78fa Use FinishDeviceComputation instead of adding events in Operator::SyncDevice
Summary: The code in Operator::SyncDevice had some duplicate logic and using FinishDeviceComputation sufficed in this case.

Reviewed By: yinghai

Differential Revision: D9348288

fbshipit-source-id: d8d874bab491e6d448fcd5fa561a8b99d502753b
2018-08-22 01:09:53 -07:00
22446a3619 Productionize CRF layer in PyText (#10362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10362

This diff implements a manual export from PyText's CRF module to the caffe2 CRF layer.
Note that most of the changes in caffe2/python/crf.py are just formatting changes; the only relevant change is the new class CRFUtils.

Reviewed By: hikushalhere

Differential Revision: D9234126

fbshipit-source-id: 1a67d709034660e8b3d5ac840560b56de63e3f69
2018-08-22 00:25:26 -07:00
19031c68dc Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage (#10488)
Summary:
```
Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage

This patch does two major changes:

- It replaces the use of Retainable in Storage with a new implementation
  based on intrusive_ptr.  This will be necessary because Caffe2 will
  be using this class to implement intrusive_ptrs, and we need to
  line these up for the merge.  One good thing about the new implementation is
  that the default copy/move constructors/assignment operators and destructor
  work automatically, instead of needing to be hardcoded into Storage/Tensor.

- It replaces all places where we returned std::unique_ptr<Storage> with
  Storage, collapsing an unnecessary double indirection that is no longer
  necessary now that we have correctly working copy/move constructors.

I didn't initially want to do step (2), but it was very important to
eliminate all bare uses of new Storage and new StorageImpl, and making
this API change was the most straightforward way to do so.

HOW TO FIX YOUR CODE IN THE NEW API

- You no longer need to dereference the result of tensor.storage() to pass
  it to set.  So, instead of:

      x.set_(*y.storage());

  just write:

      x.set_(y.storage());

- If you were accessing methods on StorageImpl via the pImpl() method, you
  must use the dot operator to run pImpl().  Even better; just drop pImpl,
  we now have method forwarding.  So, instead of:

      storage->pImpl()->data();

  just do:

      storage->data();
      // storage.pImpl()->data() works too but is not as recommended

- storage->getDevice() is no more; instead use storage->device().index()

MISC CODE UPDATES

- retain, release, weak_retain, weak_release and weak_lock are now
  reimplemented using the "blessed API", and renamed to make it
  clearer that their use is discouraged.

- nvcc OS X and general OS X portability improvements to intrusive_ptr

- A new comment in intrusive_ptr describing how stack allocated
  intrusive_ptr_targets work differently than heap allocated ones
  from c10::make_intrusive

CAVEAT EMPTOR

- THStorage_weakRetain used to work on strong pointers, but it NO LONGER
  works with intrusive_ptr.  You must reclaim the strong pointer into a
  real strong pointer, construct a weak pointer from it, and then release
  the strong and weak pointers.  See StorageSharing.cpp for an example.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10488

Reviewed By: gchanan

Differential Revision: D9306134

Pulled By: ezyang

fbshipit-source-id: 02d58ef62dab8e4da6131e1a24834a65c21048e2
2018-08-21 21:39:55 -07:00
abb209ef25 Fixes *fft docs (#10760)
Summary:
cc cranmer

fixes #10751
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10760

Differential Revision: D9444473

Pulled By: SsnL

fbshipit-source-id: a4036773a93981801c1283d69f86e30cb0fe3d6d
2018-08-21 21:09:04 -07:00
e5e2514f4e fix debug_info arg in createOperator and improve reroute_tensor (#10736)
Summary:
- Fixed C2 core.CreateOperator debug info assignment
- Improved core.Net.reroute_tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10736

Differential Revision: D9426659

Pulled By: harouwu

fbshipit-source-id: 90caf848c88854e17e568d5f6910dc6c81fd000a
2018-08-21 19:40:16 -07:00
1068ba667c Create at::linear (#10755)
Summary:
The optimized code for `linear()`, which uses `addmm` when a bias is given, was duplicated three times across ATen and the C++ API. Let's just have `at::linear` and use that everywhere.
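
A Python sketch of the consolidated logic for the 2-D case (the actual change lives in ATen C++):

```python
import torch

def linear(input, weight, bias=None):
    # With a bias, a single fused addmm (bias + input @ weight.t())
    # replaces a matmul followed by an add.
    if bias is not None:
        return torch.addmm(bias, input, weight.t())
    return input.matmul(weight.t())
```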

apaszke ezyang (who mentioned this in #10481)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10755

Differential Revision: D9443881

Pulled By: goldsborough

fbshipit-source-id: a64862d1649b5961043d58401625ec267d97d9f3
2018-08-21 19:40:15 -07:00
a2ca634e04 Add enforce back to converter.cc
Summary: hotfix for B*8

Differential Revision: D9444060

fbshipit-source-id: 368f8463e684c39ec0ac18bcb11a7b6132d9f874
2018-08-21 19:09:22 -07:00
ddf187c198 Dont assume serialized integral types were widened to int32 in raw_data (#10718)
Summary:
zdevito et al. came to the conclusion that the ONNX spec does not mandate the widening conversion of integral types when serializing tensor data into raw_data, as opposed to serializing the data into int32_data. PyTorch recently made this change in the export code, which caused import in caffe2 to break because it did not match those semantics. This fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10718

Differential Revision: D9423712

Pulled By: jamesr66a

fbshipit-source-id: 479fbae67b028bf4f9c1ca1812c2c7b0c6cccd12
2018-08-21 18:41:31 -07:00
6325e5aa48 fix typo in error message (#9827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9827

changed unitilized to uninitialized

Reviewed By: jerryzh168

Differential Revision: D8995509

fbshipit-source-id: 94518d5542a7bff49fcb9a4505c0c7a959746f78
2018-08-21 18:41:29 -07:00
44f996f82c Py3 fixes for layer_model_helper.py (#10525)
Summary:
Fixes `__getattr__` to adhere to its Python API contract, and wraps the `range()` call in `list()` since `range()` no longer returns a list in Python 3.
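
For context, the contract being fixed, shown on a hypothetical class (not the real helper body):

```python
class HelperSketch:
    _registry = {"fc": "a layer"}

    def __getattr__(self, name):
        # Python's data model requires raising AttributeError (not a bare
        # Exception, and not returning None) for unknown names, so that
        # hasattr() and getattr(obj, name, default) behave correctly.
        try:
            return self._registry[name]
        except KeyError:
            raise AttributeError(name)

layer_ids = list(range(3))  # Py3: range() is lazy; wrap in list() when a list is needed
```
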
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10525

Reviewed By: ezyang

Differential Revision: D9441360

Pulled By: tomdz

fbshipit-source-id: d489c0e7cefecc4699ca866fd55ddbfa629688d4
2018-08-21 18:41:28 -07:00
71ddd837d7 Support custom ops in ScriptModule and tidy up test files (#10610)
Summary:
This PR adds support for using custom ops in ScriptModules, the last step for our custom op strategy. You can now write

```
import torch

torch.ops.load_library('libcustom_ops.so')

class Model(torch.jit.ScriptModule):
    def __init__(self):
        super(Model, self).__init__()

    @torch.jit.script_method
    def forward(self, input):
        return torch.ops.custom.op(input) + 1

model = Model()
model.forward(torch.ones(5)) # Works
model.save("model.pt") # Works
model = torch.jit.load("model.pt") # Works
```

You can then load the `model.pt` in C++ and execute its `forward` method!

Missing for this was the fact that the script compiler didn't know to convert `ops.custom.op` into a `BuiltinFunction` which then emits a function call. For this I came up with the following strategy inside `torch/csrc/jit/script/init.cpp`:

1. When we access `torch.ops`, we return a `CustomOpValue` (subclass of `PythonValue`), whose purpose is only to return a `CustomOpNamespaceValue` (subclass of `PythonValue`) whenever something under it is accessed.
2. `CustomOpNamespaceValue` will then for each field accessed on it return a `BuiltinFunction`.

This doesn't reduce performance for any calls that are not to `torch.ops` (as opposed to inspecting every function call's name at the call site, for example).

I also had to fix `BuiltinFunction` to not assume the namespace is always `aten::`.

A lot of other changes are just tidying up the Python and C++ test harness before I integrate it in CI.

zdevito dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10610

Differential Revision: D9387832

Pulled By: goldsborough

fbshipit-source-id: c00f431db56c7502a66fe1f813fe78067f428ecb
2018-08-21 18:41:27 -07:00
e94ae99d24 Delete copy constructor/assignment of class Observable explicitly. (#10593)
Summary:
This should resolve "error C2280: 'std::unique_ptr<caffe2::ObserverBase<caffe2::OperatorBase>,std::default_delete<_Ty>> &std::unique_ptr<_Ty,std::default_delete<_Ty>>::operator =(const std::unique_ptr<_Ty,std::default_delete<_Ty>> &)': attempting to reference a deleted function" from Visual Studio.
It should also make the error message more human-readable in case something is really messed up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10593

Reviewed By: orionr

Differential Revision: D9436397

Pulled By: mingzhe09088

fbshipit-source-id: 31711667297b4160196134a34365da734db1c61d
2018-08-21 16:56:04 -07:00
04b773ab87 Support Loading to GPU (#10710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10710

Can't resume from checkpoint for workflows that use GPU.

The problem is just we didn't leverage the already-provided GPU deserialization of Caffe2.

`keep_device` arg of LoadOp. See https://fburl.com/y27ltaxw

How is a serialized BlobProto (containing a TensorProto) loaded into GPU memory?
- Load BlobProto from DB. https://fburl.com/pe1qaeyf
- Deserialize the BlobProto into a Blob instance. https://fburl.com/5dirjuuh and https://fburl.com/stoho0x1
- Call Blob->Deserialized. https://fburl.com/bnureu32
- Deserializer Registration. https://fburl.com/wbu95ry7 https://fburl.com/ycetud8u
- Create TensorCUDA Deserializer. https://fburl.com/2lirfuqj
- Create Tensor on GPU and get TensorProto of BlobProto. https://fburl.com/7dre82zg
- Copy TensorProto in CPU to Tensor on GPU. https://fburl.com/fr0qk2oe
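
A hedged Python sketch of the fix's effect (the db path and blob name are hypothetical): passing `keep_device=1` to LoadOp restores the blob onto the device recorded in the serialized proto instead of forcing CPU.

```python
from caffe2.python import core, workspace

# Load blob "w" from a checkpoint, keeping the serialized device info
# so a GPU tensor deserializes back onto the GPU:
op = core.CreateOperator(
    "Load", [], ["w"],
    db="/tmp/checkpoint.db", db_type="minidb",
    keep_device=1,
)
workspace.RunOperatorOnce(op)
```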

Cloned the GPU workflows for testing in D9125520.

Reviewed By: mraway

Differential Revision: D9372950

fbshipit-source-id: 2bf70747bd71e8da16239197f7d2761d63f09ff8
2018-08-21 13:57:36 -07:00
edb34434ab More changes for hidden visibility (#10692)
Summary:
Let's run CI tests to see what fails given the changes that just landed in https://github.com/pytorch/pytorch/pull/10624

cc mingzhe09088 ezyang Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10692

Reviewed By: mingzhe09088

Differential Revision: D9423617

Pulled By: orionr

fbshipit-source-id: 3bda1f118d13f8dd8e823727c93167cae747d8cf
2018-08-21 13:39:57 -07:00
8a1739b05d Add arguments __repr__ in Distribution base class
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10373

Differential Revision: D9240316

Pulled By: ezyang

fbshipit-source-id: f35c500f61f86e6be405e8bd4040db5146224984
2018-08-21 12:10:23 -07:00
9c321a8779 Add util function from core type to dtype (#10716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10716

title

Reviewed By: idning

Differential Revision: D9417357

fbshipit-source-id: 0f71805b1d64a46791d6ee4d8620763f878ffdb6
2018-08-21 10:55:19 -07:00
b23d59ce1a Make ONNX_ATEN_FALLBACK the internal default option
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10629

Reviewed By: bddppq

Differential Revision: D9381106

fbshipit-source-id: 03d42c95d17a70a68fe0f38dad68f1793996dfce
2018-08-21 10:10:50 -07:00
b0b5139149 Set the BUILD_ENVIRONMENT variable before installing sccache. (#10640)
Summary:
Set the build environment before installing sccache in order to make sure the docker images have the links set up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10640

Reviewed By: yf225

Differential Revision: D9399593

Pulled By: Jorghi12

fbshipit-source-id: a062fed8b7e83460fe9d50a7a27c0f20bcd766c4
2018-08-21 09:40:41 -07:00
30ad13faca Avoid shadowing i, j vars in GeneralProposals test (#10721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10721

- Fix compilation warning "declaration of 'i' shadows a previous local [-Werror=shadow-compatible-local]"

Reviewed By: newstzpz

Differential Revision: D9419688

fbshipit-source-id: 76efc3688782ce4ead3c89e7069211736febfac2
2018-08-21 09:11:38 -07:00
f9d1b001e1 Move THNN Reduction to ATen/core. (#10703)
Summary:
This is part of moving the (base) Type to ATen/core; some Type methods have a default argument of type THNN Reduction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10703

Differential Revision: D9406060

Pulled By: gchanan

fbshipit-source-id: 789bb3387c58bd083cd526a602649105274e1ef6
2018-08-21 08:54:35 -07:00
f0d8a36e70 Completely remove build_aten and use_aten (#10469)
Summary:
Breaking out of #8338 to completely remove build_aten and use_aten.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10469

Reviewed By: orionr

Differential Revision: D9413639

Pulled By: mingzhe09088

fbshipit-source-id: b7203aa4f5f2bb95c504c8dc187a3167f2570183
2018-08-20 20:26:42 -07:00
9e75ec11fb Make empty list literals construct empty Tensor[] (#10705)
Summary:
This will make the common case more natural (no need to do `_construct_empty_tensor_list()`)
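
A minimal sketch of the new behavior (written against current TorchScript syntax; the frontend at the time differed in details):

```python
import torch

@torch.jit.script
def empty_tensors():
    xs = []    # empty list literal, now typed as Tensor[] by default
    return xs
```
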
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10705

Differential Revision: D9411622

Pulled By: michaelsuo

fbshipit-source-id: 2d91fbc5787426748d6e1c8e7bbeee737544dc96
2018-08-20 18:28:28 -07:00
5c0d9a2493 Soumith's last few patches to v0.4.1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10646

Reviewed By: ml7

Differential Revision: D9400556

Pulled By: pjh5

fbshipit-source-id: 1c9d54d5306f93d103fa1b172fa189fb68e32490
2018-08-20 18:28:27 -07:00
e449a27646 Fix issues link in Caffe2 readme (#10711)
Summary:
Change to pytorch issues link

orionr pjh5 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10711

Reviewed By: orionr

Differential Revision: D9412870

Pulled By: duc0

fbshipit-source-id: 341e8504ade8eba614cead832e5b5fdca4b1c270
2018-08-20 16:55:11 -07:00
826550a32e Update the onnx Gemm op to FC/FCTransposed logic in caffe2 onnx backend (#10108)
Summary:
The broadcast is used by default when the opset version is greater than 6.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10108

Reviewed By: bddppq

Differential Revision: D9176934

Pulled By: houseroad

fbshipit-source-id: b737bd87b0ddc241c657d35856d1273c9950eeba
2018-08-20 16:09:22 -07:00
15d7f49205 Adding ATEN_NO_TEST option to root level cmake for propagation to aten
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10708

Reviewed By: ml7

Differential Revision: D9410916

Pulled By: pjh5

fbshipit-source-id: b216a9ff7be23ff8754f2fe0b8197b5d006aa08d
2018-08-20 15:40:27 -07:00
585e6b581f Allow method-style casts on tensors (#10641)
Summary:
Closes https://github.com/pytorch/pytorch/issues/10631
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10641

Differential Revision: D9407598

Pulled By: jamesr66a

fbshipit-source-id: a0331f4e9e55d92718cde7a1112fe8c705206b1f
2018-08-20 14:10:21 -07:00
39a3dcc999 Fix #10698 build failure (#10704)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10704

Differential Revision: D9406072

Pulled By: ezyang

fbshipit-source-id: 0d472ef84cddc3bf7600b06d04e5e02e94d59fa3
2018-08-20 14:10:19 -07:00
b4684db698 Add support for Log()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10694

Reviewed By: houseroad

Differential Revision: D9405612

Pulled By: MisterTea

fbshipit-source-id: 6d83d3c2db933a3822076c7faf578ac0e92e60c6
2018-08-20 13:25:21 -07:00
7832e9d564 Add a bisect percentile operator (#10563)
Summary:
Add a bisect percentile operators with lower and upper bounds for interpolation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10563

Reviewed By: chocjy

Differential Revision: D7802182

Pulled By: olittle

fbshipit-source-id: 89ebfa8b3463adc2c89235fa3dfffa187a9d5417
2018-08-20 13:14:05 -07:00
3d0757430b Fix EnsureCPUOutputOp (#10651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10651

EnsureCPUOutputOp will copy the input from another Context to CPU, but currently there is no guarantee that the Copy will be executed.

Differential Revision: D9390046

fbshipit-source-id: af3ff19cf46560264cb77d2ab8821f0cc5be74f6
2018-08-20 12:12:48 -07:00
2e563c417c Nomnigraph - rename some APIs that invole Subtree to Subgraph (#10551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10551

Renaming from "subtree" -> "subgraph" to improve clarity of subgraph matcher APIs since it now supports DAG

This is pure renaming; no functionality changes.

Reviewed By: bwasti

Differential Revision: D9348311

fbshipit-source-id: 4b9267845950f3029dfe385ce3257d3abb8bdad4
2018-08-20 10:55:21 -07:00
aa9f328fa3 Nomnigraph - DAG matching (#10549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10549

Support DAG matching in nomnigraph. This is done by maintaining a map from nodes in the MatchGraph to nodes in the input graph, and additionally enforcing that the same nodes in the MatchGraph must match the same nodes in the input graph (with the exception of multiplicity, i.e. when count != 1 on the MatchGraph node).

In a follow up diff, I'll rename the API that refers to subtree as subgraph to improve clarity.

Reviewed By: bwasti

Differential Revision: D9347322

fbshipit-source-id: 171491b98c76852240a253279c2654e96dd12632
2018-08-20 10:55:19 -07:00
0cce4620fe Fix backend/device-type comparison with MKLDNN.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10689

Differential Revision: D9400450

Pulled By: gchanan

fbshipit-source-id: f75b042b886d5d525edb2c423173a9646c613a1b
2018-08-20 10:41:08 -07:00
db7b7f1359 fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10686

Differential Revision: D9399874

Pulled By: SsnL

fbshipit-source-id: 28130992d2416721552f72cfa835ff0358caeefa
2018-08-20 10:40:55 -07:00
d4832f1e7b More fixes for hidden visibility (#10624)
Summary:
Some more `ATEN_API` additions for hidden visibility.

Running CI tests to see what fails to link.

cc Yangqing mingzhe09088 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10624

Reviewed By: mingzhe09088

Differential Revision: D9392728

Pulled By: orionr

fbshipit-source-id: e0f0861496b12c9a4e40c10b6e0c9e0df18e8726
2018-08-20 10:11:59 -07:00
9ad9191323 Fix cuDNN dropout state cache (#10662)
Summary:
Minor fix for the cuDNN cache. Previously, when an RNN function was called on GPU 0 and then in eval mode on GPU 1, we would skip event re-initialization, causing an incorrect resource handle error when trying to record the event.

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10662

Reviewed By: soumith

Differential Revision: D9393629

Pulled By: apaszke

fbshipit-source-id: e64c1c1d2860e80f5a7ba727d0b01aeb5f762d90
2018-08-20 05:09:41 -07:00
c37fac4d50 Fixing stop condition on composite reader (#9888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9888

Limiter cannot be shared or copied; just pass it to the first reader.

Reviewed By: xianjiec

Differential Revision: D9008871

fbshipit-source-id: e20cd785b26b1844e156efc3833ca77cfc3ffe82
2018-08-20 03:02:20 -07:00
83066e9b30 Add trigonometry functions for ONNX export (#7540)
Summary:
Trigonometry functions are newly added to ONNX in a recent PR https://github.com/onnx/onnx/pull/869

This PR makes pytorch support exporting graphs with trigonometry functions.

This PR might need to wait until it is ready to change
```python
_onnx_opset_version = 6
```
to
```python
_onnx_opset_version = 7
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7540

Differential Revision: D9395041

Pulled By: bddppq

fbshipit-source-id: bdf3e9d212b911c8c4eacf5a0753bb092e4748d2
2018-08-19 23:01:28 -07:00
3f603eeee8 some improvements on distributed docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10666

Differential Revision: D9395242

Pulled By: SsnL

fbshipit-source-id: 952326b9c5a1a974a1c33a0e12738e1e21ad9956
2018-08-19 17:40:28 -07:00
108b657159 Import DistributedSampler in utils/data/__init__ (#10671)
Summary:
There is no reason a user should need an extra import to use DistributedSampler.
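
The change in usage, side by side:

```python
# Before: the sampler required its own submodule import
from torch.utils.data.distributed import DistributedSampler
# After this change it is also exposed at the package level:
from torch.utils.data import DistributedSampler
```
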
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10671

Differential Revision: D9395189

Pulled By: SsnL

fbshipit-source-id: 8f41d93813c8fb52fe012f76980c6a261a8db9b2
2018-08-19 16:55:13 -07:00
6bdbad93b9 Refactor Device to not depend on Backend. (#10478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10478

- Removed Backend constructor from Device, and fixed all
  use-sites to use DeviceType::CPU instead of kCPU, or
  use a new function backendToDeviceType to perform
  the conversion.
- New method device_type() on Type; it gives you the
  underlying device type, e.g., CPU for SparseCPU.
- We add backward compatibility for kCPU/kCUDA uses,
  by introducing a new special type which is implicitly
  convertible to both DeviceType and Backend.  As long as
  you don't define a function that's overloaded on both
  DeviceType and Backend (but not on BackendOrDeviceType),
  the implicit conversions will ensure that uses
  of at::Device(at::kCPU) keep working. We fixed use-sites in
  the library, but did NOT fix sites in the test code, so that
  we can exercise this BC code.

Reviewed By: Yangqing

Differential Revision: D9301861

fbshipit-source-id: 9a9d88620500715c7b37e655b4fd761f6dd72716
2018-08-18 17:39:14 -07:00
f1420adfe3 Move at::chunk into the graph fuser (#10178)
Summary:
... to avoid slow at::chunk (it is slow due to tensor initialization). Picking up from #10026

This is done through the following:

1) Absorb starting chunks into FusionGroup as a part of the graph fuser
pass.
2) When compiling a kernel, emit a `std::vector<ConcatDesc>` that describes whether an input (of the original graph) will be chunked.
3) When launching a kernel, use `std::vector<ConcatDesc>` to chunk an
input tensor on the CPU. This chunk directly takes in an at::Tensor and creates
four TensorInfo structs in-place in the argument list, bypassing the creation of intermediate Tensors.

- Expect test and correctness test to see if a single chunk is fused
  by the graph fuser
- Correctness test for a variety of chunks (dimension = beginning,
  middle, end) and tensors (contiguous, non-contiguous, edge case
  (splitSize = 1) for both CPU/CUDA
- Expect test for multiple chunks fused into the same kernel and
  correctness test.

cc zdevito apaszke

LSTM forward pass, 1 layer, 512 hidden size and input size, 100 seq length, requires_grad=False on all inputs and weights.

After changes:
```
thnn    cudnn   jit
8.8468  6.5797  9.3470
```

Before changes:
```
thnn    cudnn   jit
9.9221  6.6539  11.2550
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10178

Differential Revision: D9382661

Pulled By: zou3519

fbshipit-source-id: 1f8a749208fbdd45559775ce98cf4eb9558448f8
2018-08-18 16:10:11 -07:00
d87b4e941b fix python interpreter cannot be found without PYTHON_EXECUTABLE (#10659)
Summary:
Take 2 of #10543
The problem was that between the commit and the merge, one more entry point, `tools/build_libtorch.py`, was added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10659

Differential Revision: D9393540

Pulled By: soumith

fbshipit-source-id: 8ebfed600fc735fd1cb0489b161ec80e3db062e0
2018-08-18 15:40:08 -07:00
152762a567 Fix warnings diagnosed in recent clang (#10647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10647

Fix "missing std::move from the return value" warning diagnosed by recent clang compiler.

Reviewed By: soumith, DavidCallahan

Differential Revision: D9384692

fbshipit-source-id: 8ad951e47d605e6f98a9650f2dec2909ad0f3eb8
2018-08-17 21:32:58 -07:00
e29b5a1ea8 graph fuser inserts explicit expands where necessary (#10325)
Summary:
Fixes #10096

If the only thing preventing a simple mappable operator from being fused
into a fusion group is that its Tensor inputs are not of the same shape as the
output, then the graph fuser inserts explicit expand nodes for those
inputs.
This helps the graph fuser not miss out on any fusion opportunities
involving simple mappable operations that have Tensor inputs. This PR
doesn't do anything for the scalar case; that can be addressed later.
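
Roughly what the inserted expand makes explicit (an editorial example, not the fuser's IR):

```python
import torch

a = torch.randn(4, 1)
b = torch.randn(4, 5)
# The fuser rewrites the implicitly-broadcast `a + b` into:
c = a.expand(4, 5) + b
assert torch.equal(c, a + b)
```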

Test Plan
- Simple expect test case
- Added expect tests for a raw LSTMCell. The expands help speed up the
  forwards pass by allowing more operations to be fused into the LSTMCell's single
  FusionGroup.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10325

Differential Revision: D9379308

Pulled By: zou3519

fbshipit-source-id: 86d2202eb97e9bb16e511667b7fe177aeaf88245
2018-08-17 16:03:46 -07:00
7c55d11ba5 Make sure we don't relocate the weight name buffer (#10630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10630

`onnxTensorDescriptorV1.name` points to the string buffer. We use a vector of strings to serve as the storage. This means we cannot reallocate the vector, because that may invalidate the `onnxTensorDescriptorV1.name` pointers. The solution is to reserve a vector large enough that it never reallocates.

Reviewed By: bddppq, houseroad

Differential Revision: D9381838

fbshipit-source-id: f49c5719aafcc0829c79f95a2a39a175bcad7bfe
2018-08-17 16:03:31 -07:00
65b9308128 Basic infrastructure for C++ documentation (#10569)
Summary:
Adds the folder structure, Doxyfile, sphinx setup and Makefile to build C++ docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10569

Differential Revision: D9386744

Pulled By: goldsborough

fbshipit-source-id: 0a7c581dcf0a5f7b01ba19d317b493cf95935134
2018-08-17 15:39:50 -07:00
b62b378022 Adding torch support for CMAKE_ARGS env
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10635

Reviewed By: ml7

Differential Revision: D9383845

Pulled By: pjh5

fbshipit-source-id: fb21bda12e88053eec738974e6e419388c5038d9
2018-08-17 14:54:43 -07:00
c5c1c051ca Fix dropout fused kernel applied in eval mode (#10621)
Summary:
fixes https://github.com/pytorch/pytorch/issues/10584

cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10621

Differential Revision: D9379397

Pulled By: SsnL

fbshipit-source-id: 5ff2939ba794af082ce597ef289a09ee757636dc
2018-08-17 14:54:42 -07:00
86c9856d9c Fuse tensor-scalar ops when scalar is constant (#10511)
Summary:
This is on the way to resolving #9940.

Fixes #10501

This PR modifies graph fuser to fuse operations that have constant
scalar arguments. These constant scalar arguments are directly inlined
into the kernel body.

The context for this is that LSTM backward (in particular, sigmoid
backward) has many add(x, 1.) operations. This PR should be sufficient for
LSTM backward to get fused by the graph fuser.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10511

Differential Revision: D9378896

Pulled By: zou3519

fbshipit-source-id: 6a7a2987f5b6e8edaaf4b599cd200df33361650f
2018-08-17 14:10:23 -07:00
f3ac619764 Add fusion support for batchnorm and convolution without bias
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10595

Reviewed By: bwasti

Differential Revision: D9110099

fbshipit-source-id: e1ed66c7d82b2f9987b7eb9c7f98877a6dbeb902
2018-08-17 12:11:44 -07:00
d35f365ad5 Remove all cuDNN specific inputs to RNN functions (#10581)
Summary:
This is still not the final PR, but it removes all blockers for actually using the RNN functions directly in the JIT. The next patch should be final, and will actually remove the symbolic_override code and change it to proper symbolics for those ATen functions. It turns out the symbolic code can also be cleaned up a bit, and I'll do that too.

zdevito ezyang
colesbury (for minor DispatchStub.h) changes

There was no way to handle those in the JIT for now, and they turned
out to be completely unnecessary. It should make the Python and C++
module code much simpler too, since all the logic is now centralized
in the native functions.

The downside is that RNN modules no longer own their dropout buffers,
which are shared per-device instead (with appropriate locking and
synchronization). This might appear as a perf regression at first, but
in reality it's highly unlikely that anyone will want to run cuDNN RNNs
on the same GPU in parallel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10581

Reviewed By: colesbury

Differential Revision: D9365541

Pulled By: apaszke

fbshipit-source-id: 3ef8677ee5481bae60c74a9117a2508665b476b5
2018-08-17 11:09:51 -07:00
52058204d6 Add nn functional tests in JIT (#10409)
Summary:
This PR is the first step in integrating the torch.nn library with the JIT. It adds tests for the nn functional interfaces in trace/script mode, and tries to find the differences between torch.nn.functional ops and ATen ops, to see what work needs to be done to support the full set of nn functionals in script mode.

Some statistics in summary:

- 84 useful functions in torch.nn.functional in total (the number does not include helper funcs and deprecated funcs).

- 7 functions/ops do not support higher-order gradients, so they are excluded from the whole test.

- 36 functions differ from the ATen ops for various reasons. Among those 36 functions, a good share (roughly 10-15) differ only in naming or in simple transformations using other ops inside the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10409

Differential Revision: D9350694

Pulled By: wanchaol

fbshipit-source-id: 8fce6f30d8d25ace5a544a57b219fe61f5a092f8
2018-08-17 11:09:49 -07:00
b4e72ea811 Revert D9377394: [pytorch][PR] [Caffe2] Add AT_CORE_EXPORT and AT_CORE_IMPORT.
Differential Revision:
D9377394

Original commit changeset: 993062a461ff

fbshipit-source-id: af8ab92e9b88466602508981d9b3ea24ce393dfc
2018-08-17 10:39:27 -07:00
bd9ab650ae fix compile error in math_hip.cc from new Im2Col/Col2Im interface (#10623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10623

Fix compile error in https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-build/10280//console

Reviewed By: ezyang

Differential Revision: D9379451

fbshipit-source-id: 67cc3964981edba1915b93c49643caa300d63c16
2018-08-17 10:24:25 -07:00
ff440b61f6 Revert D9378844: [pytorch][PR] fix python interpreter can not be found
Differential Revision:
D9378844

Original commit changeset: 022e20aab7e2

fbshipit-source-id: 962280707e84edff2a4f59b1ce2f4211a579a055
2018-08-17 10:09:27 -07:00
e190505e84 Adding support for inlining if branches (#10084)
Summary:
Inlining if branches which have constant inputs.  If an if node gets inlined, the set of mutated variables returned by its ancestors may have changed. In the following example the block should
return a mutated set of (a) and not (a, b).

```
if cond:
    if True:
        a = a - 1
    else:
        b = b - 1
```
To calculate this we recursively update mutated variables in if branches from the leaf nodes up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10084

Reviewed By: michaelsuo

Differential Revision: D9340429

Pulled By: eellison

fbshipit-source-id: b0dd638a5cace9fdec3130460428fca655ce4b98
2018-08-17 09:48:47 -07:00
31c7a32d1c Include aten_op by default in caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10603

Reviewed By: ahhegazy, dzhulgakov

Differential Revision: D9364309

fbshipit-source-id: e72d9f2b1e99cb0fb2186c737fcd925b14d42754
2018-08-17 08:39:46 -07:00
03982fb8d3 Fix subgraph cutting wrt recent external_input change in nomnigraph (#10598)
Summary:
https://github.com/pytorch/pytorch/pull/10100 recently started taking external inputs/outputs into account in nomnigraph. This PR makes the following adjustments:
0. Relaxes some of the conditions on external inputs.
1. Updates NNModule inputs/outputs when pruning the input/output.
2. Avoids copying external inputs/outputs, as nomnigraph already takes care of them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10598

Reviewed By: bwasti

Differential Revision: D9371730

Pulled By: yinghai

fbshipit-source-id: 9273be5041dc4cc8585587f47cb6721e518a06a8
2018-08-17 08:25:49 -07:00
ff3a481aee fix python interpreter cannot be found (#10543)
Summary:
Custom Python installations that have no `python` or `python3` aliases can't be found by CMake's `FindPythonInterp` without an extra CMake argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10543

Differential Revision: D9378844

Pulled By: ezyang

fbshipit-source-id: 022e20aab7e27a5a56b8eb91b6026151116193c7
2018-08-17 08:25:48 -07:00
51222500e2 Add AT_CORE_EXPORT and AT_CORE_IMPORT. (#10602)
Summary:
Fix "error LNK2019: unresolved external symbol" from "CAFFE_KNOWN_TYPE" in tests where we should use dllexport instead of AT_CORE_API(=dllimport).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10602

Differential Revision: D9377394

Pulled By: Yangqing

fbshipit-source-id: 993062a461ffce393f2321c5391db5afb9b4e7ba
2018-08-17 02:09:38 -07:00
cc53807be5 group conv with NHWC layout (#10585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10585

group conv with NHWC layout

Reviewed By: BIT-silence

Differential Revision: D7547497

fbshipit-source-id: da0ec5a4512c15a0a0d7b79e6ce00c1f8f77f661
2018-08-17 00:39:23 -07:00
0aefb9f26c Update onnx to onnx/onnx@7848f1e (#10613)
Summary:
https://github.com/onnx/onnx/commit/7848f1e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10613

Reviewed By: houseroad

Differential Revision: D9376224

Pulled By: bddppq

fbshipit-source-id: ce8a53255ba24f0f8f989570e8b015837f8442fb
2018-08-16 23:39:37 -07:00
6667d55e73 Disallow input filler for GatherRangesOp (#10592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10592

Filter out GatherRanges ops

Reviewed By: highker

Differential Revision: D9365220

fbshipit-source-id: e21ab00dc9e553c9aaf172e1241206e0c0a7a23d
2018-08-16 21:39:09 -07:00
3578909671 Remove unused code base for distributed training (#10282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10282

This diff removes the unused/deprecated features from the code base.

Reviewed By: manojkris

Differential Revision: D9169859

fbshipit-source-id: d6447b7916a7c687b44b20da868112e6720ba245
2018-08-16 20:10:17 -07:00
f1d40ef280 build_pytorch_libs.sh: use MAX_JOBS rather than NUM_JOBS (#10600)
Summary:
MAX_JOBS is set by our jenkins setup
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10600

Differential Revision: D9375317

Pulled By: anderspapitto

fbshipit-source-id: 25416d5ee12372f7610baa78cb7b423806b26aa2
2018-08-16 20:10:15 -07:00
c101a57a74 Build mechanism for custom operators (#10226)
Summary:
This is the last step in the custom operator implementation: providing a way to build from C++ and Python. For this I:

1. Created a `FindTorch.cmake` taken largely from ebetica with a CMake function to easily create simple custom op libraries
2. Created a `torch/op.h` header for easy inclusion of necessary headers,
3. Created a test directory `pytorch/test/custom_operator` which includes the basic setup for a custom op.
    1. It defines an op in `op.{h,cpp}`
    2. Registers it with the JIT using `RegisterOperators`
    3. Builds it into a shared library via a `CMakeLists.txt`
    4. Binds it into Python using a `setup.py`. This step makes use of our C++ extension setup that we already have. No work, yey!

The pure C++ and the Python builds are separate and not coupled in any way.
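
For the Python side, a minimal `setup.py` along the lines of step 4 (the file and target names are hypothetical), reusing the C++ extension machinery already in torch:

```python
from setuptools import setup
from torch.utils.cpp_extension import CppExtension, BuildExtension

setup(
    name="custom_op",
    ext_modules=[CppExtension("custom_op", ["op.cpp"])],
    cmdclass={"build_ext": BuildExtension},
)
```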

zdevito soumith dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10226

Differential Revision: D9296839

Pulled By: goldsborough

fbshipit-source-id: 32f74cafb6e3d86cada8dfca8136d0dfb1f197a0
2018-08-16 18:56:17 -07:00
67c6d93634 Tune minimal work size (#10599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10599

Not spawning threads with spin-lock synchronization is bad because they will switch to `condvar` wait, which increases wake-up latency next time they are needed.

Reviewed By: ajtulloch

Differential Revision: D9366664

fbshipit-source-id: 3b9e4a502aeefaf0ddc4795303a855d98980b02e
2018-08-16 17:39:57 -07:00
afd7477eaa Add `buffers(), named_buffers()` methods. (#10554)
Summary:
This commit adds the `buffers()` and `named_buffers()` methods as
analogues of `parameters()` and `named_parameters()`.
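
Example usage (BatchNorm is a natural demo, since it registers buffers):

```python
import torch.nn as nn

bn = nn.BatchNorm1d(4)
for name, buf in bn.named_buffers():
    print(name, tuple(buf.shape))
# running_mean (4,)
# running_var (4,)
# num_batches_tracked ()
```
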
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10554

Reviewed By: SsnL

Differential Revision: D9367762

Pulled By: jma127

fbshipit-source-id: f2042e46a7e833dce40cb41681dbd80d7885c74e
2018-08-16 16:26:48 -07:00
342517e6e7 Back out "Add aten_op to caffe2 onnx (python) backend" (#10589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10589

Original commit changeset: 2cc6fedbaf08

Reviewed By: houseroad

Differential Revision: D9365208

fbshipit-source-id: 3871d8e70f0d8e48c8af9593c78587d16c45afc2
2018-08-16 15:15:27 -07:00
488ea824ed Additional changes to make GPU builds work (#10507)
Summary:
A continuation of https://github.com/pytorch/pytorch/pull/10504 for GPU, torch, etc. builds.

I was testing with

```
FULL_CAFFE2=1 python setup.py build_deps | tee ~/log.txt
cat ~/log.txt | egrep 'undefined refer' | sort | less
```

I'll rebase on master when Yangqing's changes in 10504 land, but putting up for some testing.

cc mingzhe09088 anderspapitto ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10507

Reviewed By: Yangqing

Differential Revision: D9359606

Pulled By: orionr

fbshipit-source-id: c2a3683b3ea5839689f5d2661da0bc9055a54cd2
2018-08-16 13:25:27 -07:00
ef15bb8787 remove implicit conversion from gpu to cpu (#10553)
Summary:
Resubmit #10416 with fixed tests. This removes the implicit GPU-to-CPU conversion when calling numpy, so the behavior matches other operations.

It requires users to move the tensor back to CPU with `.cpu()` before calling numpy functions on it.
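
On a CUDA machine the new behavior looks like this:

```python
import torch

t = torch.ones(3, device="cuda")
# t.numpy()            # now raises instead of implicitly copying to CPU
arr = t.cpu().numpy()  # explicit transfer first
```
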
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10553

Differential Revision: D9350212

Pulled By: ailzhang

fbshipit-source-id: 9317d8fea925d4b20ae3150e2c1b39ba5c9c9d0a
2018-08-16 12:10:39 -07:00
d6f3c88418 Revert D9076734: Split storage from tensor
Differential Revision:
D9076734

Original commit changeset: ea9e1094ecf8

fbshipit-source-id: 3fa9b65b7265fce6207d9e1d9ef4707dbb29704b
2018-08-16 11:25:32 -07:00
40a070422d Adding new allreduce bcube routines to ops supported by gloo (#10494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10494

Adding the AllreduceBcube routines as they are now available in gloo.

Reviewed By: wesolwsk

Differential Revision: D8269473

fbshipit-source-id: 6a3a32291bbf1fbb328b3ced0f2a753dc5caf4e5
2018-08-16 10:56:26 -07:00
4be4b4c8b5 Remove weight from input of onnxifi backend op (#10575)
Summary:
The ONNXIFI backend will absorb the constant weight in Conv, so we should not add it as an input. This is just a test artifact. Note that the Onnxifi transformer will do the right thing when cutting the graph to absorb the weights.

rdzhabarov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10575

Reviewed By: houseroad

Differential Revision: D9357339

Pulled By: yinghai

fbshipit-source-id: a613fa3acafa687295312f5211f8e9d7f77b39cd
2018-08-16 10:56:25 -07:00
319fefe9e6 Support benchmark on windows machines
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10564

Reviewed By: llyfacebook

Differential Revision: D9356389

Pulled By: sf-wind

fbshipit-source-id: f6c58e68d3eaf3a39c9f89b8f04e6039c75b4cd9
2018-08-16 10:56:23 -07:00
00f2731112 Merge THTensor into TensorImpl
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10479

Differential Revision: D9315800

Pulled By: gchanan

fbshipit-source-id: b13ef0de3342600b02b54e0700eb02021a9d1a9e
2018-08-16 08:10:06 -07:00
130881f0e3 Delete build_caffe2.sh, replace with build_libtorch.py (#10508)
Summary:
Delete build_caffe2.sh and replace it with build_libtorch.py, as suggested by peter (and copy-pasted from his draft PR). This ensures that all consumers of the torch CMake file go through as unified a path as possible.

In order to change the surrounding infrastructure as little as possible, I made some tweaks to enable build_pytorch_libs.sh to generate the test binaries relative to the current directory, rather than hardcoding to pytorch/build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10508

Differential Revision: D9354398

Pulled By: anderspapitto

fbshipit-source-id: 05b03df087935f88fca7ccefc676af477ad2d1e9
2018-08-16 08:10:04 -07:00
c6facc2aaa Add conversions between DataType and ScalarType.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10472

Reviewed By: gchanan

Differential Revision: D9298048

fbshipit-source-id: c58efa582eab64c58d0771d90d90862911c168d1
2018-08-16 07:55:31 -07:00
fdd2b9baee Add DataType alias
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10547

Reviewed By: soumith

Differential Revision: D9346040

fbshipit-source-id: 1069a44182ccff68b1694086c8b709ba2046b22b
2018-08-16 07:55:29 -07:00
8fdba4ec35 Move all operator<< overloads out of the global namespace. (#10546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10546

Have you ever written an operator<< overload in the caffe2 namespace
in a core Caffe2 header, and then been stunned when some completely
unrelated code started breaking?  This diff fixes this problem!

The problem looks like this:
1. You're building against a really old version of glog (think 0.3.2,
   or something like that)
2. This version of glog defines operator<< overloads for std containers
   in the global namespace
3. You add a new overload in your current namespace (e.g., caffe2).
   Congratulations: this overload is *preferentially* chosen over
   the global namespace one for all calls to << in that namespace.
   And since it doesn't actually have std::vector overloads, unrelated
   Caffe2 code breaks.

Newer versions of glog have a fix for this: they have the line:

  namespace std { using ::operator<<; }

in their header.  So let's help old versions of glog out and do this ourselves.

In our new world order, operator<< overloads defined in the global namespace
won't work (unless they're for std containers, which work because of ADL).
So this diff also moves all those overloads to the correct namespace.

Reviewed By: dzhulgakov

Differential Revision: D9344540

fbshipit-source-id: 6246ed50b86312668ebbd7b039fcd1233a3609cf
2018-08-16 07:55:27 -07:00
238b4b9236 Resolve error C2370 "redefinition; different storage class" by adding dllimport. (#10571)
Summary:
For #10568
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10571

Differential Revision: D9357987

Pulled By: Yangqing

fbshipit-source-id: 6726f0a1d31a225375a0ddc0e05284f3eb89dda8
2018-08-16 00:39:33 -07:00
84427d26db Add aten_op to caffe2 onnx (python) backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10579

Reviewed By: houseroad

Differential Revision: D9357837

fbshipit-source-id: 2cc6fedbaf088df7e11b52a91dfe3b8f0d7fd599
2018-08-16 00:39:30 -07:00
76da0b34c2 Remove an unused variable found by linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10578

Differential Revision: D9357880

Pulled By: bddppq

fbshipit-source-id: 6b56c2dbd02258124b5a4656cdf44d14a59e1b71
2018-08-16 00:25:44 -07:00
7487ee55f1 Resolving error C2487 "member of dll interface class may not be declared with dll interface" by removing nested CAFFE2_API. (#10572)
Summary:
For #10570
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10572

Differential Revision: D9357984

Pulled By: Yangqing

fbshipit-source-id: a8f74e384eb3219fb6ac71ada4a45e6bce9199eb
2018-08-16 00:25:41 -07:00
abf85bf0ef Perform CSE across block boundaries. (#10105)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10105

Differential Revision: D9186678

Pulled By: resistor

fbshipit-source-id: 87b63d4fc0c7d394edb4777acdefa8f022a8bf8d
2018-08-16 00:25:36 -07:00
2e0dd86903 Make torch::Tensor -> at::Tensor (#10516)
Summary:
This PR removes the `using Tensor = autograd::Variable;` alias from `torch/tensor.h`, which means `torch::Tensor` is now `at::Tensor`. It also fixes up some last uses of `.data()` and tidies up the resulting code. For example, I was able to remove `TensorListView`, so that code like

```
auto loss = torch::stack(torch::TensorListView(policy_loss)).sum() +
    torch::stack(torch::TensorListView(value_loss)).sum();
```

is now

```
auto loss = torch::stack(policy_loss).sum() + torch::stack(value_loss).sum();
```

CC jgehring

ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10516

Differential Revision: D9324691

Pulled By: goldsborough

fbshipit-source-id: a7c1cb779c9c829f89cea55f07ac539b00c78449
2018-08-15 21:25:12 -07:00
8013dac43d Fix bincount for empty input (#9757)
Summary:
Added tests too. Fixes #9756.
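
For reference, a minimal sketch of the fixed behavior (output dtypes as in current PyTorch; exact printing may vary):

```python
import torch

# Empty integer input now returns an empty (or minlength-padded) count
# tensor instead of raising.
empty = torch.tensor([], dtype=torch.long)
print(torch.bincount(empty))               # tensor([], dtype=torch.int64)
print(torch.bincount(empty, minlength=4))  # tensor([0, 0, 0, 0])
```
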
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9757

Reviewed By: Yangqing

Differential Revision: D9348485

Pulled By: soumith

fbshipit-source-id: e13afadf8dbea20ee6ee595383c522dcbaf8796a
2018-08-15 20:55:59 -07:00
05dcf00644 fixed c10d test (#10557)
Summary:
fixed NCCL test, which is not run in CI. We should enable it soon.
```
~/new_pytorch/pytorch/test$ python test_c10d.py
...............
----------------------------------------------------------------------
Ran 15 tests in 13.099s

OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10557

Reviewed By: ailzhang

Differential Revision: D9353286

Pulled By: teng-li

fbshipit-source-id: 5a722975beaa601203f51c723522cc881f2d2090
2018-08-15 17:22:38 -07:00
0a809fc8b1 build changes to make cpu unified build working. (#10504)
Summary:
Properly annotated all APIs for the CPU front end. Checked with CMake using

cmake -DUSE_ATEN=ON -DUSE_CUDA=OFF -DBUILD_ATEN=ON

and the resulting libcaffe2.so exports about 11k symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10504

Reviewed By: ezyang

Differential Revision: D9316491

Pulled By: Yangqing

fbshipit-source-id: 215659abf350af7032e9a4b0f28a856babab2454
2018-08-15 17:22:36 -07:00
87cac4c2f1 Update Im2Col related to make preparation for group conv in NHWC order. (#10439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10439

Update Im2Col related to make preparation for group conv in NHWC order.

Reviewed By: houseroad

Differential Revision: D9285344

fbshipit-source-id: 1377b0243acb880d2ad9cf73084529a787dcb97d
2018-08-15 17:10:24 -07:00
579962f2a8 reroute tensor feature in core.Net and generate one net feature in model_helper (#10528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10528

adding 2 features to core and model_helper

- reroute_tensor, which supports op insertion at the net level
- complete net and cut net in model_helper, used for full-graph analysis

Differential Revision: D9330345

fbshipit-source-id: 56341d3f500e72069ee306e20266c8590ae7985a
2018-08-15 16:40:15 -07:00
523bdc8ec1 Split storage from tensor (#10053)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10053

Tensor in PyTorch 1.0 will have
Tensor -> TensorImpl -> Storage -> StorageImpl
In this diff we split Storage from Tensor in order to align with this design.
After this diff we'll have Tensor -> Storage -> StorageImpl.

Reviewed By: dzhulgakov

Differential Revision: D9076734

fbshipit-source-id: ea9e1094ecf8c6eaeaa642413c56c6a95fb3d14e
2018-08-15 16:40:14 -07:00
03e9ea5ef0 Fix leaking of Storages (not StorageImpls) (#10552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10552

Fix leaking of Storages (not StorageImpls)

Reviewed By: li-roy

Differential Revision: D9349824

fbshipit-source-id: 31f14951020a63189bebda25a3bf8bf195cd227f
2018-08-15 16:10:00 -07:00
4c49da34a9 Add new MKLDNN fallback operators (#10526)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10526

Resubmitting these changes. Previously they caused issues with multifeed, which I fixed with D9280622

Reviewed By: yinghai

Differential Revision: D9327323

fbshipit-source-id: ec69428039b45c6221a5403b8fe9a83637857f04
2018-08-15 15:55:22 -07:00
a129f9ad3b Revert D9332335: [pytorch][PR] Implements volumetric (5d) affine grid generation.
Differential Revision:
D9332335

Original commit changeset: 1b3a91d078ef

fbshipit-source-id: 3dcce680257a6da121f5d67918ed4236e0c5bfec
2018-08-15 15:25:11 -07:00
151e7de893 varargs for einsum (#10067)
Summary:
Implemented via a wrapper; thank you, Richard, for the suggestion!

Fixes: #9929
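
A minimal sketch of what the wrapper enables; the legacy sequence form and the new varargs form should be interchangeable:

```python
import torch

a = torch.randn(3, 4)
b = torch.randn(4, 5)

out_seq = torch.einsum('ij,jk->ik', (a, b))  # legacy: operands in one sequence
out_var = torch.einsum('ij,jk->ik', a, b)    # new: operands as varargs
assert torch.allclose(out_seq, out_var)
```
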
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10067

Differential Revision: D9083388

Pulled By: soumith

fbshipit-source-id: 9ab21cd35278b01962e11d3e70781829bf4a36da
2018-08-15 15:13:25 -07:00
fb45ec5ac3 Don't set DEBUG=1 in ASAN build (#9902)
Summary:
This should make ASAN tests run faster.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9902

Differential Revision: D9032986

Pulled By: yf225

fbshipit-source-id: 3d2edec2d7ce78bc995d25865aa82ba6d3f971d0
2018-08-15 14:39:57 -07:00
26c764a1db Update FP16 submodule. Close #10523 (#10548)
Summary:
Pull a fix in FP16 for compilation bug when using Intel Compiler
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10548

Differential Revision: D9349469

Pulled By: Maratyszcza

fbshipit-source-id: 43e6dc5c3c18319d31eca23426770c73795feec5
2018-08-15 14:26:56 -07:00
021b4888db Remove setup_requires and tests_require from setup.py for FULL_CAFFE2 (#10530)
Summary:
In my environment, it looks like setup.py hangs when running

```
FULL_CAFFE2=1 python setup.py build_deps
```

Removing this fixes things, but we might also want to look at `tests_require`, which came over from `setup_caffe2.py`.

cc pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10530

Differential Revision: D9349597

Pulled By: orionr

fbshipit-source-id: 589145eca507dfaf16386884ee2fbe60299660b4
2018-08-15 14:26:53 -07:00
c5b1aa93ee Export uint8 tensors as byte string in mobile_exporter and add GivenTensorByteStringToUInt8FillOp (#10385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10385

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10354

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10316

Because Protobuf encodes uint8_t tensors using a less space-efficient varint uint32_t encoding, we are adding a new operator that reads a byte string back into a uint8_t tensor.
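
A hedged NumPy sketch of the space argument (illustrative only, not the operator's code): a raw byte string stores exactly one byte per element and round-trips losslessly.

```python
import numpy as np

data = np.random.randint(0, 256, size=1000, dtype=np.uint8)
raw = data.tobytes()                           # exactly 1000 bytes on the wire
restored = np.frombuffer(raw, dtype=np.uint8)  # decoder side: bytes -> uint8
assert (restored == data).all()
```
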

Reviewed By: harouwu

Differential Revision: D9004839

fbshipit-source-id: dfd27085c813fdeff13fee15eef4a2e7fef72845
2018-08-15 14:26:50 -07:00
6f14202acd Revert D9276252: [pytorch][PR] remove implicit conversion to cpu
Differential Revision:
D9276252

Original commit changeset: ea7d9d4f9390

fbshipit-source-id: 5977bf90d4c84b47e15bc8266cc3ce5602c4e05f
2018-08-15 13:55:18 -07:00
5adcac3dce Cuda half macros cleanup (#10147)
Summary:
This PR removes a couple of macros throughout TH* as part of the refactoring effort for ATen. Removing these macros should avoid confusion among developers who are trying to move things from TH* to ATen. This PR is part of the THCNumerics deprecation that I have been working on, following up on mruberry's https://github.com/pytorch/pytorch/pull/9318. I am separating these two commits to check that removing these macros doesn't upset the PyTorch public CI or internal builds.

- Commit 1248de7baf removes the code paths guarded by the `CUDA_HALF_INSTRUCTIONS` macro. Since the macro was removed in commit 2f186df52d, `ifdef CUDA_HALF_INSTRUCTIONS` evaluates to false, so the code path kept after this change is the false branch.

- Commit 520c99b057 removes the code paths guarded by the `CUDA_HALF_TENSOR` macro. Since PyTorch now supports only CUDA 8.0 and above, and CUDA 8.0 satisfies `CUDA_HAS_FP16`, `CUDA_HALF_TENSOR` is always true, so the code path kept after this change is the true branch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10147

Differential Revision: D9345940

Pulled By: soumith

fbshipit-source-id: c9392261dd432d304f1cdaf961760cbd164a59d0
2018-08-15 13:25:42 -07:00
86363e1d8e Move RNN implementations to C++ (#10481)
Summary:
This is the first of two changes that are supposed to improve how we handle RNNs in the JIT. They still get traced as `PythonOp`s, but now it will be much easier to actually expose them to the JIT as e.g. `aten::lstm`, and ignore the Python interpreter entirely. This needs some symbolic adjustments that will be part of a second PR.

Even when we fix symbolics, there will still be a bit of a problem with statefulness of the cuDNN API (we need a mutable cache for the dropout state, but our IR has no way of representing that).

zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10481

Reviewed By: ezyang

Differential Revision: D9341113

Pulled By: apaszke

fbshipit-source-id: 0ae30ead72a1b12044b7c12369d11e5ca8ec30b5
2018-08-15 13:25:41 -07:00
484395edfb Fix corner case with torch.multinomial (#9960)
Summary:
In the shortcut for n_sample=1, when category 0 has 0 weight,
we should not map the (uniform) sample 0 to category 0.
The conversion uniform->multinomial was apparently written to work on
a (0,1] range (like curand uses), but PyTorch uses a [0,1) range.

Fixes: #4858. Thank you, Roy Fejgin for reporting.
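
A minimal repro sketch of the fixed corner case:

```python
import torch

# Category 0 has zero weight, so it must never be drawn: even a uniform
# sample of exactly 0.0 (possible on the [0, 1) range) maps elsewhere now.
weights = torch.tensor([0.0, 1.0, 1.0])
sample = torch.multinomial(weights, num_samples=1)
assert sample.item() != 0
```
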
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9960

Reviewed By: soumith

Differential Revision: D9341793

Pulled By: ailzhang

fbshipit-source-id: 6b1a96419a7bc58cc594f761f34c6408ff6354cf
2018-08-15 13:25:39 -07:00
fb09292020 Increase tolerance in ConvBN test
Summary: reduce flakiness of test

Reviewed By: Maratyszcza

Differential Revision: D9344877

fbshipit-source-id: 24d5e1b873f94d816c980f3b7db93248cf10aca5
2018-08-15 13:14:35 -07:00
254dedf604 Propagate NaN through threshold (#10277)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/10238
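
A small sketch of the intended behavior after the change:

```python
import torch
import torch.nn.functional as F

# NaN > threshold is false, so NaN used to be replaced by the fill value;
# after this change it propagates through instead.
x = torch.tensor([float('nan'), -1.0, 2.0])
print(F.threshold(x, threshold=0.0, value=0.0))  # tensor([nan, 0., 2.])
```
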
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10277

Reviewed By: SsnL

Differential Revision: D9199825

Pulled By: soumith

fbshipit-source-id: 8ee7f9a72d9546d429f311c3f6028461d3c93fe2
2018-08-15 12:59:31 -07:00
0bbcc7b534 Don't assume curl version in Windows build script (#10476)
Summary:
Since we can't pass a version number to `choco install curl`, we should not assume that `7.57.0` is the curl version in the Windows AMI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10476

Differential Revision: D9303129

Pulled By: yf225

fbshipit-source-id: 198544be68330860fbcf93c99bc995f4e280bda7
2018-08-15 12:59:23 -07:00
85408e744f Move filler interface to operator schema (#10522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10522

Move filler interface to operator schema to avoid extra code for
caffe2 mobile.

Reviewed By: dzhulgakov

Differential Revision: D9312940

fbshipit-source-id: 77fb2406f0c6b171a1912a207e05e36da50c6966
2018-08-15 12:40:18 -07:00
9646d68962 support broadcasting in _kl_categorical_categorical (#10533)
Summary:
Support broadcasting in _kl_categorical_categorical

This makes it possible to do:
```
import torch.distributions as dist
import torch
p_dist = dist.Categorical(torch.ones(1,10))
q_dist = dist.Categorical(torch.ones(100,10))
dist.kl_divergence(p_dist, q_dist)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10533

Differential Revision: D9341252

Pulled By: soumith

fbshipit-source-id: 34575b30160b43b6c9e4c3070dd7ef07c00ff5d7
2018-08-15 12:40:17 -07:00
05a260da43 Bump gloo to latest master (#10545)
Summary:
Needed by the Gloo development team. Verifying nothing breaks in CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10545

Reviewed By: Maratyszcza

Differential Revision: D9344413

Pulled By: orionr

fbshipit-source-id: 207edb71170870bacec47a635a12d7f55b6c1275
2018-08-15 12:25:44 -07:00
5d27d68779 remove implicit conversion to cpu (#10416)
Summary:
Fixes #9934
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10416

Differential Revision: D9276252

Pulled By: ailzhang

fbshipit-source-id: ea7d9d4f9390edefcd0865a98498f6c4307c291d
2018-08-15 12:25:42 -07:00
9cffe783f1 relax tolerance for two torch.half (float16) tests (#10519)
Summary:
Two tests in the 'nn' test bucket may fail when the torch.half
(float16) data type is used. The assertions in these tests are
intended to allow slight floating-point imprecision in the results,
but the tolerances used for the comparisons are too strict for
the half type.

Relax the tolerances so that slight float16 imprecision won't
cause test failures.

The affected tests are:

- test_variable_sequence_cuda
- test_Conv2d_groups_nobias

For more information, see issue:

https://github.com/pytorch/pytorch/issues/7420
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10519

Differential Revision: D9343751

Pulled By: soumith

fbshipit-source-id: 90aedf48f6e22dd4fed9c7bde7cd7c7b6885845a
2018-08-15 12:11:20 -07:00
d93e8ab343 Nomnigraph - Refactor SubtreeMatchCriteria to become a Graph of MatchNode (#10512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10512

SubtreeMatchCriteria now becomes a graph of MatchNode

MatchNode consists of NodeMatchCriteria, nonTerminal and count. This is a cleaner internal representation of the data structure and will bring us much closer to DAG matching.

Note that I still keep the debugString method because convertToDotGraph doesn't currently work with Subgraph.

Reviewed By: bwasti

Differential Revision: D9321695

fbshipit-source-id: 58a76f007a9a95d18cf807d419c2b595e9bc847f
2018-08-15 12:11:18 -07:00
f59bcea2c3 parallel max and min for ATen on CPU (#10343)
Summary:
Optimize max and min reductions for the ATen CPU path; the current code path from the TH module runs sequentially on CPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10343

Differential Revision: D9330799

Pulled By: ezyang

fbshipit-source-id: 5b8271e0ca3e3e73f88a9075aa541c8756001b7c
2018-08-15 11:41:01 -07:00
44b029f5b8 move matrix formation for dot products to precompute/request-only (#10531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10531

fixed a naming issue in pairwise_similarity

Reviewed By: huayuli00

Differential Revision: D9331716

fbshipit-source-id: d7de36f20504c08b1c7871ccdffa343221a3da0c
2018-08-15 11:02:10 -07:00
f5a4dd89b5 Implements volumetric (5d) affine grid generation. (#8322)
Summary:
I've implemented affine grid generation for volumetric (5d) inputs. The implementation is based on the spatial implementation, extended by one dimension. I have a few questions about my implementation vs. the existing one that I will add inline.

I have some extensive test cases for the forward pass here: https://gist.github.com/elistevens/6e3bfb20d8d0652b83bd16b3e911285b However, they use `pytest.fixture` extensively, so I'm not sure the best way to incorporate them into the pytorch test suite. Suggestions? I have not tested backwards at all.

Diff probably best viewed with whitespace changes ignored.

Thanks for considering!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8322

Differential Revision: D9332335

Pulled By: SsnL

fbshipit-source-id: 1b3a91d078ef41a6d0a800514e49298fd817e4df
2018-08-15 11:02:08 -07:00
d8ff7ad6f8 generalize order switch ops for 1-3d (#10395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10395

Order-switch ops (NCHW2NHWC and NHWC2NCHW) previously supported only 2D images.
This diff generalizes them to 1D and 3D, and also adds a unit test we didn't have.

Reviewed By: protonu

Differential Revision: D9261177

fbshipit-source-id: 56e7ec54c9a8fb71781ac1336f3f28cf024b4bda
2018-08-15 10:09:31 -07:00
0f05f5fb07 ATen layer norm symbolic (#10513)
Summary:
We can't rely on the ATen fallback pathway here because we need to parse out the constant attributes explicitly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10513

Reviewed By: dzhulgakov

Differential Revision: D9322133

Pulled By: jamesr66a

fbshipit-source-id: 52af947e6c44532ef220cb4b94838ca838b5df06
2018-08-15 08:28:52 -07:00
ce8e8feceb Fixed a bug in box_with_nms_limit where it may produce more bounding boxes than specified. (#10390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10390

Fixed a bug in box_with_nms_limit where it may produce more bounding boxes than specified.
* The original code first finds the score threshold from the box at the 'detections_per_im' position, and filters out boxes scoring below that threshold.
* When multiple boxes tie at exactly the threshold score, the op returns more boxes than 'detections_per_im' (see the sketch below).
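
A hedged Python sketch of the tie problem (illustrative scores, not the operator's actual code):

```python
import torch

scores = torch.tensor([0.9, 0.5, 0.5, 0.2])
k = 2  # detections_per_im
cutoff = torch.sort(scores, descending=True)[0][k - 1]
print(int((scores >= cutoff).sum()))   # 3: threshold filtering keeps both ties
print(scores.topk(k).indices.numel())  # 2: top-k keeps exactly k boxes
```
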

Reviewed By: wat3rBro

Differential Revision: D9252726

fbshipit-source-id: 63f40829bcd275cb181692bc7547c384cee01499
2018-08-14 23:54:23 -07:00
e41528a5cc Also set stdin to subprocess pipe in FindCUDA windows popen call (#10379)
Summary:
Background: we run PyTorch in embedded C++ pipelines, inside C++ GUIs in https://github.com/Kitware/VIAME. Without this addition, the call failed with the error below, but only on certain Windows platforms/configurations:

OSError: [WinError 6] The handle is invalid
At:
C:\Program Files\VIAME\Python36\site-packages\torch\cuda\__init__.py(162): _lazy_init
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(249): <lambda>
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(182): _apply
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(176): _apply
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(249): cuda
C:\Program Files\VIAME\lib\python3.6None\site-packages\kwiver\arrows\pytorch\pytorch_resnet_f_extractor.py(74): __init__
C:\Program Files\VIAME\lib\python3.6None\site-packages\kwiver\processes\resnet_descriptors.py(132): _configure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10379

Differential Revision: D9330772

Pulled By: ezyang

fbshipit-source-id: 657ae7590879004558158d3c4abef2ec11d9ed57
2018-08-14 23:10:20 -07:00
f1631c3106 Modify build.sh and test.sh scripts for ppc64le jenkins build and test (#10257)
Summary:
Initial jenkins builds / test scripts for ppc64le.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10257

Differential Revision: D9331278

Pulled By: ezyang

fbshipit-source-id: 6d9a4f300a0233faf3051f8151beb31786dcd838
2018-08-14 21:54:44 -07:00
19ad55cc02 set coalesced=false at sparse transpose() and removed transpose invariants (#10496)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/6219
- removed invariants at https://github.com/pytorch/pytorch/pull/4707
- a sparse tensor is assumed to have coalesced=true only when:
1. its elements are unique, and
2. its indices are in sorted order (see the sketch below)
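
An illustrative sketch of the resulting behavior, assuming current sparse-tensor APIs:

```python
import torch

i = torch.tensor([[0, 1], [1, 0]])
v = torch.tensor([1.0, 2.0])
s = torch.sparse_coo_tensor(i, v, (2, 2)).coalesce()

# Transposing reorders the indices, so the result is no longer reported
# as coalesced, even though the input was.
t = s.t()
print(s.is_coalesced(), t.is_coalesced())  # True False
```
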
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10496

Differential Revision: D9311214

Pulled By: weiyangfb

fbshipit-source-id: 167fa5a8e9e5f9c800db02f728a1194029f7e4f3
2018-08-14 21:25:37 -07:00
964e30de1d Workaround for Cuda9.2 and GCC7 compilation errors (#10510)
Summary:
Breaking out of #8338

This PR is a workaround for a bug with CUDA9.2 + GCC7.

Here is the error this PR fixed:
.../pytorch/caffe2/operators/elementwise_ops.h: In constructor ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>::BinaryElementwiseWithArgsOp(const caffe2::OperatorDef&, caffe2::Workspace*)’:
.../pytorch/caffe2/operators/elementwise_ops.h:106:189: error: ‘GetSingleArgument<bool>’ is not a member of ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>’
   BinaryElementwiseWithArgsOp(const OperatorDef& operator_def, Workspace* ws)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10510

Reviewed By: orionr

Differential Revision: D9319742

Pulled By: mingzhe09088

fbshipit-source-id: ce59e3db14539f071f3c20301e77ca36a6fc3f81
2018-08-14 20:54:52 -07:00
b6cc65afea Send, Recv, RecvAnysource, Barrier Op for MPI PG and Python Bindings (#10227)
Summary:
Based on: https://github.com/pytorch/pytorch/pull/10199
Added:
(1) send, recv, recvanysource, and barrier for MPI process group.
(2) python binding
(3) testing

Please review: 2e64f5d675
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10227

Reviewed By: ailzhang

Differential Revision: D9327138

Pulled By: teng-li

fbshipit-source-id: 80496714550a3ca498eb474465ddbd1b8d657d49
2018-08-14 20:10:11 -07:00
26e40fa665 Tensor.accessor now fails on rvalue reference (#10518)
Summary:
Previously, it was easy to write `x[0].accessor<float, 2>()`. However, `x[0]` is a temporary, so the accessor would point to invalid strides/sizes and probably segfault. With this change, such unsafe code is a compile error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10518

Reviewed By: goldsborough

Differential Revision: D9329288

Pulled By: ebetica

fbshipit-source-id: d08763bee9a19a898b9d1ea5ba648f27baa1992f
2018-08-14 19:41:31 -07:00
17ecc06b65 static casting TIndex (#10514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10514

Fix the bug that broke the Windows build in fused_rowwise_random_quantization_ops.h.

Reviewed By: ezyang, jspark1105

Differential Revision: D9322291

fbshipit-source-id: a6a27e87423b6caa973414ffd7ccb12076f2e1e4
2018-08-14 18:42:44 -07:00
60aa416a6d Re-purpose setup_caffe2.py for faster caffe2 build iterations (#10520)
Summary:
setup.py is the official install script; setup_caffe2.py is no longer used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10520

Reviewed By: yinghai

Differential Revision: D9325548

Pulled By: bddppq

fbshipit-source-id: 3dda87f3dff061b574fd1d5c91859044f065ee33
2018-08-14 18:13:19 -07:00
32bb4040dd Unified type annotation parsing for script frontends (#10279)
Summary:
After this, all combinations of {String frontend, Python AST frontend} × {Python 3-style type annotations, MyPy-style type comments} × {script method, script function} should properly accept type annotations.

Possible TODOs:
- Clean up the functions marked HACK
- Clean up the Subscript tree-view to better match the Python AST versions
- Can we use this for Python functions? That's the only place annotations.get_signature() is still needed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10279

Differential Revision: D9319726

Pulled By: jamesr66a

fbshipit-source-id: b13f7d4f066b0283d4fc1421a1abb9305c3b28fa
2018-08-14 18:13:15 -07:00
b69b1c477b Adding python binding for MPI process group (#10199)
Summary:
Based on https://github.com/pytorch/pytorch/pull/10159

Please review ProcessGroupMPI.cpp/hpp and init.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10199

Reviewed By: yf225

Differential Revision: D9324027

Pulled By: teng-li

fbshipit-source-id: 2dd524bee0c7ca8f9594ec3b4f3ebbbb608df337
2018-08-14 15:56:33 -07:00
39bfc2d0d4 Nomnigraph - add diagnostic ability for Subgraph matching API (#10267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10267

isSubtreeMatch now returns a SubtreeMatchResult, which carries a match flag and a debugMessage string giving the reason a subtree failed to match (if requested).

Reviewed By: bwasti

Differential Revision: D9182429

fbshipit-source-id: 530591fad592d02fb4c31fc398960a14ec90c86a
2018-08-14 15:56:31 -07:00
3c39e857ca Python binding for reduce,allgather,scatter,gather ops and python tests (#10159)
Summary:
Provided Python bindings for these four ops. Also provided an NCCL binding test.

Based on https://github.com/pytorch/pytorch/pull/10058

Please only review init.cpp, and test file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10159

Reviewed By: yf225

Differential Revision: D9323192

Pulled By: teng-li

fbshipit-source-id: b03822009d3a785ec36fecce2fc3071d23f9994e
2018-08-14 14:24:57 -07:00
16ecd6f99c Fix Debug Build On Windows (#10359)
Summary:
Compile files in torch/csrc with the /MDd runtime library option for debug builds on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10359

Differential Revision: D9316946

Pulled By: SsnL

fbshipit-source-id: c84bfad81d61cd49f39b7bce7177edd2b1e8bd69
2018-08-14 13:24:14 -07:00
3f3a30f79c Added Reduce,AllGather,Gather,Scatter Ops for NCCL and MPI process groups (#10058)
Summary:
Added
- Reduce (both NCCL and MPI)
- AllGather (both NCCL and MPI)
- Gather (MPI)
- Scatter (MPI)

for c10d process groups. This basically finalizes all supported ops for C10d to match THD.

All ops are tested as well.

```
mpirun -np 8 ./ProcessGroupMPITest
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
```

```
./ProcessGroupNCCLTest
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10058

Reviewed By: yf225

Differential Revision: D9316312

Pulled By: teng-li

fbshipit-source-id: 6a6253268d34332327406b1f87335d1402f7133f
2018-08-14 13:10:21 -07:00
13814d6744 Remove use of data() in optimizers (#10490)
Summary:
After talking to users of the C++ API we found that having the tensor type be `autograd::Variable` causes more complications than having it be `at::Tensor`. It used to be a problem because `at::Tensor` didn't have the "autograd API" of variable (e.g. `detach()` or `grad()` methods), but those methods are now on `at::Tensor`. As such, we want to make a last big breaking change to have the tensor type be `at::Tensor`, while factory methods like `torch::ones` will return `Variable`s disguised as `at::Tensor`. This will make many things easier, like calling functions in ATen that take vectors of tensors.

This PR makes a small step in this direction by updating the optimizer classes to not use `.data()` on `Variable` to access the underlying `at::Tensor`. Using `.data()` is effectively a hack to work around our modification rules for tensors that require grad. The proper way of doing things is to use `with torch.no_grad` or equivalently `NoGradGuard` in C++ to guard in-place operations.
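
For reference, the Python analogue of that rule, as a minimal sketch: do the in-place update under `no_grad` instead of detouring through `.data`.

```python
import torch

lr = 0.1
param = torch.randn(3, requires_grad=True)
param.sum().backward()

with torch.no_grad():              # the equivalent of NoGradGuard in C++
    param.add_(-lr * param.grad)   # in-place update, no history recorded
```
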

The next step can then simply redefine `torch::Tensor` to be `at::Tensor`. This transition should be smooth, since all methods available on `Variable` are at this point available on `at::Tensor`.

For this PR I:

1. Modified the implementations of optimizers to not use `.data()`. This means the implementations are now different from PyTorch, which still uses the legacy method of using `.data`.
2. To properly verify (1), I added more fine-grained test cases to our optimizer tests, e.g. `SGD` with and without `weight_decay`, then with `nesterov` etc. Generally more tests = more happy!
3. Minor cleanup of the optimizer codebase

ebetica apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10490

Differential Revision: D9318229

Pulled By: goldsborough

fbshipit-source-id: fb386700f37840542bc5d323f308ea88fe5ea5c5
2018-08-14 13:10:19 -07:00
bdb11e716a Split the dependence of ONNX from test_operators.py (#10151)
Summary:
Now, when running `python test/onnx/test_operators.py --no-onnx`, we won't introduce any ONNX Python dependency. (No onnx/protobuf Python packages need to be installed.)

The major changes:
- output pbtxt from the C++ exporter directly, so the floating-point formatting may be slightly different. (This should be fine, since it's just to guard ONNX exporting.)
- ONNX python packages are only imported if we run the ONNX related checks. Those checks are disabled when using `--no-onnx` flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10151

Reviewed By: jamesr66a

Differential Revision: D9130706

Pulled By: houseroad

fbshipit-source-id: ea28cf5db8399929179698ee535137f209e9ce6f
2018-08-14 12:54:44 -07:00
eea8ab1861 Move common code to RNNCellBase. (#10399)
Summary:
There are three classes, `RNNCell`, `LSTMCell`, and `GRUCell`, inherited from `RNNCellBase`, all defining the identical initialization function `reset_parameters`. Let's move it to the common base (see the sketch below).
Another option is to have different initialization for RNN, LSTM, and GRU. Maybe the weights whose output goes through a sigmoid (i.e. gain=1) should be initialized differently from those going into a tanh (gain=5/3)?
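
A minimal sketch of the hoisted initializer (assuming the subclass defines `hidden_size` and registers its weights as parameters):

```python
import math
import torch.nn as nn

class RNNCellBase(nn.Module):
    def reset_parameters(self):
        # Identical across RNNCell, LSTMCell and GRUCell: uniform init
        # scaled by 1/sqrt(hidden_size).
        stdv = 1.0 / math.sqrt(self.hidden_size)
        for weight in self.parameters():
            nn.init.uniform_(weight, -stdv, stdv)
```
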
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10399

Differential Revision: D9316978

Pulled By: SsnL

fbshipit-source-id: a2d9408f0b5c971a3e6c3d42e4673725cf03ecc1
2018-08-14 12:39:59 -07:00
bd497809e2 CAFFE_ENFORCE -> CAFFE_ENFORCE_EQ for error with more information (#10244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10244

Use CAFFE_ENFORCE_EQ(x, y) instead of CAFFE_ENFORCE(x == y) in conv_op_impl.h for error messages with more information.

Reviewed By: viswanathgs

Differential Revision: D9177091

fbshipit-source-id: cf8d10afec1ce6793d3ae0b62f05648722a4130b
2018-08-14 12:24:44 -07:00
2400512a08 Remove unnecessary include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10486

Reviewed By: ml7

Differential Revision: D9305283

fbshipit-source-id: 0d1316f9a72670ddbe8d95ead93603d00ad0f63b
2018-08-14 12:10:04 -07:00
d1442b36f3 add a rebuild_libtorch command for speedier iteration. (#10036)
Summary:
It just calls into `ninja install`. For iterative work on
libtorch.so/_C.so,
`python setup.py rebuild_libtorch develop` should provide quick iteration
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10036

Differential Revision: D9317869

Pulled By: anderspapitto

fbshipit-source-id: 45ea45a1b445821add2fb9d823a724fc319ebdd2
2018-08-14 12:10:02 -07:00
520f4f6cb9 Added some unit test for box_with_nms_limit_op. (#10389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10389

Added some unit test for box_with_nms_limit_op.

Reviewed By: wat3rBro

Differential Revision: D9237860

fbshipit-source-id: 2d65744bd387314071b68d2a0c934289fc64a731
2018-08-14 11:55:03 -07:00
d043f83019 Add tests for Tensor.* nn.* F.* docs (#10311)
Summary:
Test only for existence for now. I had to skip a lot of them, so there is a FIXME in the test.

Also, I'm not testing torch.* because of a namespace issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10311

Differential Revision: D9196341

Pulled By: SsnL

fbshipit-source-id: 9c2ca1ffe660bc1cc664474993f8a21198525ccc
2018-08-14 11:39:46 -07:00
b4462511fd Add LSTMCell backward pass expect tests (#10506)
Summary:
- Exposed get_debug_graph for ScriptModule (gets the debug graph for its
  forward Method)
- Added forward/backward expect tests for lstm and milstm cells. These
  are intended to prevent regressions

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10506

Differential Revision: D9316590

Pulled By: zou3519

fbshipit-source-id: 3c2510d8363e9733ccbc5c7cc015cd1d028efecf
2018-08-14 11:39:44 -07:00
e5811becdd Add tags for onnx tensor descriptors (#10502)
Summary:
We missed adding tags in 2 places where we create tensor descriptors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10502

Reviewed By: Maratyszcza

Differential Revision: D9312075

Pulled By: yinghai

fbshipit-source-id: 329e83ec5470b0a778d2eda525dd6f2143facbdf
2018-08-14 11:25:52 -07:00
9497383706 Fix some warnings (#10297)
Summary:
Fixing some compiler warnings while looking at symbol visibility.

cc smessmer ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10297

Reviewed By: soumith

Differential Revision: D9195336

Pulled By: orionr

fbshipit-source-id: 04cbfd3549984caec7bdd1a5b39a6d25e80348e9
2018-08-14 10:40:08 -07:00
61bedc96f0 Schema-based creation of graph nodes (#10198)
Summary:
This commit adds the ability to insert a node with inputs, using the schema to check the inputs are valid types, fill in any default values, and perform standard implicit conversions. Since it is schema based, it will discover and use the right overload.
Constructors on `NamedValue` enable it to be constructed from `IValue` constants, so it is possible to use constant values in the input list as well:

```
g.insert(aten::add, {v, 3});
```

Keyword arguments are also supported:

```
g.insert(aten::add, {v}, {{"other", t}, {"scalar", 1}});
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10198

Differential Revision: D9307252

Pulled By: zdevito

fbshipit-source-id: 644620aa85047d1eae1288383a619d50fec44d9b
2018-08-14 10:25:38 -07:00
3a40baa15c fix a grammatical error: accelerate compute (#10204)
Summary:
"accelerate compute"
a verb shouldn't go with another verb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10204

Differential Revision: D9316699

Pulled By: fmassa

fbshipit-source-id: f1126c594905c3236ffd6b7e57a92552d3d4c1f1
2018-08-14 10:11:15 -07:00
ef44faece2 check attribute existence in torch.legay.nn.SpatialFullConvolution in method type (#8740)
Summary:
This is related to #5255
When adding CUDA support for the model, this error comes up:
```
AttributeError: 'SpatialFullConvolution' object has no attribute 'finput'
```
Here is my short test code:
https://gist.github.com/kaleaht/26518c3deea5d1d3dda722fbf1f3ecdc

I converted torch7's model also from here.
https://github.com/art-programmer/FloorplanTransformation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8740

Differential Revision: D8872735

Pulled By: SsnL

fbshipit-source-id: 8d97f8b59cdf4049e87be14b78c4608fd973d149
2018-08-14 10:11:13 -07:00
329d901a91 Fold AffineChannel to Conv, the same way as BN (for Detectron models) (#10293)
Summary:
AffineChannel is being used by public Detectron models, e.g. Mask-RCNN and Faster-RCNN. This PR folds this op into convolution the same way as BN to speed up inference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10293

Differential Revision: D9276789

Pulled By: yinghai

fbshipit-source-id: fbf6dd2c1be05f5713f760752e7245b1320a122b
2018-08-13 22:43:37 -07:00
c618df154e Add intrinsic support for external_input/output to nomnigraph (#10100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10100

Nomnigraph has until this point tried to ignore external inputs and outputs, as they aren't very well defined (does order matter?). But for DCE and some of Keren's work they are becoming necessary, so I went ahead and added this to the core nomnigraph converter.

Reviewed By: yinghai

Differential Revision: D9105487

fbshipit-source-id: a2e10e3cc84515611d6ab7d4bc54cf99b77729c0
2018-08-13 21:39:17 -07:00
7d16e87f14 Fix byte ordering issue in from_numpy (#9508)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/3671 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9508

Differential Revision: D9307186

Pulled By: soumith

fbshipit-source-id: 39dcaa6fd2d330d7085802acd6f63c19270164fa
2018-08-13 21:39:16 -07:00
facb293aad Fix FindMKL.cmake for Windows (#10453)
Summary:
Targets the issue discussed at https://github.com/pytorch/pytorch/pull/7399#issuecomment-400788971.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10453

Differential Revision: D9311591

Pulled By: soumith

fbshipit-source-id: ac0712e10bdac4ea3f76d6fbad2178ec958b3a31
2018-08-13 21:09:27 -07:00
fed05cf4cf Fix prim::FusedConcat bug (#10466)
Summary:
Fixes #10456

The graph fuser was fusing together groups with prim::FusedConcat (the producer) with other ops (the consumer) if the consumer is fusable. For example,

```
import torch
torch.jit.script
def fn(x, y, z):
    x1 = x + y
    y1 = x - y
    w = torch.cat([x1, y1])
    return w + z

x = torch.randn(2, 2, dtype=torch.float, device='cpu')
y = torch.randn(2, 2, dtype=torch.float, device='cpu')
z = torch.randn(4, 2, dtype=torch.float, device='cpu')
fn(x, y, z)
fn.graph_for(x, y, z)
```
produced the following graph:
```
graph(%x : Float(2, 2)
      %y : Float(2, 2)
      %z : Float(4, 2)) {
  %3 : int = prim::Constant[value=1]()
  %y1 : Float(2, 2) = aten::sub(%x, %y, %3)
  %8 : int = prim::Constant[value=0]()
  %14 : Float(4, 2) = prim::FusionGroup_0[device=-1](%z, %y1, %x, %y)
  return (%14);
}
with prim::FusionGroup_0 = graph(%1 : Float(4, 2)
      %5 : Float(2, 2)
      %7 : Float(2, 2)
      %8 : Float(2, 2)) {
  %11 : int = prim::Constant[value=1]()
  %9 : int = prim::Constant[value=1]()
  %x1 : Float(2, 2) = aten::add(%7, %8, %9)
  %w : Float(4, 2) = prim::FusedConcat[dim=0](%x1, %5)
  %2 : int = prim::Constant[value=1]()
  %3 : Float(4, 2) = aten::add(%w, %1, %2)
  return (%3);
}
```

this is a problem because it violates two invariants:
1) all inputs to the FusionGroup must have the same size
2) prim::FusedConcat's output must not be used inside the FusionGroup

This PR fixes this problem by checking if the output to a FusionGroup came from a prim::FusedConcat node when deciding whether to fuse the consumer and producer.
If the producer is a value that came from a prim::FusedConcat node in a FusionGroup, then consumer & producer do not get fused.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10466

Differential Revision: D9296686

Pulled By: zou3519

fbshipit-source-id: ed826fa9c436b42c04ca7d4d790cece804c162bd
2018-08-13 21:09:25 -07:00
099a545376 Hipify Caffe2 binaries (#10468)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10468

Reviewed By: yinghai

Differential Revision: D9301178

Pulled By: bddppq

fbshipit-source-id: 5da88aa4d79a5142f8e744cdcd8ae85951bc387c
2018-08-13 20:56:28 -07:00
9a9224e5c1 Remove "locally" from CONTRIBUTING.md (#10495)
Summary:
A bootcamper was confused by the word "locally" and thought it meant on his MacBook as opposed to his FB dev machine. Besides the confusion in the FB context, the word "locally" isn't really necessary at all.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10495

Reviewed By: soumith

Differential Revision: D9311480

Pulled By: goldsborough

fbshipit-source-id: 2779c7c60f903a1822a50d140ed32a346feec39e
2018-08-13 20:56:26 -07:00
f6eb966fd2 Fix TanhGradientOperator linker errors (#10426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10426

We were seeing linker errors for TanhGradientOperator in multifeed. Since we only use the float specialization, we might as well define it that way.

Reviewed By: yinghai

Differential Revision: D9280622

fbshipit-source-id: d2ffb698c73a84bb062de5e1f3bda741330e4228
2018-08-13 17:57:10 -07:00
ffb59e5f20 adding stochastic quantization caffe2 operators (encoder and decoder in CPU are implemented. GPU mode is pending)
Summary:
This operator implements b-bit (b = 1/2/4/8) stochastic quantization of a floating-point
matrix in a row-wise fashion. Every 8/b quantized values are packed into one byte
and returned in a uint8 tensor (see the sketch below). PR: https://github.com/pytorch/pytorch/pull/8629
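
A small sketch making the 8/b bookkeeping concrete (`packed_bytes_per_row` is a hypothetical helper, not part of the operator):

```python
def packed_bytes_per_row(num_cols, b):
    # b-bit values: 8 // b of them share one output byte.
    values_per_byte = 8 // b  # b in {1, 2, 4, 8}
    return (num_cols + values_per_byte - 1) // values_per_byte

print(packed_bytes_per_row(10, 2))  # 3 bytes hold ten 2-bit values
```
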

Reviewed By: harouwu

Differential Revision: D8493264

fbshipit-source-id: 01f64066568a1e5a2b87c6d2134bd31cdf119c02
2018-08-13 16:39:23 -07:00
c6fc3ab557 fixes printing non-contiguous tensors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10405

Differential Revision: D9302794

Pulled By: soumith

fbshipit-source-id: e4a7db8d33400a5a050d05fd1679de8bc3cbcf30
2018-08-13 16:26:20 -07:00
216961b7bf Remove is_zero_dim_ bool in THTensor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10415

Reviewed By: ezyang

Differential Revision: D9274954

Pulled By: gchanan

fbshipit-source-id: 353a52d91556d5b81c3510eb2bf399d102c9a0a4
2018-08-13 12:39:06 -07:00
f59cce95b4 Some symbol annotation fixes for Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10369

Differential Revision: D9300187

Pulled By: ezyang

fbshipit-source-id: bf29966ad6aa221332b7232a965fb85e652f866d
2018-08-13 12:26:00 -07:00
382ff03222 Add missing #pragma once
Reviewed By: ml7

Differential Revision: D9299779

fbshipit-source-id: b5b5a1b9ead1b275d3ae54ecfad99617d2869094
2018-08-13 11:39:45 -07:00
75651d5b58 improve use of ROCm libraries, enable more tests, small fixes (#10406)
Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10406

Reviewed By: Jorghi12

Differential Revision: D9277093

Pulled By: ezyang

fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2
2018-08-13 11:39:43 -07:00
cd81217f8e A single print statement in setup.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10473

Reviewed By: ml7

Differential Revision: D9299196

Pulled By: pjh5

fbshipit-source-id: f9aa84c2859df12f9da9ac5205e1918c253e19fb
2018-08-13 11:39:42 -07:00
0b63d12db6 Don't call into Python during Storage destruction. (#10407)
Summary:
```
This removes PyObjectFinalizer. We were seeing SIGSEGV at exit in some
programs that use multiprocessing. The backtrace pointed to
StorageRef.__del__ being called from subtype_dealloc. My guess is that
the Python interpreter was shutdown before all C++ Storage objects were
deallocated. Deallocating the C++ Storage called the finalizer which
called back into Python after it was no longer safe to do so.

This avoids a callback from C++ into Python during Storage finalization.
Instead, dead Storage objects (expired weak references) are collected
periodically when shared_cache exceeds a limit. The limit is scaled with
2x the number of live references, which places an upper bound on the
amount of extra memory held by dead Storage objects. In practice, this
should be very small.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10407

Differential Revision: D9272400

Pulled By: colesbury

fbshipit-source-id: ecb14d9c6d54ffc91e134c34a4e770a4d09048a2
2018-08-13 11:20:07 -07:00
64235d5c01 Rewrite TensorImpl to use TensorTypeId. (#10278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10278

Translation to Backend happens immediately before we go into the
Type universe; otherwise we use TensorTypeId.

I allocated TensorTypeId corresponding exactly to existing ATen
Backend.  Only CPUTensorId and CUDATensorId are relevant in the
Caffe2 universe.

Reviewed By: gchanan

Differential Revision: D9184060

fbshipit-source-id: 9d3989c26f70b90f1bbf98b2a96c57e2b0a46597
2018-08-13 11:20:04 -07:00
145eb330ad Back out "Back out "Move typeid.h to move to ATen/core"" (#10465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10465

Original commit changeset: 7050fe845e65

Reviewed By: jerryzh168

Differential Revision: D9296375

fbshipit-source-id: cb8161440ba809dcec5027858a29cd026d537fc3
2018-08-13 11:20:01 -07:00
b8530dc1f0 A few additions (#9837)
Summary:
This PR provides 4 fixes / features:

1. torch::nn::Cloneable inherits virtually from torch::nn::Module. We want to pass around a module with new functions, and the best way to do this is a diamond inheritance pattern, i.e.

```c++
struct MySuperModuleImpl : virtual public torch::nn::Module {
  virtual void myFunction() = 0;
};

struct MySuperModule : public torch::nn::Cloneable<MySuperModule>, public MySuperModuleImpl {};

struct MyModule : public MySuperModule {
  void myFunction() override;
};
```

This way, we can simply pass MySuperModuleImpl around instead of torch::nn::Module.

2. Optimizer options are public now, since there's otherwise no way to decay the LR or modify it during training.
3. Serialization functions used to create autograd history and call copy_! Bad!
4. Optimizers did not create buffers after add_parameters was called.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9837

Reviewed By: goldsborough

Differential Revision: D9199746

Pulled By: ebetica

fbshipit-source-id: 76d6b22e589a42637b7cc0b5bcd3c6b6662fb299
2018-08-13 10:24:58 -07:00
0a39a9cfbc Add db directory for hipifying (#10428)
Summary:
bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10428

Differential Revision: D9297115

Pulled By: bddppq

fbshipit-source-id: d7134ff24102f03f762e6a7b4340055546c9ecfd
2018-08-13 10:24:56 -07:00
56267cc97b gflags improvement to allow CAFFE2_EXPORTS (#10444)
Summary:
Explanation copied from code:

// Motivation about the gflags wrapper:
// (1) We would need to make sure that the gflags version and the non-gflags
// version of Caffe2 are going to expose the same flags abstraction. One should
// explicitly use caffe2::FLAGS_flag_name to access the flags.
// (2) For flag names, it is recommended to start with caffe2_ to distinguish it
// from regular gflags flags. For example, do
//    CAFFE2_DEFINE_BOOL(caffe2_my_flag, true, "An example");
// to allow one to use caffe2::FLAGS_caffe2_my_flag.
// (3) Gflags has a design issue that does not properly expose the global flags,
// if one builds the library with -fvisibility=hidden. The current gflags (as of
// Aug 2018) only deals with the Windows case using dllexport, and not the Linux
// counterparts. As a result, we will explciitly use CAFFE2_EXPORT to export the
// flags defined in Caffe2. This is done via a global reference, so the flag
// itself is not duplicated - under the hood it is the same global gflags flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10444

Differential Revision: D9296726

Pulled By: Yangqing

fbshipit-source-id: a867d67260255cc46bf0a928122ff71a575d3966
2018-08-13 09:54:48 -07:00
64a6f17177 Fix ATen/core header installation. (#10463)
Summary:
Fixes #10353 and fixes #10397.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10463

Differential Revision: D9296491

Pulled By: ezyang

fbshipit-source-id: f825c2a21a113e44a6f5c1c5ec17814d9deac366
2018-08-13 09:25:49 -07:00
fa5d95a00c Bump onnx to onnx/onnx@0d250de (#10452)
Summary:
0d250dea76
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10452

Reviewed By: houseroad

Differential Revision: D9288037

Pulled By: bddppq

fbshipit-source-id: 206be3ee2b8ebca26f3d8af0597078363ed6d168
2018-08-13 00:09:15 -07:00
3cbe8f0c3e Detect system RocksDB installation with CMake config files. (#7315)
Summary:
On Windows, the FindRocksDB script doesn't detect a RocksDB installation built by CMake,
and it doesn't include/link the RocksDB dependencies either, such as:
  * `Snappy`
  * `Shlwapi.lib`
  * `Rpcrt4.lib`

This PR try to detect in config mode first before using private find module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7315

Differential Revision: D9287587

Pulled By: Yangqing

fbshipit-source-id: 314a36a14bfe04aa45013349c5537163fb4c5c00
2018-08-12 18:24:10 -07:00
82d11b847e Use CUDA_LINK_LIBRARIES_KEYWORD instead of hacking. (#10437)
Summary:
There's no need to hack.
Using `CUDA_LINK_LIBRARIES_KEYWORD` is the normal way.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10437

Differential Revision: D9287579

Pulled By: Yangqing

fbshipit-source-id: d3d575ea8c3235576ba971e4b7493ddb435f92f3
2018-08-12 18:09:20 -07:00
508de8109f Added missing "AT_" prefix to macro. (#10436)
Summary:
For issue #10435
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10436

Differential Revision: D9287578

Pulled By: Yangqing

fbshipit-source-id: b07de3a2d7fa6f980a189b5e8f7ce05dfa1bef50
2018-08-12 18:09:19 -07:00
1756daaa75 Use FULL_CAFFE2 to build caffe2 and python in one shot (#10427)
Summary:
Building Caffe2 and PyTorch separately ends up with duplicated symbols, as they now share some basic libs; this is especially bad for the registry. This PR fixes our CI and builds them in one shot with shared symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10427

Reviewed By: bddppq

Differential Revision: D9282372

Pulled By: yinghai

fbshipit-source-id: 0514931ea88277029a68fa5368ff4336472f132e
2018-08-12 15:39:12 -07:00
51f154e072 Fix Python lint errors. (#10441)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10441

Reviewed By: Yangqing

Differential Revision: D9285502

Pulled By: ezyang

fbshipit-source-id: 12c94b28bee9cade930c8f260577e81ea1915269
2018-08-11 21:08:50 -07:00
cd53b78bd0 Remove caffe namespace GetEmptyStringAlreadyInited (#10438)
Summary:
A followup cleanup of #10380 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10438

Differential Revision: D9285692

Pulled By: Yangqing

fbshipit-source-id: c73defbef00d3b563240d0b69d85bd0a6e3eb504
2018-08-11 17:39:58 -07:00
ab6afc2b23 Optimize max_pooling for inference for MKL-DNN/IDEEP device (#10156)
Summary:
Optimize the max_pooling operation for the inference path by setting the "inference" flag for the underlying MKL-DNN, saving the computation and storage of max indices, which are only needed for training. To keep the API compatible, training mode is still the default and inference mode is set in the optimizeForIdeep path.
Tests show the speed-up of a single max_pooling operation is up to 7X on BDW (Broadwell).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10156

Differential Revision: D9276755

Pulled By: yinghai

fbshipit-source-id: ad533d53aabb8ccb3b592da984d6269d9b794a8a
2018-08-10 23:14:05 -07:00
d3ccc836de Fix warning in Nomnigraph (#10425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10425

`const size_t` as a return value doesn't make sense.

Reviewed By: duc0

Differential Revision: D9281442

fbshipit-source-id: c3d9c94f5dbe516476f0c74f63c35e60893c8140
2018-08-10 22:40:26 -07:00
1dbdc5a93d Back out "Move typeid.h to move to ATen/core"
Summary: Original commit changeset: 21f2c89e58ca

Reviewed By: yinghai

Differential Revision: D9282171

fbshipit-source-id: 7050fe845e6524b965bdd45794a6fa1665b83e34
2018-08-10 21:39:25 -07:00
31646edfff Increase GLOO rendezvous timeout
Summary: Increase GLOO rendezvous timeout

Reviewed By: teng-li

Differential Revision: D9273544

fbshipit-source-id: 5c22c1d18df3032f019ff12e2a720aea7c390f15
2018-08-10 18:40:18 -07:00
767687835e Replace sudo with --user in CI caffe2 install
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10328

Reviewed By: pjh5

Differential Revision: D9275809

Pulled By: ezyang

fbshipit-source-id: c22cb1570c67199b74b2188ad83b1e4828e11911
2018-08-10 15:11:43 -07:00
adbcb3c1dc Move dropout and alpha dropout to ATen (#10384)
Summary:
zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10384

Reviewed By: ezyang

Differential Revision: D9272583

Pulled By: apaszke

fbshipit-source-id: ed5d37b28ce9ff25800bbaa0daf066cfbf1f9921
2018-08-10 14:55:28 -07:00
5b0be9de59 Remove TH compatibility calls for strides. (#10414)
Summary:
This should just work now that sizes/strides are unified between TH and ATen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10414

Differential Revision: D9274681

Pulled By: gchanan

fbshipit-source-id: 69eb766f4e3a5b6c57b15837cffdef513b6d7817
2018-08-10 13:54:58 -07:00
674f7a9778 Correctly share CUDA Parameters. (#10220)
Summary:
```
    Correctly share CUDA Parameters, requires_grad and hooks.

    Previously, the following was true:

    - If you put a Parameter for a CUDA tensor
      in multiprocessing queue (or otherwise tried to transfer it),
      this failed, saying that we cannot pickle CUDA storage.
      This is issue #9996.

    - If you put a leaf Tensor that requires_grad=True through the
      multiprocessing queue, it would come out the other end as
      requires_grad=False (It should have come out the other end
      as requires_grad=True).  Similarly, backwards hooks were
      lost.

    - If you put a non-leaf Tensor that requires_grad=True through
      the multiprocessing queue, it would come out the other end
      as requires_grad=False.

    The root cause for the first issue was that implementation of
    reductions for Parameter used the superclass implementation
    (tensor) in __reduce_ex__, but this always picks up the
    non-ForkingPickler reduction, which doesn't work with CUDA tensors.
    So, we registered a new ForkingPickler specifically for Parameter,
    and adjusted the code to correctly rewrap a Tensor in a Parameter
    if it was originally a parameter.

    While working on this, we realized that requires_grad and backwards
    hooks would not be preserved in the ForkingPickler reduction
    implementation.  We fixed the reducer to save these parameters.
    However, Adam Paszke pointed out that we shouldn't allow sending
    requires_grad=True, non-leaf Tensors over a multiprocessing
    queue, since we don't actually support autograd over process
    boundar.  We now throw an error in this case; this may cause
    previously working code to fail, but this is easy enough to fix;
    just detach() the tensor before sending it.  The error message says
    so.

    Fixes #9996.
```
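
A minimal Python sketch of the resulting contract (`consumer` is a hypothetical worker function):

```python
import torch
import torch.multiprocessing as mp

def consumer(q):
    t = q.get()
    assert t.requires_grad and t.is_leaf  # the flag now survives the queue

if __name__ == '__main__':
    q = mp.Queue()
    q.put(torch.ones(2, 2, requires_grad=True))  # leaf tensors are allowed
    p = mp.Process(target=consumer, args=(q,))
    p.start()
    p.join()
    # Non-leaf tensors that require grad now raise; detach() before sending.
```
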
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10220

Differential Revision: D9160746

Pulled By: ezyang

fbshipit-source-id: a39c0dbc012ba5afc7a9e646da5c7f325b3cf05c
2018-08-10 13:54:56 -07:00
0b8a0125ab Fixes torch.log after torch.expand giving incorrect results (#10269)
Summary:
fixes #10241
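
A minimal repro sketch of the class of bug being fixed: a pointwise op on an expanded (stride-0) view must agree with the same op on a contiguous copy.

```python
import torch

x = torch.rand(3, 1).expand(3, 8)  # stride (1, 0): broadcast view
assert torch.allclose(torch.log(x), torch.log(x.contiguous()))
```
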
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10269

Differential Revision: D9272472

Pulled By: cpuhrsch

fbshipit-source-id: cd1afbb4386a0d0956ee21b24f0d529755b986ca
2018-08-10 13:39:38 -07:00
6a55238a3f Grid sampler: nearest interpolation & reflection padding (#10051)
Summary:
closes #9702 .

cc jph00

Commit structure:

1. Change the index calculation logic. I will explain using 1-D for simplicity.

	Previously we have (in pseudo code):

	```
	// 1. get the float locations from grid
	scalar_t x = from_grid()

	// 2. find the integral surrounding indices
	int x_left = floor(x)
	int x_right = x_left + 1

	// 3. calculate the linear interpolate weights
	scalar_t w_left = x_right - x
	scalar_t w_right = x - x_left

	// 4. manipulate the integral surrounding indices if needed
	// (e.g., clip for border padding_mode)
	x_left = manipulate(x_left, padding_mode)
	x_right = manipulate(x_right, padding_mode)

	// 5. interpolate
	output_val = interpolate(w_left, w_right, x_left, x_right)
	```

	This is actually incorrect (and also unintuitive) because it calculates the
	weights before manipulating out-of-boundary indices. Fortunately, this
	isn't manifested in either of the currently supported modes, `'zeros'` and
	`'border'` padding:

	+ `'zeros'`: doesn't clip
	+ `'border'`: clips, but for out-of-bound `x` both `x_left` and `x_right` are
	  clipped to the same value, so weights don't matter

	But this is a problem with reflection padding, since after each time we reflect,
	the values of `w_left` and `w_right` should be swapped.

	So in this commit I change the algorithm to (numbers corresponding to the
        ordering in the above pseudo-code)

	```
	1. get float location
	4. clip the float location
	2. find the integral surrounding indices
	3. calculate the linear interpolate weights
	```

	In the backward, because of this change, I need to add new variables to track
	`d manipulate_output / d manipulate_input`, which is basically a multiplier
	on the gradient calculated for `grid`. From benchmarking this addition doesn't
	cause obvious slow downs.

2. Implement reflection padding. The indices will keep being reflected until
	they fall within bounds.

	Added variant of `clip_coordinates` and `reflect_coordinates` to be used in
	backward. E.g.,
	```cpp
	// clip_coordinates_set_grad works similarly to clip_coordinates except that
	// it also returns the `d output / d input` via pointer argument `grad_in`.
	// This is useful in the backward pass of grid_sampler.
	scalar_t clip_coordinates_set_grad(scalar_t in, int64_t clip_limit, scalar_t *grad_in)
	```
	For example, if `in` is clipped in `'border'` mode, `grad_in` is set to `0`.
	If `in` is reflected **odd** times in `'reflection'` mode, `grad_in`
	is set to `-1`.

3. Implement nearest interpolation.

4. Add test cases

5. Add better input checking
  Discussed with goldsborough for moving `operator<<` of `at::Device`,
  `at::DeviceType` and `at::Layout` into `at` namespace. (Otherwise
  `AT_CHECK` can't find them.)

6. Support empty tensors. cc gchanan

    + Make empty tensors not acceptable by cudnn.
    + Add `AT_ASSERT(kernel block size  > 0)` if using `GET_BLOCKS`
   + Cache `numel` in `TensorGeometry`
      I was going to use `numel` to test whether the cudnn descriptor should accept a
      tensor, but it ended up unused. I can revert this if needed.

7. Add more test cases, including on input checking and empty tensors

8. Remove an obsolete comment

9. Update docs. Manually tested by generating docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10051

Differential Revision: D9123950

Pulled By: SsnL

fbshipit-source-id: ac3b4a0a36b39b5d02e83666cc6730111ce216f6
2018-08-10 12:43:27 -07:00
def3715e82 Minor changes for nicer pip packages (#9544)
Summary:
I am using this to test a CI job to upload pip packages, and so am using the Caffe2 namespace to avoid affecting the existing pytorch packages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9544

Reviewed By: orionr

Differential Revision: D9267111

Pulled By: pjh5

fbshipit-source-id: a68162ed29d2eb9ce353d8435ccb5f16c3b0b894
2018-08-10 12:09:46 -07:00
40109b16d0 Remove caffe1 specific proto (#10380)
Summary:
This was used as a convenient way for us to convert c1 models. Now that conversion is more or less done, we should probably require any users who need to convert c1 models to explicitly install c1. This PR removes the explicit c1 proto (which was copied from c1) in favor of explicit installation.

Note that caffe_translator would still work properly; the only difference is that now users need to install c1 separately.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10380

Differential Revision: D9267981

Pulled By: Yangqing

fbshipit-source-id: a6ce5d9463e6567976da83f2d08b2c3d94d14390
2018-08-10 11:10:26 -07:00
018790cd4b thread BUILD_SHARED_LIBS through build_pytorch_libs.sh
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10272

Differential Revision: D9239337

Pulled By: anderspapitto

fbshipit-source-id: 187b3acb7e85635d9b45a3dd82c98d86a2b51e70
2018-08-10 10:39:31 -07:00
9b8a036873 Fix basic.cpp, which compared equality between a size [1] tensor with… (#10404)
Summary:
… a size [] tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10404

Differential Revision: D9268467

Pulled By: gchanan

fbshipit-source-id: 92bb387358f4030519c6883c12ea69312185446e
2018-08-10 10:39:29 -07:00
e524a8994b Make lengths_host_.CopyFrom synced in LengthsCosineCoherenceOp and LengthsTileOp (#10360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10360

It seems `lengths_host_.CopyFrom(lengthsInput, &context_);` is asynchronous w.r.t. the host while `lengths_host_.CopyFrom(lengthsInput);` is synchronous.

However, according to jerryzh168,  `lengths_host_.CopyFrom(lengths, &context_); context_.FinishDeviceComputation();` is the safest way to guarantee synchronization.

Reviewed By: jerryzh168

Differential Revision: D9197923

fbshipit-source-id: 827eb63d9d15c1274851e8301a793aed39d4fa6b
2018-08-10 10:39:28 -07:00
be5fb8f6fd Move fused RNN kernels into ATen (#10305)
Summary:
As in the title. I also did a small refactor that let us lose almost 400 loc. This is a first step in moving the RNN code to C++.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10305

Reviewed By: ezyang

Differential Revision: D9196227

Pulled By: apaszke

fbshipit-source-id: 54da905519aade29baa63ab1774a3ee1db5663ba
2018-08-10 09:12:05 -07:00
e221791afc Fix typo.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10387

Differential Revision: D9255840

Pulled By: gchanan

fbshipit-source-id: 97b52d4e349c1e2d1970abde7dc6b25e7cf668a0
2018-08-10 08:55:30 -07:00
1e3e26e3e8 Use nDimensionLegacyNoScalars in THTensorDimApply. (#10388)
Summary:
This issue was exposed in https://github.com/pytorch/pytorch/pull/10383.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10388

Differential Revision: D9255836

Pulled By: gchanan

fbshipit-source-id: 88c5a6415c27d56ff54d00a8957fdc1617cfbde7
2018-08-10 08:55:28 -07:00
3667d029b4 Move typeid.h to move to ATen/core (#10163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10163

- Remove dependency on caffe2/core/common.h for ATen/core/typeid.h
  Unfortunately, Windows seems to rely on typeid.h including this
  header, so it is still included from the forwarding header
  caffe2/core/typeid.h
- Deduplicate Demangle/DemangleType with their ATen equivalents

Reviewed By: smessmer

Differential Revision: D9132432

fbshipit-source-id: 21f2c89e58ca1e795f1b2caa316361b729a5231b
2018-08-10 08:45:44 -07:00
e9ad74357e Use serialization container in ir import export (#10394)
Summary:
Copy of #10191 because these changes didn't land with the diff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10394

Differential Revision: D9260816

Pulled By: li-roy

fbshipit-source-id: 7dc16919cfab6221fda1d44e98c5b900cfb40558
2018-08-10 00:09:30 -07:00
0950d7a98d support list slicing (#10318)
Summary:
As title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10318

Differential Revision: D9254351

Pulled By: michaelsuo

fbshipit-source-id: be891a584dc295b5e353f7f5257d64a356fb9586
2018-08-09 17:25:13 -07:00
b1e3239ec8 Fix some backwards definitions wrt keepdim. (#10382)
Summary:
Before we had 0-dim tensors in TH, we were flexible in what we accepted wrt the difference between size [] and size [1] tensors in backwards functions because they were identical in TH. So, we had backwards definitions that were technically incorrect, but happened to work. This often masks shape issues, adds greatly to code complexity, and thus IMO isn't worth keeping.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10382

Differential Revision: D9244618

Pulled By: gchanan

fbshipit-source-id: 2c29c53a8ffe8710843451202cad6b4323af10e8
2018-08-09 15:11:55 -07:00
209af45614 Back out "[pytorch][PR] Fix bincount for empty input"
Summary: Original commit changeset: 6c4c66c23679

Reviewed By: SsnL

Differential Revision: D9253403

fbshipit-source-id: bf5ee669ed095c06ff58a2871f7350e879261076
2018-08-09 14:25:33 -07:00
18d2fcde7a Fix performance of DistributedSampler per #8958
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10361

Differential Revision: D9240798

Pulled By: ezyang

fbshipit-source-id: dc4cfe79612f711bbcff34a147877df6a5f7b89f
2018-08-09 12:54:37 -07:00
64a60030a6 Don't copy on clamp, clamp_out (#10352)
Summary:
This makes clamp and relu faster (fixes #10276).

The extra copying was introduced when clamp moved to ATen and
the _th_clamp_ wrapper was used to forward to TH/THC,
we remove that and add _th_clamp(_out) instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10352

Reviewed By: ezyang

Differential Revision: D9233590

Pulled By: SsnL

fbshipit-source-id: 4f86a045498e5e577fb22656c71f171add7ed0ac
2018-08-09 12:40:47 -07:00
b43beec070 Fix bincount for empty input (#9757)
Summary:
Added tests too. Fixes #9756 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9757

Differential Revision: D8966879

Pulled By: soumith

fbshipit-source-id: 9f08a9d5d5d037db16319141d7a227a5efa23869
2018-08-09 12:40:45 -07:00
cc5b47ff47 Fix the logic for PATH guess on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10372

Differential Revision: D9240207

Pulled By: soumith

fbshipit-source-id: 0933f6fde19536c7da7d45044efbdcfe8ea40e1f
2018-08-09 12:40:44 -07:00
3fa1c1022a Avoid std::thread ctor "cannot resolve" error (#10381)
Summary:
If an `at::test` function is added, gcc can't figure out the `std::thread(test, -1)` resolution.

It is not a problem for the current code. I bumped into this when playing with native functions. But I think it is good to just prevent it from happening in the future by removing `using namespace at;`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10381

Differential Revision: D9241614

Pulled By: SsnL

fbshipit-source-id: 972ac3cecff3a50602b3fba463ae1ebd3f53d036
2018-08-09 11:55:40 -07:00
99b10adc01 Fix compile flags for MSVC
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10368

Differential Revision: D9240791

Pulled By: ezyang

fbshipit-source-id: 536b093b5c800cc1cf02cbbde9ae341e25d083d1
2018-08-09 09:39:58 -07:00
7d53c876dc Move maybeZeroDim to TH, change condition so it doesn't turn off scal… (#10333)
Summary:
…ars.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10333

Differential Revision: D9206091

Pulled By: gchanan

fbshipit-source-id: 492c50189edc2056aa2acce98d49234d2a54ce39
2018-08-09 09:28:57 -07:00
e967fa9757 Fix THTensor_nElement for scalars.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10332

Differential Revision: D9206039

Pulled By: gchanan

fbshipit-source-id: 0bc7c15050a6a602f621d3e9ecc3a6ea35481a6a
2018-08-09 09:28:55 -07:00
52d85bedb7 Deal with undefined tensors in unbind backward (#9995)
Summary:
When only part of the outputs of unbind are used in a backward,
the gradients for the others are undefined. This sets those
to zero in to_tensor_list.
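A minimal repro of the scenario (shapes are illustrative):

```python
import torch

x = torch.randn(3, 4, requires_grad=True)
a, b, c = x.unbind(0)
# Only `a` is used, so the incoming gradients for `b` and `c` are
# undefined; with this fix they are treated as zeros.
a.sum().backward()
print(x.grad)  # row 0 is ones, rows 1 and 2 are zeros
```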

Fixes: #9977
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9995

Differential Revision: D9239610

Pulled By: soumith

fbshipit-source-id: eb8d1b3f2b4e615449f9d856e10b946910df9147
2018-08-09 08:54:28 -07:00
b70b7066f7 Keep kEps in one place to make sure they are consistent (#10334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10334

Keep kEps in one place to make sure they are consistent

Reviewed By: xianjiec

Differential Revision: D9202280

fbshipit-source-id: 35d173ce1d1a361b5b8cdbf1eac423e906e7c801
2018-08-09 08:27:42 -07:00
04f381650e Resubmit: Fix dataloader hang when it is not completely iterated (#10366)
Summary:
https://github.com/pytorch/pytorch/pull/9655
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10366

Differential Revision: D9237393

Pulled By: SsnL

fbshipit-source-id: fabfad7f371ba33300098f6b885c0e3f26c3e14a
2018-08-09 00:10:24 -07:00
037d8d1bab Order Loss functions alphabetically in nn.rst
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10365

Differential Revision: D9237287

Pulled By: SsnL

fbshipit-source-id: 28e9de76b9cfd8f63c8df561ff1531ea8d0803ea
2018-08-08 22:39:55 -07:00
9dfc4edc68 Update NNPACK and cpuinfo submodules (#8564)
Summary:
Bring in extra optimizations in Winograd-based convolution on NEON
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8564

Reviewed By: hlu1

Differential Revision: D9088140

Pulled By: Maratyszcza

fbshipit-source-id: 2089191416db98bdad8f0e4848b1435fcf74a88b
2018-08-08 22:39:52 -07:00
6e49f933ad Check that result is on CPU for CPU unary ops kernels (#10358)
Summary:
Fixes: #10270
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10358

Differential Revision: D9233066

Pulled By: soumith

fbshipit-source-id: 39b7524fe55ddb899fb27e2c0ef504ce54dbad35
2018-08-08 21:11:53 -07:00
783f2c60b2 nomnigraph - Enhancements to subgraph matching APIs (#10218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10218

SubtreeMatchCriteria now supports:
- nonTerminal flag: if this is set, it means we only match the root of the subtree and do not care about the children. Example use case: to match an "input" node but not care how the input is produced.
Additional tests for this new logic are added to subgraph_matcher_test.cc.

Subgraph matching APIs for NNGraph is also added.

(Further enhancement to make the SubgraphMatching API constructs a Subgraph object/more diagnostic information will go later).

Reviewed By: bwasti

Differential Revision: D9156092

fbshipit-source-id: 3f28ac15d9edd474b3e0cd51fd7e6f973299d061
2018-08-08 14:56:23 -07:00
69760e2840 update torch.eig() doc (#10315)
Summary:
This fixes #9383

Update torch.eig() doc, the complex part is written based on https://scc.ustc.edu.cn/zlsc/sugon/intel/mkl/mkl_manual/GUID-16EB5901-5644-4DA6-A332-A052309010C4.htm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10315

Reviewed By: yf225

Differential Revision: D9200723

Pulled By: ailzhang

fbshipit-source-id: d2e186fd24defbc4fdea6c2cf3dc4f7e05e1d170
2018-08-08 06:43:41 -07:00
0d03219a42 Remove hack as integrated builds use FULL_CAFFE2 now (#10320)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10320

Reviewed By: jerryzh168

Differential Revision: D9198902

Pulled By: ezyang

fbshipit-source-id: 8af28d607735e5f4450c40127c1f8c262ea602ce
2018-08-07 21:40:07 -07:00
7d6d7bef6a Enable docker image build for PyTorch using specific python version (#10317)
Summary:
Current Dockerfile builds pytorch using default python within miniconda, which happens to be Python 3.6

This patch allows users to specify which python should be installed in the default miniconda environment used by the pytorch dockerfile. I have tested the build for python 2.7, 3.5, 3.6 and 3.7. Python 2.7 required typing and cython.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10317

Differential Revision: D9204401

Pulled By: ezyang

fbshipit-source-id: 11355cab3bf448bbe8369a2ed1de0d409c9a2d6e
2018-08-07 16:13:33 -07:00
66b3bae47c Add sizesLegacyNoScalars/stridesLegacyNoScalars analog of sizeLegacyN… (#10323)
Summary:
…oScalars,strideLegacyNoScalars.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10323

Differential Revision: D9200567

Pulled By: gchanan

fbshipit-source-id: 5580d6f92eef0acb04132f1978436cc31cdf563a
2018-08-07 15:41:28 -07:00
b7bc327180 Remove new_Tensor and generated components
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10194

Differential Revision: D9160559

Pulled By: cpuhrsch

fbshipit-source-id: 133185b3d4258c154dc43f7572dbef6bfa6786f3
2018-08-07 15:09:38 -07:00
5390476297 Add tracing to custom op and simplify tracer overall (#10212)
Summary:
This PR adds tracing infrastructure for custom operators. It also simplifies the tracer overall, and changes the codegen to do more metaprogramming there instead of via C++ (which was necessary for the custom op tracing).

To give an example of the tracer/metaprogramming change, what used to look like this in `VariableType.cpp`:

```
jit::tracer::PreTraceInfo trace_info;
  if (jit::tracer::isTracing()) {
    trace_info = jit::tracer::preRecordTrace(jit::aten::index_select, "self", self, "dim", dim, "index", index);
  }
```

is now simply the inlined version of `preRecordTrace`, minus C++ metaprogramming:

```
torch::jit::Node* node = nullptr;
  if (jit::tracer::isTracing()) {
    auto& graph = jit::tracer::getTracingState()->graph;
    node = graph->create(jit::aten::index_select_out, /*outputs=*/0);
    jit::tracer::recordSourceLocation(node);
    jit::tracer::addInputs(node, "result", result);
    jit::tracer::addInputs(node, "self", self);
    jit::tracer::addInputs(node, "dim", dim);
    jit::tracer::addInputs(node, "index", index);
    graph->appendNode(node);
  }
```

zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10212

Differential Revision: D9199615

Pulled By: goldsborough

fbshipit-source-id: cd4b603c1dc01340ead407228e109c99bdba2cfc
2018-08-07 13:54:15 -07:00
5bb21493fd add fused dropout kernels (#9666)
Summary:
While waiting for dropout to be fully ported to ATen, here's a performance fix for the most common dropout case. Dropout is still a python function; I just added an efficient path to it. I could not make inplace work, because the generator always generates `return self` for inplace functions, and I need to return both the original tensor and the mask, so inplace goes through the existing path. Even with the non-inplace version, since the mask is now a ByteTensor, the memory used is just a little larger than for inplace dropout, due to savings on the mask.
Once dropout is moved to aten, these kernels still can be used for efficient implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9666

Reviewed By: SsnL

Differential Revision: D8948077

Pulled By: ezyang

fbshipit-source-id: 52990ef769471d957e464af635e5f9b4e519567a
2018-08-07 13:34:53 -07:00
74979495f0 Optional input lengths in CTC op (#10228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10228

Sometimes, in test mode, the input length for all items in the minibatch will be
equal to the max number of time steps. This change avoids having to pass in an external tensor.

Differential Revision: D9174378

fbshipit-source-id: 22f7d5c311c855d9c3ac59f2a5e773279bd69974
2018-08-07 13:34:51 -07:00
9b1a65bec3 Extends type and shape tracing with device (#9796)
Summary:
This PR extends the existing type and shape metadata tracing and verification done in autograd with device information. This expansion of tracing is required for #8354, is likely useful in other scenarios, and is a healthy sanity check, just like type and shape tracing.

The precise changes are:

- TypeAndShape -> InputMetadata, now includes device()
- Creating InputMetadata is simplified to just require a tensor, and callers were updated to use this simpler invocation wherever possible
- The gradient accumulator of a variable is now reset when set_data() is called if either the type or device changes, and this reset now locks to avoid contention with acquiring the gradient accumulator
- Mismatched devices during backward() will throw a runtime error, just like mismatched type and shape
- (Bonus!) Two uninitialized pointers in THCReduce are now initialized (to nullptr) to prevent build warnings

fyi colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9796

Reviewed By: goldsborough

Differential Revision: D9119325

Pulled By: ezyang

fbshipit-source-id: 76d1861b8d4f74db0575ff1f3bd965e18f9463de
2018-08-07 12:25:17 -07:00
2993c42ee4 Squash some 'invalid escape sequence' warnings. (#10310)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10310

Differential Revision: D9196254

Pulled By: ezyang

fbshipit-source-id: 63bb8e52ac6970fe8e11a2d3c491ab58250dc467
2018-08-07 12:25:15 -07:00
db7a2b1f0d fix doc for as_tensor (#10309)
Summary:
- fixes #9914
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10309

Differential Revision: D9196427

Pulled By: weiyangfb

fbshipit-source-id: c9a01e42c2e9dbfe2bd94ad14651d9f578751de2
2018-08-07 11:24:45 -07:00
dcaafdd04b fix doc of sparse_coo_tensor (#10308)
Summary:
- fixes #9998
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10308

Differential Revision: D9196423

Pulled By: weiyangfb

fbshipit-source-id: 23b4ed96e354ac9aa7c268aad105818a2c6d3bd8
2018-08-07 11:24:44 -07:00
20a549b101 Start using a newer version of rocRand that's PyTorch compatible.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10280

Differential Revision: D9196349

Pulled By: Jorghi12

fbshipit-source-id: 4147f2e6e3fdd641b026f3761d684437591405be
2018-08-07 11:09:59 -07:00
fe68879832 Fix dir(torch) for python 3.7 (#10271)
Summary:
fixes #10160.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10271

Differential Revision: D9188031

Pulled By: li-roy

fbshipit-source-id: a3620553a8ba2b7391acdf78dbe58afcdb6c5f7f
2018-08-07 09:57:51 -07:00
ad76fc8807 s/DISABLE_COPY_AND_ASSIGN/AT_DISABLE_COPY_AND_ASSIGN/ (#10275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10275

Remove forwarding declaration in caffe2/core/common.h

```
codemod -d caffe2 --extensions cc,cpp,cu,cuh,h \\bDISABLE_COPY_AND_ASSIGN AT_DISABLE_COPY_AND_ASSIGN
```

Reviewed By: mingzhe09088

Differential Revision: D9184809

fbshipit-source-id: 958cf5162b0d92b83ea9c2597abb77320ca57ce8
2018-08-07 08:54:26 -07:00
66f7b8abbe Better macro name hygiene prefixing. (#10274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10274

Good C++ libraries don't take up un-namespaced identifiers
like DISABLE_COPY_AND_ASSIGN.  Re-prefix this.

Follow up fix: codemod Caffe2 to use the new macro, delete
the forwarding definition

Reviewed By: mingzhe09088

Differential Revision: D9181939

fbshipit-source-id: 857d099de1c2c0c4d0c1768c1ab772d59e28977c
2018-08-07 08:54:24 -07:00
18e298305e Increase TCP listen queue size from 64 to 1024 (#10268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10268

Running torch.distributed.init_process_group fails with more than ~64 processes, with various errors like connection refused or connection reset by peer. After some digging, it looks like the root cause is that all workers have to connect to master via TCP (both in Zeus init and in DataChannelTCP - look for `connect()`), and the listening socket only has a backlog of 64.

I increased the backlog to 1024, which seems like enough for reasonable purposes (the hard limit is 65535 in /proc/sys/net/core/somaxconn). There's probably a more correct way to do this that involves retries when the connection is refused.
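For reference, the backlog is just the argument to `listen()`; a minimal sketch of the changed value (the port number is illustrative):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("", 29500))
s.listen(1024)  # previously effectively 64; the kernel caps this at somaxconn
```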

Reviewed By: soumith

Differential Revision: D9182216

fbshipit-source-id: 2f71c4995841db26c670cec344f1e3c7a80a7936
2018-08-07 08:26:06 -07:00
1a797ec810 Revert "clean up the build a bit. We no longer need the separate buil… (#10285)
Summary:
…d_libtorch entrypoint (#9836)"

This reverts commit 62e23a1ee47eb66056e6695cefef4e42599f8bd0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10285

Differential Revision: D9193107

Pulled By: ezyang

fbshipit-source-id: de96dce12fdf74410413ae18feee5caf0bed0025
2018-08-07 07:40:20 -07:00
b6402648f4 fix off-by-one bug in open-ended slicing (#10286)
Summary:
Previously, `tensor[i:]` was transformed to `tensor[i:-1]`. This incorrectly leaves off the last element. Noticed this when implementing slicing for list types.
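Concretely, the script semantics should match plain Python:

```python
t = [0, 1, 2, 3]
print(t[1:])    # [1, 2, 3]  <- desired open-ended slice
print(t[1:-1])  # [1, 2]     <- what the old transformation produced
```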
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10286

Differential Revision: D9193292

Pulled By: michaelsuo

fbshipit-source-id: df372b815f9a3b8029830dd9e8769f9985a890e7
2018-08-07 00:39:42 -07:00
5a7c710548 Support some basic list operations (#10225)
Summary:
Support a few basic operators:
- eq
- add
- len
- select (indexing)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10225

Differential Revision: D9172338

Pulled By: michaelsuo

fbshipit-source-id: 6e75ec1453b9589b0fb4698598ecdba5a5fccff9
2018-08-07 00:39:40 -07:00
1bae6e24c9 Change empty list literal compiler error to match actual builtin name (#10265)
Summary:
I changed the name of this builtin to match Python's native style, but forgot to change the compiler error to match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10265

Differential Revision: D9192963

Pulled By: michaelsuo

fbshipit-source-id: 225ca4cd50fbbe3b31c369deeb3123a84342aab1
2018-08-07 00:39:39 -07:00
fa9ea5bde9 Move CoreAPI.h to Macros.h, to give it a more accurate name. (#10264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10264

Since we now have DISABLE_COPY_AND_ASSIGN macro in the file,
CoreAPI is no longer an accurate name.

Reviewed By: dzhulgakov

Differential Revision: D9181687

fbshipit-source-id: a9cc5556be9c43e6aaa22671f755010707caef67
2018-08-06 22:27:44 -07:00
da44cf6101 Move TensorTypeId, TensorTypeIdRegistration and flat_hash_map to ATen/core (#10263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10263

Auxiliary changes that were needed:
- Add DISABLE_COPY_AND_ASSIGN to CoreAPI.h (maybe we should rename this file
  now)

Reviewed By: dzhulgakov

Differential Revision: D9181321

fbshipit-source-id: 975687068285b5a94a57934817c960aeea2bbafa
2018-08-06 22:27:40 -07:00
f1cf3105de Revert D9169049: [pytorch][PR] Add new mkldnn fallback operators
Differential Revision:
D9169049

Original commit changeset: 3bc30250d734

fbshipit-source-id: 65a91594bda699ff9535b27dccd0d1e5d1a8036a
2018-08-06 20:39:30 -07:00
f47bec821e Add new mkldnn fallback operators (#10162)
Summary:
Add new ideep fallback operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10162

Reviewed By: yinghai

Differential Revision: D9169049

Pulled By: wesolwsk

fbshipit-source-id: 3bc30250d7340fea2c442f36d16b85241ceee6e7
2018-08-06 16:56:00 -07:00
25b2e88750 Stop propagating std flags to downstream gcc/nvcc (#10098)
Summary:
When we directly use -std=c++11, it propagates to the downstream applications.

Problems:
1. Gcc flags propagating to nvcc.
2. nvcc flags propagating to nvcc. (Which throws an error like redeclaration of std flag)

This PR will fix these propagation issues!

Similar problem:
https://github.com/FloopCZ/tensorflow_cc/pull/92
https://github.com/CGAL/cgal/issues/2775

Requires: Cmake 3.12
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10098

Differential Revision: D9187110

Pulled By: ezyang

fbshipit-source-id: 0e00e6aa3119c77a5b3ea56992ef3bbfecd71d80
2018-08-06 15:30:27 -07:00
8b08eca203 Move ScalarType to ATen/core, splitting out Backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10262

Reviewed By: dzhulgakov

Differential Revision: D9157408

fbshipit-source-id: 11631a35dfc6cb1f73f61ea08d3115f8ef4cb034
2018-08-06 15:30:25 -07:00
a38b572de3 enable unit tests and other changes (#10266)
Summary:
This PR for the ROCm target does the following:
* enable some unit tests on ROCm
* fix a missing static_cast that breaks BatchNorm call on ROCm
* fix BatchNorm to work on ROCm w/ ROCm warp sizes etc
* improve the pyhipify script by introducing kernel scope to some transpilations and other improvements
* fix a linking issue on ROCm
* for more unit test sets: mark currently broken tests as broken (to be fixed)
* enable THINLTO (phase one) to parallelize linking
* address the first failure of the elementwise kernel by removing the non-working ROCm specialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10266

Differential Revision: D9184178

Pulled By: ezyang

fbshipit-source-id: 03bcd1fe4ca4dd3241f09634dbd42b6a4c350297
2018-08-06 14:54:01 -07:00
e0d43572c1 Cleaner semantics for Reserve (#10261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10261

1. Reserve
Currently, Reserve will allocate new memory while also preserving the old data in the tensor, and Resize relies on this behavior at some call-sites, e.g. https://github.com/pytorch/pytorch/blob/master/caffe2/operators/reservoir_sampling.cc#L103, where we should be using Extend.
We want to bring the semantics of Reserve more in line with std::vector, i.e. we want it to be
an optimization about memory allocation and remove the semantics of preserving the data. We'll remove the guarantee that data is preserved after Reserve, and Extend will be the only API that preserves old data when we do in-place extension of memory. This also helps with the later refactoring to split Storage from Tensor.
Also, we'll only pass the outer dimension to Reserve, which means the later dimensions should be set before we call Reserve.
2. Extend/Shrink
Previously, Extend actually meant ExtendBy and Shrink meant ShrinkTo. I would like to add an ExtendTo for convenience and change Shrink to ShrinkTo.
Old functions calling Extend are still there; although it actually means ExtendBy, I think it still makes sense to keep it.
3. Usage Patterns

The expected usage patterns right now is:
```
t->Resize({0, 32, 32, 32});
t->template mutable_data<T>(); // set meta_
t->Reserve(100);
auto* t_data = t->template mutable_data<T>();
// feed data to tensor using t_data
for (int i = 0; i < 100; ++i) {
  t->Extend(1, 50, &context_);
  // you can continue to use t_data if you have reserved enough space
  // otherwise, you should call t->template mutable_data<T> again to
  // get the new data pointer since Extend will allocate new memory even
  // though the original data is preserved.
}
```

Reviewed By: ezyang

Differential Revision: D9128147

fbshipit-source-id: e765f6566d73deafe2abeef0b2cc0ebcbfebd096
2018-08-06 14:40:16 -07:00
a13a53c151 Optimize group_norm on cpu (#10246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10246

Optimize group_norm on cpu

Reviewed By: houseroad

Differential Revision: D9177878

fbshipit-source-id: 41f7aadc6336317c338c75daccef6cb98e9de9de
2018-08-06 14:26:09 -07:00
0c848f4179 Python integration for custom operators (#10149)
Summary:
Adds the Python path to custom operators, including dynamically loading operations into Python.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10149

Reviewed By: ezyang

Differential Revision: D9158380

Pulled By: goldsborough

fbshipit-source-id: 3edffa639e8d2959e9e80d1bd4f20ab4a1b3ca02
2018-08-06 13:54:48 -07:00
62e23a1ee4 clean up the build a bit. We no longer need the separate build_libtorch entrypoint (#9836)
Summary:
the new entrypoint is `./tools/build_pytorch_libs.sh caffe2`

this will also speed up CI builds a bit, since we will no longer be compiling all of libtorch twice
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9836

Differential Revision: D9182634

Pulled By: anderspapitto

fbshipit-source-id: 0b9a20ab04f5df2d5c4e7777e4dc468ab25b9ce2
2018-08-06 13:41:51 -07:00
d1a0c2eaf8 Add back THTensor_nDimension. (#10259)
Summary:
Turns out some people are using this via the C-API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10259

Differential Revision: D9180135

Pulled By: gchanan

fbshipit-source-id: 68f59beabf7f8093e67581d7e7ebfe8dff9e6b69
2018-08-06 11:09:41 -07:00
6ac35b35d1 Stop using THLongStorage for sizes/strides, remove THLongStorageView.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10219

Reviewed By: cpuhrsch

Differential Revision: D9159550

Pulled By: gchanan

fbshipit-source-id: 745a6d335613688ed41b32369ee4938907ce8cbb
2018-08-06 09:25:32 -07:00
835a5d4f49 Add cost inference of fwd sparse operators and sparse adagrad (#9314)
Summary:
We should also add cost inference for sparse operators in backward pass later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9314

Reviewed By: orionr

Differential Revision: D8789240

Pulled By: jspark1105

fbshipit-source-id: 68c2170f294fe13bcc409276f599b5fa8a98bcd3
2018-08-06 08:39:16 -07:00
506142ac8a Add warning for building PyTorch using Python 2.7 on Windows (#10247)
Summary:
Fixes #9232.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10247

Differential Revision: D9178257

Pulled By: SsnL

fbshipit-source-id: cc553335a5a918b6d77fe1064460cb66114859ca
2018-08-05 21:24:02 -07:00
267c397c5b Add the ocr_det model for benchmarking (#10245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10245

as title

Reviewed By: sf-wind

Differential Revision: D9176654

fbshipit-source-id: 3339d2aa6a0ceb0e751745c06dcfd025ccbf5449
2018-08-05 16:45:35 -07:00
7f2e43a084 Add the ocr_rec model json (#10240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10240

as title

Reviewed By: sf-wind

Differential Revision: D9176522

fbshipit-source-id: 5b92c0b4ed24f96fe7b1321a3ab5ad26dcd3318d
2018-08-05 16:45:23 -07:00
df23bdc82d add BEGIN NOT-CLEAN-FILES marker to .gitignore. (#10233)
Summary:
Using Visual Studio Code and Visual Studio, these IDEs store configurations in `FOLDER/.vscode` and `FOLDER/.vs`.
But "setup.py clean" deletes these folders because they are listed in the `.gitignore` file.

To prevent this, add a "BEGIN NOT-CLEAN-FILES" marker to the `.gitignore` file; "setup.py clean" then ignores lines after this marker.

Discussed in #10206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10233

Differential Revision: D9175515

Pulled By: ezyang

fbshipit-source-id: 24074a7e6e505a3d51382dc5ade5c65c97deda37
2018-08-05 15:55:44 -07:00
f57e4ce1d5 Update broadcast with alpha to reduce num of launching kernels. (#10235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10235

Update broadcast with alpha to reduce num of launching kernels.

Reviewed By: houseroad

Differential Revision: D9175824

fbshipit-source-id: 7a463833350a2c84dcfb82f73cf40da403dd59a0
2018-08-04 19:54:20 -07:00
ab293924bb support generic feature in DPER2 (#10197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10197

Support generic feature in DPER2

For now, since we only have one generic type (type 1), we directly add the parsed feature record to the embedding feature.

For new feature types with specific structure, corresponding code changes will also be expected.

Reviewed By: itomatik

Differential Revision: D8788177

fbshipit-source-id: 9aaa6f35ece382acb4072ec5e57061bb0727f184
2018-08-04 15:25:13 -07:00
57d2d4bcff Optimize reduce ops for 2d and 3d (#9992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9992

Optimize reduce ops for 2d and 3d

Reviewed By: houseroad

Differential Revision: D9042505

fbshipit-source-id: 62af2125aa6439106293e59bdf6a2b920792fd2d
2018-08-04 13:53:58 -07:00
29406a2c4c Fix shared_ptr refcycle in graph executor (#10222)
Summary:
Fixes #10032

When capturing an output, GraphExecutorAutogradFunction creates
SavedVariable with is_output=False and owns it:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/graph_executor.cpp#L87

Constructing SavedVariable with is_output=False makes it own a copy of
the shared_ptr<GraphExecutorAutogradFunction>, which causes a reference
cycle:
6456b944fd/torch/csrc/autograd/saved_variable.cpp (L27)

The solution in this PR is to construct the SavedVariable with
is_output=True if the captured value is an output.

Test Plan

Turn on cuda memory checking for JitTestCase. If the test's name
includes "cuda" or "gpu" in it, the cuda memory checking test happens.

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10222

Reviewed By: ezyang

Differential Revision: D9162995

Pulled By: zou3519

fbshipit-source-id: aeace85a09160c7a7e79cf35f6ac61eac87cbf66
2018-08-04 11:39:10 -07:00
2141cb7d53 Update OnnxifiOp to reflect onnx/onnx#1256
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10230

Reviewed By: yinghai

Differential Revision: D9174527

Pulled By: Maratyszcza

fbshipit-source-id: 753493e67446b528d65b146e89ea9f874b469ead
2018-08-04 08:09:19 -07:00
5df8547ff9 Fix ONNX LogSoftmax export. (#9576)
Summary:
This fixes an issue with incorrect `axis=-1` in the exported ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9576

Reviewed By: yinghai

Differential Revision: D9125463

Pulled By: houseroad

fbshipit-source-id: 6f4cb1067d1aa6bb0a9f56690fc21816c98eebfa
2018-08-03 22:09:42 -07:00
36939417b2 Introduce at::DeviceType, which subsumes at::Device::Type and (partially) caffe2::DeviceType (#10175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10175

Previously, we had at::Device::Type and caffe2::DeviceType (from protobuf),
intended to help us distinguish between CPU, CUDA, etc. devices.

This replaces at::Device::Type entirely with at::DeviceType, which in turn
is a direct, 'enum class' version of the protobuf generated caffe2::DeviceType
'enum'. We can't eliminate the 'enum' because this would be a pretty drastic
API change (enum is interconvertible with integers, enum class is not), but
we can make the two line up exactly and share code for, e.g., printing.

Reviewed By: Yangqing

Differential Revision: D9137156

fbshipit-source-id: 566385cd6efb1ed722b25e6f7849a910b50342ab
2018-08-03 19:25:06 -07:00
98d60ad43d Replace caffe2::EnforceNotMet with at::Error
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10184

Reviewed By: dzhulgakov

Differential Revision: D9140095

fbshipit-source-id: 3beead825609cec5054347e59903b0b78ef150f8
2018-08-03 19:25:05 -07:00
e2976ea519 Make at::Error look more like caffe2::EnforceNotMet (#10183)
Summary:
- New concept of a message stack; you can add messages
  using AppendMessage
- New concept of a caller; it's just a way to pass along
  some arbitrary extra information in the exception

Coming soon is changing Caffe2 to use at::Error instead of
EnforceNotMet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10183

Differential Revision: D9139996

Pulled By: ezyang

fbshipit-source-id: 6979c289ec59bc3566a23d6619bafba2c1920de9
2018-08-03 19:25:03 -07:00
c7c6e93312 Use target_compile_definitions for AT_CORE_STATIC_WINDOWS (#10213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10213

nvcc only respects definitions, not options.

Reviewed By: dzhulgakov

Differential Revision: D9154388

fbshipit-source-id: 04c4809154df1c61108b65f1115fccdeb336952e
2018-08-03 19:25:02 -07:00
02a64b183c Move ATenGeneral back out of core. (#10224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10224

It doesn't work with Caffe2; use AT_CORE_API from ATen/core/CoreAPI.h
instead.

Reviewed By: smessmer

Differential Revision: D9162467

fbshipit-source-id: 3c7d83c1ccb722ebac469296bdd7c3982ff461e5
2018-08-03 19:25:01 -07:00
41dce17e22 Delete TensorImpl::type_, replace with backend_/scalar_type_/is_variable_ (#10210)
Summary:
The basic game plan is to stop accessing the type_ field directly,
and instead use the stored backend_, scalar_type_ and
is_variable_ to look up the appropriate Type from Context.
Storage of backend_ and scalar_type_ are new.

At some future point in time, I'd like to look at this code
carefully to see if I can get everything in this codepath inlining.
I didn't do it in this patch because there are circular include
problems making things difficult.

Some other details:

- Added Device::backend() which does what it says on the tin

- SparseTensorImpl is temporarily hard-coded to root in at::Context
  for the appropriate context.  If/when we put this in shared code,
  we'll have to break this dep too, but for now it should be OK.

- There's a stupid problem with globalContext() deadlocking if
  you didn't actually initialize it before loading libtorch.so
  (which is bringing along the variable hooks).  I fixed this by
  reordering the static initializers. Fixes #9784

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10210

Differential Revision: D9150697

Pulled By: ezyang

fbshipit-source-id: 89e2006c88688bcfab0dcee82dc369127c198c35
2018-08-03 18:25:19 -07:00
149d4f776b use logsigmoid at multilabel_soft_margin_loss, and change output from shape=(N, C)to (N,) (#9965)
Summary:
- fixes #9141, #9301
- use logsigmoid at multilabel_soft_margin_loss to make it more stable (NOT fixing legacy MultiLabelSoftMarginCriterion)
- return (N) instead of (N, C) to match the same behavior as MultiMarginLoss
- Note that with this PR, the following behavior is expected:
```
loss = F.multilabel_soft_margin_loss(outputs, labels, reduction='none')
loss_mean = F.multilabel_soft_margin_loss(outputs, labels, reduction='elementwise_mean')
loss_sum = F.multilabel_soft_margin_loss(outputs, labels, reduction='sum')

loss.sum() == loss_sum  # True
loss.mean() == loss_mean  # True
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9965

Differential Revision: D9038402

Pulled By: weiyangfb

fbshipit-source-id: 0fa94c7b3cd370ea62bd6333f1a0e9bd0b8ccbb9
2018-08-03 17:54:19 -07:00
7bc87172ea Kill Tensor::shares_data (#10217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10217

It's only used in debug printing and is not that reliable anyway. If we want to implement it later, we should do it with proper accounting for shared storages.

Reviewed By: jerryzh168

Differential Revision: D9155685

fbshipit-source-id: 48320d41a0c4155645f3ba622ef88730a4567895
2018-08-03 17:40:39 -07:00
3b3aff2ed6 IsType<TensorCPU> -> IsType<Tensor>(CPU) (#10135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10135

att

Reviewed By: yinghai

Differential Revision: D9121892

fbshipit-source-id: 4a4a3bfc450896b619bf92c92ef218aaaefc3081
2018-08-03 17:24:59 -07:00
4aa7469d1f Implement c10 ops needed for benchmark (#9360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9360

This implements a first set of c10 operators, namely the ones needed for the multithread predictor benchmark.

All implementations are CPU-only and experimental. They're not meant to be used in production.

They can be used, however, to test calling simple c10 MLPs from Caffe2 or PyTorch when working on these integration paths.

Reviewed By: dzhulgakov

Differential Revision: D8811698

fbshipit-source-id: 826789c38b2bfdb125a5c0d03c5aebf627785482
2018-08-03 16:09:27 -07:00
08e7af20d3 Implement calling of c10 ops from c2 (#9369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9369

This adds the capability for caffe2 to call c10 operators and adds a dummy c10 sigmoid op as a proof of concept.

I used this test script to make sure it works:

    from caffe2.python import workspace, model_helper
    import numpy as np

    data1 = np.random.rand(16, 100).astype(np.float32)
    workspace.FeedBlob("data1", data1)
    m = model_helper.ModelHelper(name="my net")
    sigmoid1 = m.net.C10Sigmoid_DontUseThisOpYet("data1", "sigmoid1")
    sigmoid2 = m.net.Sigmoid("data1", "sigmoid2")

    workspace.RunNetOnce(m.param_init_net)
    workspace.CreateNet(m.net)
    data1 = np.random.rand(16, 100).astype(np.float32)
    workspace.FeedBlob("data1", data1)
    workspace.RunNet(m.name, 1)

    print(workspace.FetchBlob("data1"))
    print(workspace.FetchBlob("sigmoid1"))
    print(workspace.FetchBlob("sigmoid2"))

(and check that both sigmoid outputs are the same)

Reviewed By: ezyang

Differential Revision: D8814669

fbshipit-source-id: eeb0e7a854727f1617a3c592a662a7e5ae226f40
2018-08-03 16:09:23 -07:00
c5abe8844a Add IDEEP fallbacks for Resnet50 training ops (#8541)
Summary:
1. Add fallback gradient ops
2. In fallback ops, set the output Tensor as a CPUTensor instead of an IDEEPTensor if ndim = 0, because IDEEPTensor doesn't support 0-dim tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8541

Reviewed By: yinghai

Differential Revision: D9115233

Pulled By: wesolwsk

fbshipit-source-id: 163e6a76f02bd781c95d1060ccbacf2cab90055e
2018-08-03 15:54:17 -07:00
4680ab4d44 Generalize intrusive_ptr comment (#10216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10216

-

Reviewed By: ezyang

Differential Revision: D9155601

fbshipit-source-id: 154de2e6ad747134413a3ab3ae0b7507b8284d49
2018-08-03 14:25:28 -07:00
97cbcb7d67 Allow releasing/retaining weak_intrusive_ptr (#10214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10214

Seems we're passing weak pointers over C API boundaries. Need this API there too.

Reviewed By: ezyang

Differential Revision: D9154505

fbshipit-source-id: c9889689b87dad5d918f93ba231e01704b8d2479
2018-08-03 14:25:24 -07:00
6456b944fd ctc_loss odds and ends (#10112)
Summary:
- Add convenience wrapper to pass tensors as input_lengths, target_lengths (see the sketch below)
- Fix documentation example
- Check BLANK >= 0

Thank you, Simon and Soumith for the suggestions!
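A sketch of the tensor-lengths convenience (shapes and values are illustrative):

```python
import torch
import torch.nn.functional as F

T, N, C = 50, 16, 20
log_probs = torch.randn(T, N, C).log_softmax(2)
targets = torch.randint(1, C, (N, 25), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)  # tensors, not just tuples
target_lengths = torch.randint(10, 25, (N,), dtype=torch.long)
loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths)
```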
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10112

Differential Revision: D9130737

Pulled By: SsnL

fbshipit-source-id: f9a0022a969788bda3db9f360e2564b519ebf2e6
2018-08-03 13:25:18 -07:00
65d32b1705 Remove unused substitutions (#10187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10187

These substitutions don't actually occur in the target file. Remove them.

Reviewed By: ezyang

Differential Revision: D9141567

fbshipit-source-id: fcfddee0b4d31e21763b39d852577d2dbb9ce843
2018-08-03 12:25:59 -07:00
f51f15bb27 Update include paths for ATen/core (#10130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10130

Update some include paths to make them internally consistent

Reviewed By: ezyang

Differential Revision: D9119906

fbshipit-source-id: b44e5cab8e8e795ee18afe9ffc6caf1f2b413467
2018-08-03 11:57:02 -07:00
f77b62c3e1 Add documentation for margin arg in Caffe2 MarginRankingCriterionOp (#10186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10186

The MarginRankingCriterionOp margin argument was undocumented.

Reviewed By: jerryzh168

Differential Revision: D9141228

fbshipit-source-id: 724d45dc8e555fbe9d3e8afc7b6bf8ed17bbbdb1
2018-08-03 11:45:51 -07:00
cb0e72e00d Add registerOperator overloads that infer the schema (#10048)
Summary:
This PR adds a way to infer the JIT/script schema of a function from its signature, and then create an operator from the schema and implementation. The implementation function is wrapped into another function, which pops values from the stack into an argument tuple, then invokes the function and pushes the return value back onto the stack, sometimes unpacking the return value if it is a tuple.

Currently the method is called `createOperator`. We may want to think of a nicer way of registering ops in tandem with `RegisterOperators`. It might be very cumbersome to add a template constructor to `Operator`, so maybe we can come up with a chaining method on `RegisterOperators` like `RegisterOperators(schema, func).op(schema, func).op(schema, func)` -- it has to work at startup time (for a static variable) though. We can solve this in another PR.

zdevito apaszke smessmer dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10048

Differential Revision: D9125975

Pulled By: goldsborough

fbshipit-source-id: de9e59888757573284a43787ae5d94384bfe8f9a
2018-08-03 11:45:49 -07:00
7a377b9a53 Add torch.argsort mirroring similar functionality in numpy. (#9600)
Summary:
Per issue #9542
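A usage sketch of the new op:

```python
import torch

x = torch.tensor([3.0, 1.0, 2.0])
idx = torch.argsort(x)  # tensor([1, 2, 0]), like numpy.argsort
print(x[idx])           # sorted values: tensor([1., 2., 3.])
```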
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9600

Differential Revision: D8952338

Pulled By: resistor

fbshipit-source-id: c3f69d62858ad9458ec5ae563e3ff24b1c9283a7
2018-08-03 11:45:47 -07:00
c91af1202a Make release_resources non-const (#10192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10192

- release_resources() method must be non-const because it modifies the object
- for intrusive_ptr<const MyClass>, this needs to be const_cast :(

Reviewed By: ezyang

Differential Revision: D9143808

fbshipit-source-id: 9203ff7a7ff3bec165931279371c6e75d4f0ca8c
2018-08-03 11:24:45 -07:00
39476d79a2 Allow releasing/reclaiming intrusive_ptr (#10133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10133

This is useful for C APIs where we want to give owning pointers to/from other languages.

Reviewed By: ezyang

Differential Revision: D9121493

fbshipit-source-id: f903f5830f587b2ba69c0636ddcf1a066bbac2e0
2018-08-03 11:24:43 -07:00
5753746d29 Enable static initializer order ASAN. (#10211)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10211

Differential Revision: D9150687

Pulled By: ezyang

fbshipit-source-id: 4cd458d19a34788c8897905a87d1b52229f67f90
2018-08-03 11:24:42 -07:00
4a6fbf03c6 Make StorageImpl member variables largely private and use getters and setters
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10074

Differential Revision: D9086887

Pulled By: cpuhrsch

fbshipit-source-id: d2dd0d6a1b71d0f864aefb64cd1daefd11dcfb91
2018-08-03 11:10:02 -07:00
50cf326158 Allow type cast between int and float in Script (#10168)
Summary:
The PR allows int→float and float→int casts. Currently we only allow `tensor→int` and `tensor→float` casts.
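A minimal sketch of the newly allowed casts (assuming the usual `torch.jit.script` decorator):

```python
import torch

@torch.jit.script
def casts(x):
    n = int(x.sum())   # tensor -> int (already supported)
    f = float(n)       # int -> float (newly allowed)
    return int(f / 2)  # float -> int (newly allowed)
```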
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10168

Differential Revision: D9141163

Pulled By: wanchaol

fbshipit-source-id: 5e5591a98b4985a675641dfc9a385b2a0bf8e208
2018-08-03 10:56:05 -07:00
5d3782b655 Fix IDEEP Copys (#10104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10104

.

Reviewed By: yinghai

Differential Revision: D9109638

fbshipit-source-id: 319cc5711132314dfba0f09ac403522f21ad532b
2018-08-03 10:31:32 -07:00
656bb320b7 EnforceFinite test (#10143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10143

att

Reviewed By: xianjiec

Differential Revision: D9122444

fbshipit-source-id: 010abcc1eb64f084c00890e8de5f5d422b4b8d02
2018-08-03 10:31:29 -07:00
13de6e8dfa Make list literals construct ListType (#10193)
Summary:
Previously, `foo = [bar, baz]` would construct a TupleType of fixed arity. This would cause code like:
```
foo = [2]
if True:
    foo = [2, 2]
```
to fail to compile, since `(int)` is not the same as `(int, int)`.

This PR changes things so that list literals construct ListTypes, which can be resized.

Potentially breaking changes introduced:
- Empty list literals are now disallowed, `_constructEmptyFooList()` builtins are required to replace them.
- Iterable variable unpacking where the rhs is a list is now disallowed. (Tuples still work)
- Lists must have a single type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10193

Differential Revision: D9147166

Pulled By: michaelsuo

fbshipit-source-id: bbd1b97b0b6b7cb0e6f9d6aefa1ee9c731e63039
2018-08-03 00:55:23 -07:00
ab0ac6391b fix padding doc not rendered correctly (#10196)
Summary:
somehow sphinx doesn't like the previous wording
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10196

Differential Revision: D9146817

Pulled By: SsnL

fbshipit-source-id: 2140859bc363af556a021658def946d7afbdb245
2018-08-02 23:26:45 -07:00
4778afb8bb In Expand support using -1 to indicate preserving original size (#10174)
Summary:
zrphercule

https://pytorch.org/docs/stable/tensors.html#torch.Tensor.expand
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10174

Differential Revision: D9136467

Pulled By: bddppq

fbshipit-source-id: 825c489899097acda8d43706964d78a104cdf583
2018-08-02 22:09:47 -07:00
dd527db711 Skip TestConvolution.test_convolution_sync on ROCM which caused random segfaults (#10179)
Summary:
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/4701/console

petrex ashishfarmer rohithkrn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10179

Differential Revision: D9139657

Pulled By: bddppq

fbshipit-source-id: 9b1bb2ad185ed16fff696ce026a5ee5fcf9cbaee
2018-08-02 21:09:27 -07:00
1f78e06f63 Add g.insertConstant and clean up dead attributes code (#10177)
Summary:
* Changes `insertConstant(g, val)` to `g.insertConstant(val)`.
* Moves SourceRange to its own file to enable it.
* Cleans up dead attribute code in schema matching and graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10177

Differential Revision: D9137789

Pulled By: zdevito

fbshipit-source-id: 8a73cfb01a576f02e7e4dce019be9c0a0002989d
2018-08-02 20:45:31 -07:00
798b530361 weak_intrusive_ptr (#10038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10038

Add weak_ptr ability to intrusive_ptr.

Reviewed By: ezyang

Differential Revision: D9039980

fbshipit-source-id: dd504d6e0d7acf5914cd45845355e28f9df201fb
2018-08-02 17:25:14 -07:00
2bd709a7c8 intrusive_ptr (#9897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9897

Add an IntrusivePtr class to do intrusive refcounting with a shared_ptr-like interface.

Reviewed By: ezyang

Differential Revision: D9018619

fbshipit-source-id: 5de8706aab8eea2e30bead0f59bd6a7ca4d20011
2018-08-02 17:25:12 -07:00
0e9c6898cb Export modules in ir with google protobuf
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9746

Differential Revision: D9110006

Pulled By: li-roy

fbshipit-source-id: 8b9744c042f822fdfe959a7a7fef3d0baff4f639
2018-08-02 15:54:51 -07:00
e2ecf3914a Change default CUDA block size from 512 to 128 (#10090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10090

Decreasing the block size improves GPU utilization for use cases with small input sizes (e.g. 10000)
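A back-of-the-envelope illustration, assuming one thread per element and `grid = ceil(N / block)`:

```python
N = 10000
for block in (512, 128):
    print(block, -(-N // block))  # 512 -> 20 blocks, 128 -> 79 blocks
```

More, smaller blocks give the scheduler more units to spread across the SMs when the input is small.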

Reviewed By: pjh5

Differential Revision: D9093573

fbshipit-source-id: c8f995b773a00b1bea3a3809c0f6557133efd9dd
2018-08-02 15:40:13 -07:00
7dc870bd7b Delete invalid 'template' keyword (#10173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10173

With D9024330, the `Extend` function is no longer a template, which makes
the `template` keyword here invalid. For some reason the current version of LLVM
doesn't catch this, but the latest one does.

Reviewed By: jerryzh168

Differential Revision: D9133462

fbshipit-source-id: 54ac9aad01f81b9b4e7b6e2864b8961478d2d860
2018-08-02 14:50:11 -07:00
dad6e8bb6c Remove capture specifiers in register_aten_ops when they're not needed. (#9669)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9669

Differential Revision: D8952335

Pulled By: resistor

fbshipit-source-id: 8fbbec7a7f55fbeeda3509cb3d339e1db90a53e6
2018-08-02 13:40:31 -07:00
94c67f1454 Replace storageimpl type with scalar_type and backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10097

Differential Revision: D9124287

Pulled By: cpuhrsch

fbshipit-source-id: c976abeeaaa085b972812c1a3270eb6aef0c0dca
2018-08-02 13:31:30 -07:00
538b15d13c Use PYTORCH_PYTHON to call generate_code.py (#10171)
Summary:
Probably fixes https://github.com/pytorch/pytorch/issues/8373#issuecomment-409994847
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10171

Differential Revision: D9135607

Pulled By: SsnL

fbshipit-source-id: 72f535875658c857621e41fd25c2174052714557
2018-08-02 12:54:14 -07:00
9e85a7a9de Back out "[pytorch][PR] [TENSOR MERGE] Delete type_ field from TensorImpl, replaced with backend_/scalar_typ…" (#10169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10169

Original commit changeset: 2b4d867abfdc

Reviewed By: pjh5, SsnL

Differential Revision: D9135216

fbshipit-source-id: d5c9f12c3a0f75df224c781e1cd1e323cdfbb0d5
2018-08-02 12:39:01 -07:00
7be071a829 Update onnx to onnx/onnx@2a3a226 (#10167)
Summary:
2a3a226a96
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10167

Reviewed By: houseroad

Differential Revision: D9134738

Pulled By: bddppq

fbshipit-source-id: 9d3fd3c04a584d5626146f174ac78cabfa0e5934
2018-08-02 12:25:19 -07:00
6e85112f12 Adding katex rendering of equations, and required edits to equations. (#8848)
Summary:
This fixes issue #8529.

- Adds Katex extension to conf.py and requirements.txt
- Fixes syntax differences in docs
- Should allow documentation pages to render faster
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8848

Reviewed By: soumith

Differential Revision: D8677702

Pulled By: goodlux

fbshipit-source-id: c4a832c5879e0eebcb14763b35a41663331ba23f
2018-08-02 12:25:17 -07:00
ee98533746 Fix compiler warnings on ignored const qualifiers
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10142

Reviewed By: yinghai

Differential Revision: D9125502

Pulled By: bddppq

fbshipit-source-id: 8043b2a05507a4707220fa820ab6cc486760a93e
2018-08-02 12:10:37 -07:00
5765549155 codemod -d caffe2 --extensions cc,h CaffeTypeId TypeIdentifier (#10166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10166

TypeIdentifier is still easy to codemod away from

Reviewed By: smessmer

Differential Revision: D9132840

fbshipit-source-id: bc83a8b17b2e7c19c9d2c9cfe5c7ce6ec1d8cec5
2018-08-02 11:54:30 -07:00
4a2f3cc45f Improve lars operator by applying clipping (#9905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9905

This diff improves lars operator in Caffe2 by applying clipping to the computed learning rate

Reviewed By: pjh5

Differential Revision: D9020606

fbshipit-source-id: b579f1d628113c09366feac9406002f1ef4bd54f
2018-08-02 11:54:28 -07:00
a243e517fa Guard sizes/strides in TH/THC for scalars.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10145

Differential Revision: D9125791

Pulled By: gchanan

fbshipit-source-id: d0b8c88c49d7af85971a4531a63fd85a97bfbec7
2018-08-02 11:24:36 -07:00
170d29769b Strings lexing, parsing, implementation in print (#9324)
Summary:
This PR adds strings to the ast and implements them for print statements. Strings are lifted as attributes to the print node. They must be arguments to print itself, not arguments to an object that is passed to print. If they are encountered elsewhere, a NYI exception will be thrown.
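A minimal example of what is now accepted, assuming script mode:

```python
import torch

@torch.jit.script
def show(x):
    print("value of x:", x)  # OK: the string is a direct argument to print
    return x
```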
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9324

Reviewed By: jramseyer

Differential Revision: D8807128

Pulled By: eellison

fbshipit-source-id: 984401ff458ed18d473c6d1bd86750e56c77d078
2018-08-02 11:09:03 -07:00
230ca98d4b Remove THTensor_isSize. (#10146)
Summary:
This is part of the process of removing THLongStorage to represent sizes/strides.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10146

Differential Revision: D9126611

Pulled By: gchanan

fbshipit-source-id: b0d995a4c51dfd54bf76dcfee9a69f37f9d01652
2018-08-02 10:39:43 -07:00
9c818bfbc7 Refactor PythonValue types + use tryMatchSchema for PythonOp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10132

Differential Revision: D9121327

Pulled By: jamesr66a

fbshipit-source-id: 6d8bcf6b0dca54106cf9ed740bcff857062a03da
2018-08-02 10:26:58 -07:00
cfa05706ef ROCm contributions week 29 (#9653)
Summary:
In this changeset:
* improvements to `hipify-python.py`
* marking unit tests broken for ROCm
* reducing the number of jobs for the build to avoid out-of-memory issues
* switch to Thrust/cub-hip master for the CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9653

Differential Revision: D9117791

Pulled By: ezyang

fbshipit-source-id: a6c3c7b81f2bda9825974bf9bf89a97767244352
2018-08-02 09:09:00 -07:00
70d47f92db Add support for rand_like op in fusion compiler (#9795)
Summary:
Enabled support for generating random numbers in the fusion compiler. Currently a Philox RNG implemented by TensorFlow is used, as NVRTC couldn't resolve the curand.h header correctly. The two implementations should have exactly the same behavior according to our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9795

Differential Revision: D8999029

Pulled By: SsnL

fbshipit-source-id: f0d2616a699a942e2f370bdb02ac77b9c463d7b8
2018-08-02 08:55:25 -07:00
4a5cd4f6ab nomnigraph - new utility for graph transformation (#10081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10081

Add a new utility that makes it easier to write graph transformations. Callers now only need to take care of the actual transformation logic. The subgraph matching is simplified because callers only need to specify a simple construct for the subtree matching criteria.

The utility is SubgraphMatcher::replaceSubtree

Some notes:
- replaceSubtree takes a subtree matching criteria and a lambda that takes a subtree root. It does not handle any transformations itself; callers are responsible for the transformation part, including deleting all nodes in the matched subtree(s). We could enhance this to also handle the deletion part if it turns out to be useful.
- Only subtree matching is supported for now, but we can add general DAG subgraph support later if needed.

Reviewed By: bwasti

Differential Revision: D9073297

fbshipit-source-id: 465a0ad11caafde01196fbb2eda2d4d8e550c3b6
2018-08-01 23:09:41 -07:00
acbc2744d8 fix bug in 3d group convolution (#9860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9860

For 3D group convolution, in the case of cuDNN 7 and NCHWD order, the filter dims are (M, C/group_, k_h, k_w, k_d).

According to the cuDNN docs (https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#grouped-convolutions), the existing implementation is incorrect and will crash 3D video model training with group convolution.

In the implementation, `filter.dims(1)` is already `C/group_`, so we don't need to divide it by `group_` again.
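
By way of illustration, the same convention in PyTorch's Conv3d (not the Caffe2 op this diff fixes): the weight's channel dimension is already divided by groups.
```
import torch

# assumed example: 8 input channels, 16 output channels, 4 groups
conv = torch.nn.Conv3d(in_channels=8, out_channels=16, kernel_size=3, groups=4)
# weight shape is (M, C/groups, kT, kH, kW); no further division by groups is needed
print(conv.weight.shape)  # torch.Size([16, 2, 3, 3, 3])
```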

Reviewed By: BIT-silence

Differential Revision: D9008807

fbshipit-source-id: 2f0d6eb47f4e16d7417a7e3baeba709e3254154f
2018-08-01 22:55:38 -07:00
57061d600a Auto-batching IR transformation for control flow (#9392)
Summary:
Implement IR transformation for control flow

- `prim::Constant`: clone to new graph directly
- `prim::NumToTensor`: create a `BatchTensor` from output tensor with `batch_size = 1`
- `prim::TensorToNum`: clone to new graph
- `prim::ListConstruct`: clone to new graph
- `prim::If`: execute both `if_block` and `else_block` and combine results from them using `cond`
- `prim::Loop`:
  - for loop
  - while loop: change while `cond` to `cond_any`, use `cond` to update outputs

test case: hand-written LSTM, greedy search, beam search
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9392

Differential Revision: D8822369

Pulled By: ChunliF

fbshipit-source-id: 8f03c95757d32e8c4580eeab3974fd1bc429a1e5
2018-08-01 22:24:35 -07:00
8a25acbba5 Use angle brackets instead of quotes for includes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10153

Reviewed By: smessmer

Differential Revision: D9123768

fbshipit-source-id: 0970552ba4d5772fb3cef2db3af3181d98f85140
2018-08-01 22:02:51 -07:00
5699250acc Move IdWrapper to ATen/core (#10152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10152

- Moved from namespace c10::guts to at
- I fixed the use sites, since there were only three of them
- Macro renamed from C10_ to AT_

Reviewed By: smessmer

Differential Revision: D9123652

fbshipit-source-id: bef3c0ace046ebadb82ad00ab73371f026749085
2018-08-01 22:02:50 -07:00
8cc7d33656 Renumber typeid.h so that the number lines up with ScalarType (#10139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10139

We want CaffeTypeId to be interconvertible with at::ScalarType, and
this means we should have the numbers line up exactly.  Fortunately
this is not too hard to do.

Reviewed By: smessmer

Differential Revision: D9123058

fbshipit-source-id: 7e9bd59ca25a552afe9d2d0a16cedc4f6311f911
2018-08-01 22:02:46 -07:00
6b338c8026 Implement torch.broadcast_tensors (#10075)
Summary:
This exposes expand_outplace to python. Fixes #8076. Fixes #10041.

I didn't name it torch.broadcast because numpy.broadcast does something
slightly different (it returns an object with the correct shape
information).
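
A minimal usage sketch (shapes chosen arbitrarily):
```
import torch

a = torch.ones(3, 1)
b = torch.ones(1, 4)
x, y = torch.broadcast_tensors(a, b)
print(x.shape, y.shape)  # both torch.Size([3, 4])
```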
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10075

Differential Revision: D9125816

Pulled By: zou3519

fbshipit-source-id: ebe17c8bb54a73ec84b8f76ce14aff3e9c56f4d1
2018-08-01 19:18:34 -07:00
191482fa39 Distinguish TupleLiteral from ListLiteral (#10128)
Summary:
Previously, the parser was emitting list literals for tuples, but the IR was representing list literals internally with TupleTypes.

For implementing most list operations, I think it will be helpful to distinguish between lists (dynamic size, homogeneous types) and tuples (fixed arity, heterogeneous types).

This diff modifies the parser logic to emit tuple literals. This frees us to represent lists as ListType in the IR, while still properly mapping tuple literals to TupleTypes.

A following diff will actually switch over list literals to emit ListTypes.
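
A minimal sketch of a tuple literal in script (the example function is made up):
```
import torch

@torch.jit.script
def swap(x):
    pair = (x, x + 1)  # tuple literal: fixed arity, heterogeneous types allowed
    a, b = pair
    return b, a
```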
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10128

Differential Revision: D9121305

Pulled By: michaelsuo

fbshipit-source-id: e0cad07ae8bac680f7f8113d10e5129d5a1a511d
2018-08-01 19:18:31 -07:00
a44d9d6eb4 Fix tensor check logic in logging (#10138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10138

Note that `TensorCPU` and `TensorGPU` are both refined to be `Tensor` now; basically they are the same thing. So a check like `blob.IsType<TensorCPU>()` is no longer safe, as a `TensorGPU` can pass the check too.

We need to systematically weed out such usage in our codebase... jerryzh

Reviewed By: houseroad

Differential Revision: D9115273

fbshipit-source-id: 13b293c73691002eac34e095cdcd96c27183e875
2018-08-01 18:09:19 -07:00
24bb8cecbe Move ATen/Half to ATen/core, and apply lint (#10137)
Summary:
This rewrites checked_convert to use stringstreams, eliminating the use of to_string, which is not available in Android's libstdc++.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10137

Reviewed By: smessmer

Differential Revision: D9122340

fbshipit-source-id: b7c1bff70e36217305f2b3333c51543ef8ff3d9c
2018-08-01 17:54:58 -07:00
806854a3c5 Pin AMD gpu id in Caffe2 CI (#10144)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10144

Differential Revision: D9125707

Pulled By: bddppq

fbshipit-source-id: 8ef8f3da6ceb1855f28fc24be621b9b4854ff7f9
2018-08-01 17:39:21 -07:00
59c355c870 Move halfbits2float and float2halfbits conversions to ATen. (#10134)
Summary:
This will be needed soon because I want to move Half.h into
ATen/core, and then I cannot have a TH dependency.

I also took the liberty of making the code more strict-aliasing
safe (this is not actually useful, since we will never build Torch
with strict aliasing) by replacing pointer casts between
float and unsigned with a memcpy instead.
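
The same bit-level reinterpretation, sketched here in Python with struct (the C++ version uses memcpy; the helper name is illustrative):
```
import struct

def float_to_bits(f):
    # reinterpret the 4 bytes of a float32 as an unsigned 32-bit integer,
    # analogous to the memcpy-based cast described above
    return struct.unpack('<I', struct.pack('<f', f))[0]

assert float_to_bits(1.0) == 0x3F800000
```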

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10134

Differential Revision: D9121920

Pulled By: ezyang

fbshipit-source-id: 3b1f86a7c5880e8ac1a589a51f0635bb72e1fd40
2018-08-01 17:09:12 -07:00
4ed5b9267c #8518 Support for empty tuples (#10027)
Summary:
Fixing #8518

Sorry for the pile of commits; I forgot to rebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10027

Reviewed By: ezyang

Differential Revision: D9070028

Pulled By: jramseyer

fbshipit-source-id: 49729c9755ab8a586711e9f6d6a574f3035a7e75
2018-08-01 16:10:00 -07:00
1f6888b70a Allow mobile exporter to export string arrays (#10017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10017

Allow mobile exporter to export string arrays

Reviewed By: pjh5

Differential Revision: D9061213

fbshipit-source-id: b6c5257eb2f0f964dba255b97dc5d32af8ce15a7
2018-08-01 16:09:58 -07:00
1d427fd6f6 Delete type_ field from TensorImpl, replaced with backend_/scalar_type_/is_variable_ (#9787)
Summary:

The basic game plan is to stop accessing the type_ field directly,
and instead using the stored backend_, scalar_type_ and
is_variable_ to look up the appropriate Type from Context.
Storage of backend_ and scalar_type_ are new.

At some future point in time, I'd like to look at this code
carefully to see if I can get everything in this codepath inlining.
I didn't do it in this patch because there are circular include
problems making things difficult.

Some other details:

- Added Device::backend() which does what it says on the tin

- SparseTensorImpl is temporarily hard-coded to root in at::Context
  for the appropriate context.  If/when we put this in shared code,
  we'll have to break this dep too, but for now it should be OK.

- There's a stupid problem with globalContext() deadlocking if
  you didn't actually initialize it before loading libtorch.so
  (which is bringing along the variable hooks).  I didn't fix
  it in this PR; it's tracked in #9784

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9787

Reviewed By: cpuhrsch

Differential Revision: D8980971

Pulled By: ezyang

fbshipit-source-id: 2b4d867abfdc3999a836a220c638c109053145a8
2018-08-01 15:34:56 -07:00
edb90387b2 Lint ArrayRef.h (#10129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10129

-

Reviewed By: ezyang

Differential Revision: D9119933

fbshipit-source-id: dd13c6d2a0ab72d943acff5cb02b3278ca8c7ba6
2018-08-01 15:34:54 -07:00
080ae5ea1f Remove implicit ArrayRef -> vector conversion (#9740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9740

- Remove implicit ArrayRef -> vector conversion
- Fix 4 call sites that accidentally did an implicit expensive vector conversion but wouldn't have needed to
- Remove explicit vector conversion from 4 call sites that also didn't need to do that

Reviewed By: ezyang

Differential Revision: D8961693

fbshipit-source-id: 980da9f988083c0072497f9dbcbbf6f516fa311c
2018-08-01 15:34:52 -07:00
e2846c365a Improve ArrayRef (#9610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9610

Mostly making some stuff in ArrayRef constexpr to give it better perf.

Reviewed By: ezyang

Differential Revision: D8926785

fbshipit-source-id: af6d4b05fbc69d20855a80f3edc2b501577a742b
2018-08-01 15:34:50 -07:00
ad6d62250a Add torch.compiled_with_cxx11_abi(). (#10071)
Summary:
It returns whether PyTorch was built with _GLIBCXX_USE_CXX11_ABI=1.

Fixes #8385
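
Usage is a one-liner:
```
import torch

# True if the binary was built with _GLIBCXX_USE_CXX11_ABI=1; useful when
# building C++ extensions that must match libtorch's ABI
print(torch.compiled_with_cxx11_abi())
```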
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10071

Differential Revision: D9088946

Pulled By: zou3519

fbshipit-source-id: b00fd92ee340ef34f60bdd6027ceaf46dd7442c0
2018-08-01 15:34:48 -07:00
1b1c47dfe5 Update onnx to onnx/onnx@32ac71b (#10126)
Summary:
32ac71b1b9
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10126

Reviewed By: houseroad

Differential Revision: D9120544

Pulled By: bddppq

fbshipit-source-id: 4fbe1f16e3b712c092f2f188324173ba1ecc1062
2018-08-01 14:28:54 -07:00
fb24c52dc3 Prepare TH for first class scalars (0-dimensional tensors).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10123

Differential Revision: D9121068

Pulled By: gchanan

fbshipit-source-id: 1cdc6e4b327cf158729cbb4026315be63b159f9d
2018-08-01 14:28:53 -07:00
2d56b5cf8b Prepare THC for first class scalars (0-dimensional tensors).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10072

Differential Revision: D9082421

Pulled By: gchanan

fbshipit-source-id: d4327b07aaef85cc2521393008154ebceae8cbfd
2018-08-01 14:28:51 -07:00
59af5b928a Move UniqueVoidPtr to ATen/core and apply lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10131

Reviewed By: smessmer

Differential Revision: D9121096

fbshipit-source-id: a6861429f06302e3e279ff669961bba34a9fb7a1
2018-08-01 13:25:23 -07:00
2d6738e89e Fix lint in ATen/core (but not ArrayRef)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10124

Reviewed By: smessmer

Differential Revision: D9119768

fbshipit-source-id: c0a56d27401b730956945146d4f48d4d5a9b77a6
2018-08-01 13:25:19 -07:00
f908b2b919 Use google protobuf in pytorch onnx import/export
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/8469

Reviewed By: houseroad

Differential Revision: D9102041

Pulled By: li-roy

fbshipit-source-id: 805c473745d181b71c7deebf0b9afd0f0849ba4f
2018-08-01 12:54:41 -07:00
5a44be50ab Minor nit in comment in CMakeLists.txt
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10125

Reviewed By: smessmer

Differential Revision: D9119766

fbshipit-source-id: 290b804bc552b1c3f68e5129ff60ef7f34307714
2018-08-01 12:39:38 -07:00
e8f27311aa fix a couple problems with libtorch cmake file (#10091)
Summary:
in particular, make the option to not build tests actually work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10091

Differential Revision: D9121366

Pulled By: anderspapitto

fbshipit-source-id: d7d38cf759aa46bff90d3b4f695c20f29039ae75
2018-08-01 11:39:33 -07:00
f126687fbc Add a dump() method to IR Node's. (#10106)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10106

Differential Revision: D9119891

Pulled By: resistor

fbshipit-source-id: 5f41d8890007c639f8f0cdc92d11b128433ad6b8
2018-08-01 11:09:53 -07:00
4070005081 Move C++17.h to ATen/core (#10107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10107

This header is needed for ATen/core stuff

This diff also fixes an issue in C++17.h when run in C++17 enabled compilers.

Reviewed By: ezyang

Differential Revision: D9095209

fbshipit-source-id: d45947956019a7095875f48746b88c414e8865bc
2018-08-01 09:54:59 -07:00
87d57dc5f5 Simplified Operator (#10080)
Summary:
zdevito explained that the attributed versions of `Operator`s are no longer necessary. This PR does two things:

1. Removes all code associated with attributed operators,
2. Adds a second kind of state to `Operator` where it is constructed with an `Operation` directly instead of an `OperationCreator`. This will be useful to test custom operators which don't require a node (you can just retrieve it directly).

Now rebased on top of https://github.com/pytorch/pytorch/pull/9801

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10080

Differential Revision: D9113668

Pulled By: goldsborough

fbshipit-source-id: 1276a191c7cf89da1c38488769f2105ce2664750
2018-08-01 09:41:08 -07:00
f1964c43fd Update eigen submodule to fix BUILD_ATEN issue (#10095)
Summary:
Extracted from https://github.com/pytorch/pytorch/pull/8338

Updating Eigen submodule to fix an issue we saw with BUILD_ATEN and BUILD_CAFFE2 removal.

cc mingzhe09088 ezyang smessmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10095

Reviewed By: mingzhe09088

Differential Revision: D9109877

Pulled By: orionr

fbshipit-source-id: 90e36c298d8a22398558d70dc5f68a95a7687d6b
2018-08-01 09:41:06 -07:00
a2a7b0c01a Initial documentation for building libtorch (#10087)
Summary:
It's not a particularly pretty process right now, but it may as well
be documented.  I'm not aware of an ideal location for this, so I'm
just dropping it in the docs/ folder for now as recommended by
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10087

Differential Revision: D9119681

Pulled By: anderspapitto

fbshipit-source-id: cd4afb642f3778c888d66a501bc697d0b0c88388
2018-08-01 09:41:02 -07:00
ee964c51f4 NegativeBinomial distribution (#9345)
Summary:
- [x] implement distribution
- [x] add tests
- [x] docs

cc ingmarschuster
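
A minimal usage sketch (parameter values are arbitrary):
```
import torch
from torch.distributions import NegativeBinomial

d = NegativeBinomial(total_count=10, probs=torch.tensor(0.3))
samples = d.sample((5,))     # 5 draws
log_p = d.log_prob(samples)  # log-probabilities of the draws
```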
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9345

Differential Revision: D8807023

Pulled By: ezyang

fbshipit-source-id: 7bf7f352dd455e0909c58dd94e1bdebba0e8b5c8
2018-08-01 08:39:25 -07:00
2f848ec8ec Use new PyTorch API to make code simpler
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9968

Differential Revision: D9088316

Pulled By: li-roy

fbshipit-source-id: 2658fe0c1734d8b064cbad24d8f0d6c341400b4e
2018-08-01 08:39:23 -07:00
fa6b28bf40 Move ArrayRef, Backtrace, Error, SmallVector, optional to ATen/core; add CoreAPI (#10092)
Summary:
This also makes Backtrace more portable, by disabling its functionality for
mobile builds as well.

It also handles Caffe2 static Windows builds by introducing a new variable,
AT_CORE_STATIC_WINDOWS, which must be set if you're building
ATen on Windows as part of a static library.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10092

Reviewed By: gchanan, smessmer

Differential Revision: D9094393

Pulled By: ezyang

fbshipit-source-id: 93281f9302bd378605a26589ae308faf1dac7df4
2018-08-01 08:39:22 -07:00
b503109f20 Guard sizes/strides in THCUNN for scalars.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10083

Differential Revision: D9093572

Pulled By: gchanan

fbshipit-source-id: a5c27571ec06f8ed30e6b3b492c743444b58d9fe
2018-08-01 08:10:33 -07:00
43b151224e Move grid sampler to ATen (#9961)
Summary:
Spatial version benchmark

|                           | CPUFloat THNN | CPUFloat ATen | CPUDouble THNN | CPUDouble ATen | CUDAHalf THNN | CUDAHalf ATen | CUDAFloat THNN | CUDAFloat ATen | CUDADouble THNN | CUDADouble ATen |
|---------------------------|---------------|---------------|----------------|----------------|---------------|---------------|----------------|----------------|-----------------|-----------------|
| [1024x1x28x28] zero pad   | 2.19281888s   | 0.21280479s   | 2.52922535s    | 0.23944831s    | 0.17494774s   | 0.06242800s   | 0.31270599s    | 0.03706479s    | 0.40542483s     | 0.07391024s     |
| [1024x1x28x28] border pad | 3.04329610s   | 0.24705672s   | 2.29205394s    | 0.22336411s    | 0.17980361s   | 0.06212497s   | 0.31415701s    | 0.03847790s    | 0.43020391s     | 0.07540464s     |
| [32x3x244x244] zero pad   | 18.29301333s  | 2.18566656s   | 19.01662397s   | 3.51552224s    | 1.72487235s   | 0.28933954s   | 2.02466702s    | 0.18178749s    | 2.63671613s     | 0.41391206s     |
| [32x3x244x244] border pad | 18.72205329s  | 2.02600884s   | 20.13017297s   | 3.25979590s    | 1.96455693s   | 0.33070564s   | 2.18666625s    | 0.19546938s    | 2.91268897s     | 0.38465047s     |

For #9702

basics:
+ grid tensors have dimensions `[N, H, W, 2]` (or `[N, D, H, W, 3]` for 3d).
+ input/output tensors have dimensions `[N, C, H, W]` (or `[N, C, D, H ,W]` for 3d)
+ grid sampler maps `input([N, C, inp_H, inp_W]), grid([N, H, W, 2])` to `output([N, C, H, W])` (3d case is similar).

variable naming:
+ `tensor_sH` means the stride of `tensor` at the dimension of `H`.
+ `tensor_ptr_NCH` is a data pointer that always points to the beginning of the `tensor[n][c][h]` slice in the loop.
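
A small shape-level sketch of these conventions (sizes are arbitrary, not from this PR):
```
import torch
import torch.nn.functional as F

inp = torch.randn(1, 3, 8, 8)            # input:  [N, C, inp_H, inp_W]
grid = torch.rand(1, 16, 16, 2) * 2 - 1  # grid:   [N, H, W, 2], coords in [-1, 1]
out = F.grid_sample(inp, grid, padding_mode='zeros')
print(out.shape)                         # output: [N, C, H, W] = [1, 3, 16, 16]
```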
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9961

Differential Revision: D9057175

Pulled By: SsnL

fbshipit-source-id: 9ed8f1dc376ed10229f047fdcf3c90dbd250bee6
2018-08-01 07:54:46 -07:00
6fc75eadf0 Add CELU activation to pytorch (#8551)
Summary:
Also fuse input scale multiplication into ELU

Paper:
https://arxiv.org/pdf/1704.07483.pdf
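
A minimal usage sketch; the formula in the comment follows the paper's definition:
```
import torch
import torch.nn.functional as F

x = torch.randn(5)
# CELU(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1))
y = F.celu(x, alpha=0.5)
```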
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8551

Differential Revision: D9088477

Pulled By: SsnL

fbshipit-source-id: 877771bee251b27154058f2b67d747c9812c696b
2018-08-01 07:54:44 -07:00
6f6a1f2d63 fix test_load_error_msg failure (Network is unreachable) (#10021)
Summary:
- fixes the test_load_error_msg failure (Network is unreachable)
- removed use of urlopen in test_load_error_msg

cc soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10021

Differential Revision: D9068108

Pulled By: weiyangfb

fbshipit-source-id: a9484d4a913508d54731b6a1eef3cddff66604f2
2018-08-01 00:24:01 -07:00
5bd43a7af8 Refactor Seq2SeqModelCaffe2EnsembleDecoder (#10035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10035

This is an initial diff which refactors some of the components in the Seq2SeqModelCaffe2EnsembleDecoder class.

Reviewed By: jmp84

Differential Revision: D9026372

fbshipit-source-id: 449635208f24494209ae2fb78a19fca872970ea8
2018-07-31 23:09:09 -07:00
3d247041e4 Force sync device when ops are sampled for observation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10054

Reviewed By: xw285cornell

Differential Revision: D9071097

fbshipit-source-id: 44357cdf79148e81db86c5350122a1a320a923fb
2018-07-31 21:09:00 -07:00
ec807f2a91 Bail out if netdef has disable_nomnigraph argument
Summary: allow models to override nomnigraph opts

Reviewed By: ajtulloch

Differential Revision: D9035729

fbshipit-source-id: 2b30208263c14ce7039f27c618a3b232bf11ee33
2018-07-31 20:54:46 -07:00
fcd567ed15 Enable Optimization on mobile by default
Summary: Re-enable opt by default

Reviewed By: Maratyszcza

Differential Revision: D8525434

fbshipit-source-id: a61253907251a44cfc59e0b50fb1906c5eb20558
2018-07-31 20:54:44 -07:00
7d2bda7588 Move DDP broadcast coalesced to C++ (#9729)
Summary:
This PR depends on the tests added in #9670. It moves the first, tiny function from the c10d DDP to C++: `dist_broadcast_coalesced`. Let me know if `torch/csrc/distributed/c10d/ddp.h` will be a good place to put these rewritten functions.

pietern, apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9729

Differential Revision: D8985308

Pulled By: goldsborough

fbshipit-source-id: dc459fe9040273714044152063585e746974752f
2018-07-31 19:54:21 -07:00
294c065384 Changed serialization mechanism of LambdaLR scheduler (#9927)
Summary:
I opened an issue explaining some of my frustrations with the current state of schedulers.
While most points that I raised in [that issue](https://github.com/pytorch/pytorch/issues/8741#issuecomment-404449697) need to be discussed more thoroughly before being implemented, there are some that are not so difficult to fix.

This PR changes the way the LambdaLR scheduler gets serialized:
> The lr_lambda functions are only saved if they are callable objects (which can be stateful).
> There is no point in saving functions/lambdas as you need their definition before unpickling and they are stateless.

This has the big advantage that the scheduler is serializable, even if you use lambda functions or locally defined functions (aka a function in a function).

Does this functionality need any unit tests?
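
A hedged sketch of the pattern this change rewards: a callable object (picklable, possibly stateful) instead of a bare lambda. The WarmupLambda class below is made up for illustration:
```
import torch

class WarmupLambda:
    # a callable object can be serialized with the scheduler, unlike a lambda
    def __init__(self, warmup_epochs):
        self.warmup_epochs = warmup_epochs
    def __call__(self, epoch):
        return min(1.0, (epoch + 1) / self.warmup_epochs)

model = torch.nn.Linear(4, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=WarmupLambda(5))
state = sched.state_dict()  # serializable even though lr_lambda is "function-like"
```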
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9927

Differential Revision: D9055505

Pulled By: soumith

fbshipit-source-id: 6c1cec588beedd098ec7d2bce6a9add27f29e48f
2018-07-31 19:39:06 -07:00
aae37324cc fixed a newly introduced regression in softmax (#10066)
Summary:
There is a regression in softmin in 0.4.1 that was not present in 0.4.0. The behavior of softmin(x) should match softmax(-x); however, it is instead implemented (in v0.4.1) as -softmax(x). These are not the same. The fix is trivial because the bug is due to operator precedence.

This is a major regression that broke my training.  I'm not sure how a unit test did not catch this.

```
x = torch.tensor([1, 2, 3.5, 4])
print(F.softmin(x, dim=0)) # this has the wrong output in 0.4.1 but correct in 0.4.0
print(F.softmax(-x, dim=0)) # this is what softmin should equal
print(F.softmax(x, dim=0))
print(-F.softmax(x, dim=0)) # this is how softmin is incorrectly implemented in 0.4.1
```
In 0.4.1 this produces
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
tensor([0.6668, 0.2453, 0.0547, 0.0332])
tensor([0.0278, 0.0755, 0.3385, 0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])

In 0.4.0 this produces the correct values
tensor([ 0.6668,  0.2453,  0.0547,  0.0332])
tensor([ 0.6668,  0.2453,  0.0547,  0.0332])
tensor([ 0.0278,  0.0755,  0.3385,  0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10066

Differential Revision: D9106995

Pulled By: soumith

fbshipit-source-id: 7332503c6077e8461ad6cd72422c749cf6ca595b
2018-07-31 19:28:30 -07:00
f2412fbafc Allow multiple ops.def and clean up code gen in general
Summary:
This is a cleanup and refactoring.
In its original form (changeset 6fdf915c057a) this diff caused a 5% regression
on ads CPU.  The root cause was an omission of link_whole = True, causing
symbols to be stripped in mode/opt and forcing the converter to fallback
causing patterns to be unmatched in the graph transform logic.  This version of
the diff tests for link_whole by including a C++ test of the transform

Reviewed By: yinghai

Differential Revision: D9040511

fbshipit-source-id: 3e19b89989aa68b021762d12af2d0b4111280b22
2018-07-31 19:28:28 -07:00
799c947cf3 add .gitattributes for EOL conversion. (#9813)
Summary:
`.bat` files' EOL is LF, so the build fails on some Windows machines.
To fix this, add a `.gitattributes` file and set batch files' EOL to CRLF.

Discussion is in #9677.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9813

Differential Revision: D9026486

Pulled By: soumith

fbshipit-source-id: 341eaa677c35f8476a7eda1bac9827385072eb29
2018-07-31 18:38:43 -07:00
9c0f65fc87 Remove While op stuff (#10102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10102

these codepaths are unused, deleting them

Reviewed By: yinghai

Differential Revision: D9109764

fbshipit-source-id: 8ace42a399806632bfbcada96b383268f0a8ae89
2018-07-31 17:56:25 -07:00
c54d71ba60 Upgrade old transform passes to newer APIs (#10046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10046

stampable

Reviewed By: duc0

Differential Revision: D9075830

fbshipit-source-id: dc65be1d39625ef24ad319b5ce0263ecfe7a10c9
2018-07-31 17:39:35 -07:00
ceb0f14176 Fix SpatialBN Fusion (#10044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10044

The test was subtly broken! This transform wasn't writing to the correct blob and the test did not catch that because it was looking at the old version.

thanks kerenzhou for catching this

Reviewed By: Jokeren

Differential Revision: D9075520

fbshipit-source-id: c31ff0afcd78dd2dc7ffc240e2e89eeda87f1fb4
2018-07-31 17:39:34 -07:00
bf744bea94 Parse and register schema declarations lazily (#9801)
Summary:
This should prevent slow startup times, and will not report as many
errors during static initialization time, which are hard to debug.

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9801

Reviewed By: goldsborough

Differential Revision: D8986603

Pulled By: zdevito

fbshipit-source-id: 440d43ab5e8cffe0b15118cb5fda36391ed06dbc
2018-07-31 17:24:24 -07:00
34c7c56c73 Re-enable empty n-dimensional empty tensor and fix parallel CPU on empty tensors (#10077)
Summary:
This is a combination of https://github.com/pytorch/pytorch/pull/9947 (this was reverted) and https://github.com/pytorch/pytorch/pull/10076.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10077

Differential Revision: D9087491

Pulled By: gchanan

fbshipit-source-id: 9fe9905628000f2ff3e47df32533cd7d1f25a354
2018-07-31 16:43:45 -07:00
ba5d33bede Re-Enable ATen in C2 in integration builds to test ONNX ATen conversions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10060

Differential Revision: D9081387

Pulled By: bddppq

fbshipit-source-id: 13cbff63df5241e013d4ebacfcd6da082e7196f6
2018-07-31 15:27:05 -07:00
e04f8bbfa6 Add virtual dtor for ideep context (#10059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10059

Without a virtual dtor, it could induce incorrect sized deallocation, messing up memory. Unfortunately, sized deallocation cannot yet be detected by ASAN.

Reviewed By: jerryzh168

Differential Revision: D9080526

fbshipit-source-id: c136cf653134e75b074326be2bc03627da42446f
2018-07-31 15:27:02 -07:00
d2178562a4 Remove some unnecessary includes. (#10085)
Summary:
The affected files are all files that are planned to be moved
to ATen/core; the includes are for headers which are NOT slated
for movement.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10085

Differential Revision: D9093746

Pulled By: ezyang

fbshipit-source-id: 2beeffdae26d03d631d2d51b40bf6303759a2f50
2018-07-31 15:13:37 -07:00
1f13453b4d Slightly relax the constraints on argument and return types to script functions (#9969)
Summary:
This lays out initial support for taking and returning a richer set
of types than only tensors. Floats and ints are already valid, lists are
straightforward to add, tuples need some discussion.

Based on top of #9948. Review only the last commit.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9969

Reviewed By: zdevito

Differential Revision: D9076973

Pulled By: apaszke

fbshipit-source-id: 5a1fe912ea6b79ab2bfd0dcce265eb05855b5ff0
2018-07-31 14:25:29 -07:00
58fd6e1dd6 Also add ATen/core tests to oss CI (#10029)
Summary:
-
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10029

Reviewed By: ezyang

Differential Revision: D9070030

Pulled By: smessmer

fbshipit-source-id: b5ae79a383dc14e7d79e6a82c5d70e951c9f5168
2018-07-31 13:54:39 -07:00
ee17ed672b Add missing dependencies (#10086)
Summary:
Fix the master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10086

Differential Revision: D9093741

Pulled By: houseroad

fbshipit-source-id: 65e42994ae7d8e0b449d10a8116a7609434aad04
2018-07-31 13:54:38 -07:00
2422801625 fix _pointwise_loss for target gradients (#10018)
Summary:
_pointwise_loss has some Python special-casing; we converted reduction to ATen enums too early.

fixes #10009
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10018

Differential Revision: D9075489

Pulled By: li-roy

fbshipit-source-id: 4bf2f5e2911e757602c699ee1ec58223c61d0162
2018-07-31 13:39:58 -07:00
56d1a82b31 Add shape inference when converting from onnx to caffe2 (#10037)
Summary:
Otherwise, some RNN case conversion may fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10037

Reviewed By: orionr

Differential Revision: D9072298

Pulled By: houseroad

fbshipit-source-id: 080f589eba8618719453feb15a7a494fe5380dd0
2018-07-31 12:42:02 -07:00
371a786b18 Errors out when Openmpi < 2.x.x with distributed. (#10015)
Summary:
This PR fixes #9418.
OpenMPI 1.10 segfaults in MPI_Bcast with a CUDA buffer, and it's a retired OpenMPI version.
I've tested on 2.1.1 and 3.0.0 and they work well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10015

Reviewed By: soumith

Differential Revision: D9088103

Pulled By: ailzhang

fbshipit-source-id: fc0a45e5cd016093ef0dbb9f371cbf67170d7045
2018-07-31 12:24:40 -07:00
1ae520c704 Add AT_CHECK for null storage. (#9823)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9823

Differential Revision: D9029433

Pulled By: ezyang

fbshipit-source-id: 6101556305593c66f618b20d8c2a084ae2558ea8
2018-07-31 12:09:25 -07:00
685224aa14 Add CTC loss (#9628)
Summary:
The CPU and CUDA variants are a direct transposition of Graves et al.'s description of the algorithm, with the modification that it is in log space.
There is also a binding for the (much faster) CuDNN implementation.

This could eventually fix #3420

I still need to add tests (TestNN seems much more elaborate than the other testing) and fix the bugs that invariably turn up during the testing. Also, I want to add some more code comments.

I could use feedback on all sorts of things, including:
- Type handling (cuda vs. cpu for the int tensors, dtype for the int tensors)
- Input convention. I use log probs because that is what the gradients are for.
- Launch parameters for the kernels
- Errors and omissions and anything else I'm not even aware of.

Thank you for looking!

In terms of performance, it looks superficially comparable to WarpCTC (but I have not systematically investigated this).
I have read that CuDNN is much faster than other implementations because it does *not* use log space, but also because the gathering step is much, much faster (I avoided trying tricky things there, as they seem to contribute to WarpCTC's fragility). I might think some more about which existing torch function (scatter or index..) I could learn from for that step.
Average timings for the kernels from nvprof for some size:

```
CuDNN:
60.464us compute_alphas_and_betas
16.755us compute_grads_deterministic
Cuda:
121.06us ctc_loss_backward_collect_gpu_kernel (= grads)
109.88us ctc_loss_gpu_kernel (= alphas)
98.517us ctc_loss_backward_betas_gpu_kernel (= betas)
WarpCTC:
299.74us compute_betas_and_grad_kernel
66.977us compute_alpha_kernel
```

Of course, I still have the (silly) outer blocks loop rather than computing consecutive `s` in each thread which I might change, and there are a few other things where one could look for better implementations.

Finally, it might not be unreasonable to start with these implementations, as the performance of the loss has to be seen in the context of the entire training computation, so this would likely dilute the relative speedup considerably.

My performance measuring testing script:
```
import timeit
import sys
import torch
num_labels = 10
target_length  = 30
input_length = 50
eps = 1e-5
BLANK = 0#num_labels
batch_size = 16

torch.manual_seed(5)
activations = torch.randn(input_length, batch_size, num_labels + 1)
log_probs = torch.log_softmax(activations, 2)
probs = torch.exp(log_probs)
targets = torch.randint(1, num_labels+1, (batch_size * target_length,), dtype=torch.long)
targets_2d = targets.view(batch_size, target_length)
target_lengths = torch.tensor(batch_size*[target_length])
input_lengths = torch.tensor(batch_size*[input_length])
activations = log_probs.detach()

def time_cuda_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo, culog_alpha = torch._ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_cudnn_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo, cugra= torch._cudnn_ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_warp_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo = warpctc.ctc_loss(*args, blank_label=BLANK, size_average=False, length_average=False, reduce=False)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

if sys.argv[1] == 'cuda':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets_2d.cuda(), input_lengths.cuda(), target_lengths.cuda(), BLANK]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cuda_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'cudnn':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets.int(), input_lengths.int(), target_lengths.int(), BLANK, True]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cudnn_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'warpctc':
    import warpctc
    activations = activations.cuda().detach().requires_grad_()
    args = [activations, input_lengths.int(), targets.int(), target_lengths.int()]
    grout = activations.new_ones((batch_size,), device='cpu')
    torch.cuda.synchronize()

    print(timeit.repeat("time_warp_ctc_loss(grout, *args)", number=1000, globals=globals()))
```
I'll also link to a notebook that I used for writing up the algorithm in simple form and then test the against implementations against it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9628

Differential Revision: D8952453

Pulled By: ezyang

fbshipit-source-id: 18e073f40c2d01a7c96c1cdd41f6c70a06e35860
2018-07-31 11:09:48 -07:00
430e44480f Delete some obsolete steps in the ROCm build. (#10005)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10005

Differential Revision: D9066107

Pulled By: ezyang

fbshipit-source-id: 346f654214cff1c956a4022173347d95657ee9d4
2018-07-31 11:09:46 -07:00
f779202711 Correctly set CAFFE2_DISABLE_NUMA when USE_NUMA=OFF in cmake (#10061)
Summary:
previously https://github.com/pytorch/pytorch/blob/master/caffe2/core/numa.cc still gets compiled even when USE_NUMA=OFF
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10061

Reviewed By: houseroad

Differential Revision: D9081385

Pulled By: bddppq

fbshipit-source-id: ad28b647e0033727839770b1da0fba341b1b7787
2018-07-31 11:01:51 -07:00
cba03e2ebe Handle dynamic repeats in onnx symbolic (#10052)
Summary:
ONNX Tile can take `repeats` as a dynamic input
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10052

Differential Revision: D9076841

Pulled By: bddppq

fbshipit-source-id: ddd692c5f5846c8fdba019baa9fad83ef9638da4
2018-07-31 10:39:50 -07:00
0c11101eca Prepare THNN/THCUNN for first class scalars. (#10023)
Summary:
I previously did some transformations, e.g. _nDimension,_dim -> nDimensionLegacyAll, nDimension -> nDimensionLegacyNoScalars.
But this didn't touch dim(), which needs to be updated to support scalars.  Instead of doing an (ugly) move, I audited the call sites and updated the cases that could be size 1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10023

Differential Revision: D9068996

Pulled By: gchanan

fbshipit-source-id: c63820767dd1496e908a5a96c34968482193f2c5
2018-07-31 10:39:48 -07:00
c2d9d2888b Fix typo in tensors.rst (#10073)
Summary:
An tensor -> A tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10073

Differential Revision: D9087421

Pulled By: soumith

fbshipit-source-id: 6713f5a5e11fb11dff0ab5d2d6274f7837c6625f
2018-07-31 10:13:40 -07:00
68cbe37c6a fix the reference link path
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9240

Reviewed By: SsnL

Differential Revision: D8764196

Pulled By: ezyang

fbshipit-source-id: 3efc70714406d801ed74f52313beca61129593c7
2018-07-31 09:09:46 -07:00
5e5c15dd42 Add (constant size) TensorLists to JIT, use them in cat and stack nodes (#9948)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9948

Reviewed By: ezyang

Differential Revision: D9033666

Pulled By: apaszke

fbshipit-source-id: 02d75e391ed6dee62500842df50f0b6ee5e38846
2018-07-31 07:39:52 -07:00
6fb9acfc16 Revert empty n-dim and ATen in C2 integration builds
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10064

Differential Revision: D9082082

Pulled By: gchanan

fbshipit-source-id: ae49470f5b4c89b13beb55fd825de1ba05b6a4fa
2018-07-31 07:25:56 -07:00
78b806c861 Fix the onnx symbolic for upsample (#10001)
Summary:
We missed the upsample symbolic when bumping up the opset to 7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10001

Reviewed By: bddppq

Differential Revision: D9067212

Pulled By: houseroad

fbshipit-source-id: 3e285d2800a32cb04fa82f8e7f261bdd010a8883
2018-07-30 21:39:48 -07:00
37a226de63 When BUILD_ATEN=OFF, use ATen/core directly (#10019)
Summary:
ATenCore.h is a dummy header to just test that this is working at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10019

Reviewed By: smessmer

Differential Revision: D9067262

Pulled By: ezyang

fbshipit-source-id: 58bab9c0aa83b56335e36b719b9b6505400d8dee
2018-07-30 21:09:55 -07:00
aa36a5d01c Add typing into caffe2 requirements.txt for USE_ATEN (#10047)
Summary:
I was dumb lol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10047

Differential Revision: D9076023

Pulled By: bddppq

fbshipit-source-id: 10587875d04ac2aed2e015846fc73ce9e4717a4f
2018-07-30 20:09:21 -07:00
51539fa383 Add pyyaml into caffe2 requirements.txt for USE_ATEN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10039

Reviewed By: houseroad

Differential Revision: D9074261

Pulled By: bddppq

fbshipit-source-id: 26df516633d5a4ec539a03a62cf9e7839e1e1964
2018-07-30 18:11:25 -07:00
8f0a229078 Fix HPTT path for 0-sized inputs.
Reviewed By: Maratyszcza

Differential Revision: D9068091

fbshipit-source-id: 4aeac45f9732a86979a08488637bf0ba6cc79b34
2018-07-30 17:54:57 -07:00
788b2e996d nomnigraph - minor cleanup of Graph.h (#9890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9890

Minor cleanups for Graph.h to make it more consistent with our style guide

Also fix opt/device.cc and binary_match_test.cc to not access subgraph.nodes_ which is now private

Reviewed By: bwasti

Differential Revision: D9017108

fbshipit-source-id: 9f5cba4a2cd2a452a955005f4704f6c120bbc1d5
2018-07-30 16:24:03 -07:00
e0a0234018 Remove C++14 feature (#10022)
Summary:
Which test should I look at, bddppq?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10022

Reviewed By: bddppq

Differential Revision: D9068732

Pulled By: yinghai

fbshipit-source-id: 241ef72c7fac0ed0b8c58ecdffbb5e24eb956217
2018-07-30 16:24:02 -07:00
3e3f40aeeb Update onnx to latest master (#10024)
Summary:
df01dbc005
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10024

Reviewed By: houseroad

Differential Revision: D9069464

Pulled By: bddppq

fbshipit-source-id: 751328352cd495e27b6bd533f4632d3d6d06c4a6
2018-07-30 15:54:34 -07:00
e57cb4a1b2 Add a Constant Propagation Pass to the JIT (#8808)
Summary:
Adding a constant propagation pass to the JIT. I have added examples to the expect files.

There are a couple of special cases which have not been implemented here: If nodes with constant conditions can be inlined with the correct block, and While nodes can be removed if the condition is false. I have added a test for each case in test_jit.py as an expected failure.

To be consistent with DCE, Python ops & C++ ops are treated as not having side effects.
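
A minimal sketch of the kind of code the pass targets (whether the folded constants show up in fn.graph or only in the optimized graph depends on when passes run):
```
import torch

@torch.jit.script
def fn():
    a = 1 + 2   # candidates for folding into a single prim::Constant
    b = a * 10
    return b

print(fn.graph)
```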
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8808

Reviewed By: wanchaol

Differential Revision: D8906770

Pulled By: eellison

fbshipit-source-id: 10ad796d89f80b843566c9ddad6a0abd1f3dc74c
2018-07-30 15:54:31 -07:00
db96a0951f Add SIMD version to GFTRL optimizer (#9698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9698

Add SIMD version to GFTRL optimizer

Differential Revision: D8949723

fbshipit-source-id: 835ce2ce49630ae43fc6bac63c545c14b25f5a26
2018-07-30 15:27:24 -07:00
9987282134 Use Retainable as base class for StorageImpl
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9956

Reviewed By: gchanan

Differential Revision: D9066103

Pulled By: cpuhrsch

fbshipit-source-id: 1a5a2ace306308707add3d0e0c1fc861f5c79705
2018-07-30 15:08:56 -07:00
7214754663 Check and return when numel() == 0 in Loops.cuh.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10031

Reviewed By: colesbury

Differential Revision: D9070346

Pulled By: gchanan

fbshipit-source-id: d6ad4e6ca43d334f5be42fea35915270dd8f405e
2018-07-30 15:01:28 -07:00
57750bd638 Enable ATen in C2 in integration builds to test ONNX ATen conversions (#10014)
Summary:
zrphercule
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10014

Reviewed By: houseroad

Differential Revision: D9061842

Pulled By: bddppq

fbshipit-source-id: 1e1c2aeae62dd2cc5c6a8d5e1d395ea5cf882734
2018-07-30 15:01:13 -07:00
6c7fb1582f Introduce __array_priority__ on torch.Tensor (#9651)
Summary:
This causes numpy to yield to the torch functions,
e.g. instead of numpy array/scalar __mul__ converting the tensor to
an array, it will now arrange for the Tensor __rmul__ to be called.

Fixes case 2 of #9468.
It also makes cases 3 and 4 equivalent but does not fix them.
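
A sketch of the fixed case 2 behavior (exact output depends on the installed versions):
```
import numpy as np
import torch

t = torch.ones(3)
out = np.float64(2.0) * t   # numpy now defers, so Tensor.__rmul__ runs
print(type(out))            # expected: <class 'torch.Tensor'>, not an ndarray
```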
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9651

Differential Revision: D8948079

Pulled By: ezyang

fbshipit-source-id: bd42c04e96783da0bd340f37f4ac3559e9bbf8db
2018-07-30 14:39:43 -07:00
ea3c36b822 NumPy Scalar to PyTorch Scalar (#9225)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/4985 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9225

Differential Revision: D8769317

Pulled By: ezyang

fbshipit-source-id: eeaeaf0749c9dc9e372634da68b4bd23e6e3ad28
2018-07-30 14:39:40 -07:00
c9eab34e63 Fix Caffe2 with ATen conda build failure (#10020)
Summary:
Extracted from 627624627e and in support of https://github.com/pytorch/pytorch/pull/10019

cc pjh5 mingzhe09088 ezyang smessmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10020

Reviewed By: pjh5

Differential Revision: D9068124

Pulled By: orionr

fbshipit-source-id: 4dd4910136a312b6517c65ce8802837108475f89
2018-07-30 14:10:02 -07:00
04939a4745 Match parameter names and = default (#9737)
Summary:
More clang tidy cleanups in `torch/csrc`. This time:

1. `hicpp-use-equals-default` recommends `= default` instead of `{}` for constructors/destructors. This is better practice because it expresses the intent better (https://stackoverflow.com/questions/6502828/what-does-default-mean-after-a-class-function-declaration)
2. `readability-inconsistent-declaration-parameter-name` enforces that parameter names in the declaration match parameter names in the definition. This is just generally useful and can prevent confusion and bugs.

Also updated my script a little bit.

apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9737

Differential Revision: D9069069

Pulled By: goldsborough

fbshipit-source-id: f7b3f3a4eb4c9fadc30425a153566d3b613a41ae
2018-07-30 14:10:00 -07:00
40a8239984 Fix a bug in argument spec (#9958)
Summary:
Non-tensor types did not set the running total_dims count, causing corrupted data.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9958

Reviewed By: jamesr66a

Differential Revision: D9065621

Pulled By: zdevito

fbshipit-source-id: 0ac1fcdf6da076a9c9ebd5d70ce9126e3f8e722e
2018-07-30 13:08:59 -07:00
faa96c1c47 Deal with spaces in einsum equation string (#9994)
Summary:
Fixes #9930
Thank you, vadimkantorov for the report.
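
A minimal sketch of an equation string that previously failed because of the whitespace:
```
import torch

a = torch.randn(2, 3)
b = torch.randn(3, 4)
# spaces in the equation are now tolerated
c = torch.einsum('i j , j k -> i k', [a, b])
print(c.shape)  # torch.Size([2, 4])
```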
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9994

Differential Revision: D9042876

Pulled By: ezyang

fbshipit-source-id: 3bbd1aaaf1b432be40a7652b6a746d80934a216b
2018-07-30 12:57:56 -07:00
ce5f0d40b6 Enable n-dimensional empty tensors. (#9947)
Summary:
These could use some autograd tests, which are coming in a later PR, but using them in autograd is probably pretty rare.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9947

Reviewed By: ezyang

Differential Revision: D9032778

Pulled By: gchanan

fbshipit-source-id: fa5a6509d3bac31ea4fae25143e82de62daabfbd
2018-07-30 12:33:17 -07:00
73a60efccc Fix Caffe2CTScan error (#9962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9962

att

Reviewed By: hlu1

Differential Revision: D9036869

fbshipit-source-id: 3155af00c62d489f998cbfba07121c4fd20e1c6f
2018-07-30 12:33:15 -07:00
b4f8c60931 Don't use the XML reporter for Catch2. (#10012)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10012

Differential Revision: D9057766

Pulled By: ezyang

fbshipit-source-id: 12148a8cf3061423c61b3e7b36864dfcdb1138a1
2018-07-30 11:25:09 -07:00
9a9a7325c6 Remove the generation of storage files
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9954

Reviewed By: gchanan

Differential Revision: D9035947

Pulled By: cpuhrsch

fbshipit-source-id: 9b56c7a68e3f562ea11b9265a5fa234838f2b4e0
2018-07-30 09:53:57 -07:00
432ca747b0 Don't seed GPUs if there are none available. (#9931)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9931

Differential Revision: D9051375

Pulled By: ezyang

fbshipit-source-id: 1721f6217e07f80adc107d95e897cd7dd488659a
2018-07-30 08:23:53 -07:00
3609977d7f Update onnx to onnx/onnx@c761845 (#9964)
Summary:
c761845c7f
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9964

Reviewed By: houseroad

Differential Revision: D9038133

Pulled By: bddppq

fbshipit-source-id: 6ce740944e636175d2de4602edb92cc4d7e8e5ac
2018-07-29 23:10:12 -07:00
5ff1551eb9 ATen's emscripten support (#9803)
Summary:
Not sure if anybody is interested, but I managed to run a `GRU` inference fine in `wasm` using ATen compiled with emscripten. It was quite trivial to fix the configuration.
It also passes most of the tests, especially all scalar tensor tests.

The command line to configure was the following, but it could be simplified:
```
emconfigure cmake -DAT_LINK_STYLE=STATIC -DCAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO=OFF -DCMAKE_C_FLAGS="-Wno-implicit-function-declaration -DEMSCRIPTEN -s DISABLE_EXCEPTION_CATCHING=0" -DCMAKE_CXX_FLAGS="-Wno-implicit-function-declaration -DEMSCRIPTEN -s DISABLE_EXCEPTION_CATCHING=0" -DCMAKE_INSTALL_PREFIX=/home/sugar/aten-wasm ../
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9803

Differential Revision: D9004610

Pulled By: ezyang

fbshipit-source-id: db26c59f27162ed80f6aee2973c4cb9252d3d1e4
2018-07-29 20:39:00 -07:00
3d6015db0e Add essential PATH for the Windows PyTorch loading process (#9920)
Summary:
Fixes #9818.
It seems the original Python doesn't add `[PYTHONPATH]\Library\bin` to `PATH`. We try to add it before the DLL loading process.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9920

Differential Revision: D9040825

Pulled By: soumith

fbshipit-source-id: c07fff71b2aea254a396042ab677696f6829aac7
2018-07-29 08:23:59 -07:00
56974a06b5 Revert D8909766: [caffe2] Simplify order switch operators
Differential Revision:
D8909766

Original commit changeset: 17a302d5bf4a

fbshipit-source-id: 56c75a8ce27873ed1d5f194b9d6bf0049d8f21ba
2018-07-28 18:40:13 -07:00
eee01731a5 Adds the default value for the amsgrad arg to the Adam docstring (#9971)
Summary:
Minor addition to the docstring of `torch.optim.Adam`, adding the default-value description for the `amsgrad` argument for consistency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9971

Differential Revision: D9040820

Pulled By: soumith

fbshipit-source-id: 168744a6bb0d1422331beffd7e694b9d6f61900c
2018-07-28 09:23:45 -07:00
b99492a507 Fix BlobStatRegistry HIP BlobStatGetter registration issue (#9973)
Summary:
This was introduced in #9826, following the corresponding CUDA file context_gpu.cu; tests passed in that PR, at which point master was 94439d7df. However, during the long landing process a new master commit, aebf3b4, came in that removed `CAFFE_KNOWN_TYPE(Tensor<HIPContext>)` from context_hip.cc, which broke the HIP BlobStatGetter. We did NOT run tests again during the merge, so when #9826 later landed on master the ROCm tests started breaking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9973

Differential Revision: D9040671

Pulled By: bddppq

fbshipit-source-id: f3b16cabaf681fc0535ca733db0b48430868f922
2018-07-28 02:23:40 -07:00
46d8002800 Fix bug that always uses the same blob when repeating poolings
Reviewed By: houseroad

Differential Revision: D9027902

fbshipit-source-id: 957702ad9736812ec5aa32066d286c2c3adffc49
2018-07-28 00:09:16 -07:00
47c1badf90 Fix the clamp special case and gradient problem on None, add None to JIT (#9596)
Summary:
Supersedes #8925

This PR fixes #8502: it fixes the gradient problem for clamp when passing None to the function, and adds support for NoneLiteral and NoneType in script to enable clamp tests. Now we can have corner cases like:

```python
@torch.jit.script
def func():
    x = torch.randn(3, 3, requires_grad=True)
    y = torch.clamp(x, None, 0) # max = 0
    y = torch.clamp(x, min=None, max=0)
```

In both the JIT and ATen, we use Scalar(NAN) as a sentinel value when passing None to clamp; this is the current way we support the None type in the JIT and solve the gradient problem when the user explicitly passes None into clamp.

On the JIT side, we create a tensor(NAN) and an undefined tensor if we encounter None when matching the function schema; later, in the interpreter, it is translated to Scalar(NAN) if needed.

Ideally we wouldn't need clamp_min and clamp_max in ATenNative/Autograd and could support only clamp after this change, but since a bunch of other operators (e.g. Activation.cpp, Loss.cpp) use clamp_min in several places, we will still have the functions available; all Python invocations, however, will only call clamp instead of clamp_min/max (which calls the underlying th_max/th_min in clamp).

zdevito jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9596

Reviewed By: zdevito

Differential Revision: D8940839

Pulled By: wanchaol

fbshipit-source-id: c543a867b82e0ab8c99384773b173fdde2605d28
2018-07-27 22:54:33 -07:00
851c18dd20 PyTorch File Format API (#9900)
Summary:
This is a follow-up to https://github.com/pytorch/pytorch/pull/9794 that contains only the serialization library and exposes a cleaner API. This should later be incorporated into the module export code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9900

Reviewed By: zdevito

Differential Revision: D9021057

Pulled By: jamesr66a

fbshipit-source-id: 01af74a7fdd1b90b2f5484644c3121d8ba9eb3b3
2018-07-27 22:24:57 -07:00
d913db70f2 Handle the "spatial" attribute in onnx BatchNormalization op (#9492)
Summary:
If we have this "spatial" attribute and its value equals to 1, we could just remove this attribute and convert this op to caffe2 SpatialBN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9492

Differential Revision: D8988165

Pulled By: houseroad

fbshipit-source-id: a9218dc9cd5fab43deb371f290f81285f5283231
2018-07-27 22:09:15 -07:00
bcba5a50d1 Fix EnforceFiniteOp
Summary: att

Reviewed By: kennyhorror

Differential Revision: D9040248

fbshipit-source-id: 0da0f3b1ce51375731098cc86c92f35953be0861
2018-07-27 22:01:23 -07:00
ab4e209007 Back out "[caffe2][nomnigraph] Allow multiple ops.def and clean up code gen in general"
Summary: Original commit changeset: 6fdf915c057a

Reviewed By: yinghai

Differential Revision: D9040008

fbshipit-source-id: 33fd5d4ddc0ec8cae56cf86f6d63b6f666e51a3e
2018-07-27 20:09:14 -07:00
607688e928 Adding reciprocal operator and a test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9908

Differential Revision: D9035809

Pulled By: virtan

fbshipit-source-id: bce1db46fd55faeeab18a3b266d25c8beeb08df7
2018-07-27 18:24:43 -07:00
ee827f6ba3 Fix a testcase in logsoftmax onnx export (#9660)
Summary:
We only support the special case; the original dim is not supported by ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9660

Reviewed By: bddppq

Differential Revision: D8965507

Pulled By: houseroad

fbshipit-source-id: 021dffdf0489c2d3a50bfd1e0c4cfd00d4a3d776
2018-07-27 17:54:32 -07:00
12a1af3731 Adding conv tests with explicit algo definition
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9798

Differential Revision: D9034663

Pulled By: virtan

fbshipit-source-id: d722f25f1dd00231ccc3ad5960bbbef63af02c2d
2018-07-27 17:39:17 -07:00
9eeb4e17af Split gather op for easier smaller code size (#9916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9916

att

Differential Revision: D8961085

fbshipit-source-id: 39a9838647dc97611e77beb0607c4655de727ada
2018-07-27 17:15:33 -07:00
c3fe071483 Update hip files (#9826)
Summary:
The goal of this PR is to update the hip files to reflect relevant changes in cuda source files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9826

Differential Revision: D9032840

Pulled By: bddppq

fbshipit-source-id: 504e55c46308eebfee3c9a7beea1f294fe03470f
2018-07-27 16:54:39 -07:00
a532c1a48c Fix default argument value for CTCGreedyDecoder op (#9747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9747

Currently the ctc_greedy_decoder op initializes the `merge_repeated` argument only if it has been provided by the user. Change to initialize in all cases.

Reviewed By: houseroad

Differential Revision: D8963635

fbshipit-source-id: 18955c7c26a77d9d7f5137e4dec085252ffabfeb
2018-07-27 16:33:07 -07:00
eb9bb1f09a Travis CI: Run flake on Python 2.7 and 3.7 (#9953)
Summary:
Flake8 will produce different results on Python 2 and 3. Python 3.7 has `async` as a reserved word https://github.com/pytorch/pytorch/pull/4999.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9953

Differential Revision: D9035415

Pulled By: soumith

fbshipit-source-id: 8a46e028a2e20a7e3f6d90137020268d65a7cc64
2018-07-27 14:43:26 -07:00
829d763c69 Implement add, sub, mul, div using TensorIterator (#8919)
Summary:
```
This adds TensorIterator, a helper class for computing element-wise
operations that's intended to replace the CPU and CUDA apply utils
functions.

CPU kernels are implemented as functions that operate on strided 1-d
tensors, compared to CPUApplyUtils, which operated on individual elements. This
allows the kernels to handle vectorization, while TensorIterator handles
parallelization and non-coalesced dimensions.

GPU kernels continue to operate on elements, but the number of
specializations is reduced. The contiguous case remains the same. The
non-contiguous case uses a single (reduced) shape for all operands and
the fast integer division from THCIntegerDivider. To avoid extra
specializations for indexing with 64-bits, large operations are split
into smaller operations that can be indexed with 32-bits.

Major semantic changes:

 - No more s_add, s_mul, s_div, or s_sub. Broadcasting is handled by
   TensorIterator. The autograd engine performs the reduction assuming
   standard broadcasting if the gradient shape does not match the
   expected shape. Functions that do not use standard broadcasting rules
   should either continue to trace the expand calls or handle the
   reduction in their derivative formula.

 - Use ONNX v7, which supports broadcasting ops.

Performance impact:

 - Small increased fixed overhead (~0.5 us)
 - Larger overhead for wrapped numbers (~2.5 us)
 - No significant change for ops on contiguous tensors
 - Much faster worst-case performance for non-contiguous GPU tensors
 - Faster CPU bias addition (~2x)
 - Faster GPU bias addition (~30% faster)

Future work:

 - Decrease overhead, especially for wrapping numbers in Tensors
 - Handle general inter-type operations
 - Extend to unary ops and reductions
 - Use buffering for compute-bound operations on non-contiguous tensors
   (pull in from CPUApplyUtils)
```
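As a quick illustration of the broadcasting semantics described above (plain PyTorch usage, not code from this PR):

```
import torch

a = torch.randn(4, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)  # broadcast along dim 0

(a + b).sum().backward()

# The autograd engine reduces each gradient back to its input's shape,
# assuming standard broadcasting rules.
print(a.grad.shape)  # torch.Size([4, 3])
print(b.grad.shape)  # torch.Size([3])
```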
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8919

Differential Revision: D8677600

Pulled By: colesbury

fbshipit-source-id: 61bc9cc2a36931dfd00eb7153501003fe0584afd
2018-07-27 14:43:24 -07:00
e3c4057b6c Eliminate an extra lookup in the hashtable during CSE. (#9668)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9668

Differential Revision: D8955185

Pulled By: resistor

fbshipit-source-id: f3f929efc11be63850bd863679cc7b297c98d679
2018-07-27 14:43:22 -07:00
ef9801f32c Merge THStorage into at::Storage
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9772

Reviewed By: ezyang

Differential Revision: D9019375

Pulled By: cpuhrsch

fbshipit-source-id: d5185e29747929d648e4260db4967452cd40f563
2018-07-27 13:53:55 -07:00
6ed41adb04 Use round-to-negative division when computing output sizes for convolutions involving striding and dilation.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9640

Differential Revision: D8948081

Pulled By: resistor

fbshipit-source-id: 06f2e3ad1bdb448be6f36577cb9bd27c884df595
2018-07-27 13:22:54 -07:00
8c0355c90d convert lambd directly to scalar_t at hardshrink (#9919)
Summary:
- convert lambd directly to scalar_t instead of creating a tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9919

Differential Revision: D9026708

Pulled By: weiyangfb

fbshipit-source-id: d20ab06ecc12aa972ee9d1323ee2f84abf8d5ffd
2018-07-27 13:22:52 -07:00
ce0b895a0c Fix UBSAN error in ONNX peephole pass, make it more robust.
Summary: Minor fix for a bug introduced by D9004285

Reviewed By: anderspapitto

Differential Revision: D9028762

fbshipit-source-id: 9b9c5eef30e61d7ae19784e0418fa29bad2b5564
2018-07-27 12:38:56 -07:00
c77e4bc4d5 export tensor(ArrayRef, options) on Windows (#9904)
Summary:
I hope this helps with the Windows build failure in #9628.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9904

Differential Revision: D9026715

Pulled By: soumith

fbshipit-source-id: bb97d41d060823f5a37bfc9a1659815b8b9f4eab
2018-07-27 12:14:52 -07:00
aebf3b47ae Remove template parameter from Tensor (#9939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939

Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove the template parameter.
This changes the templating at call sites; the core implementations will change later.

Previously, the Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we make it a runtime property (stored inside the tensor) while preserving the same semantics. For example, one still has to specify a device type in order to create a Tensor - there are no uninitialized tensors. More specifically, the changes are:

1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. Tensor(DeviceType type).
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in to enable calling the templated Copy function. Previously it could be a different context than the source and target; now we enforce that, if provided, the context has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter, Blob::GetMutableTensor, that verifies both that the Blob contains a Tensor and that it is of the correct type.
4. The Tensor type is no longer default-constructible (as we don't have unknown-device tensors), so some of the code handling STL containers needs to change.

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: ezyang, houseroad

Differential Revision: D9024330

fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
2018-07-27 10:56:39 -07:00
94439d7df4 Suppress the vptr warning in ubsan (#9909)
Summary:
Unblock https://github.com/pytorch/pytorch/pull/8469
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9909

Differential Revision: D9023650

Pulled By: houseroad

fbshipit-source-id: 7682a9cd7905e98c802b820ad59745672b32970d
2018-07-27 10:28:07 -07:00
c0bacc6284 Guard test_lapack_empty with has_magma. (#9936)
Summary:
CUDA lapack functions generally don't work unless has_magma is true.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9936

Differential Revision: D9028579

Pulled By: gchanan

fbshipit-source-id: 9b77e3b05253fd49bcabf604d0924ffa0e116055
2018-07-27 10:09:00 -07:00
bf32ea8094 Fix dimension check in 1D instance norm, allowing 2D tensors alongside 3D. (#9924)
Summary:
Fixes #9776.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9924

Differential Revision: D9028328

Pulled By: soumith

fbshipit-source-id: d5f22abb2be83b34aee95ebe144c97519a6854f8
2018-07-27 09:24:07 -07:00
d3ba9a173e Handle case where THC btrifact doesn't zero info. (#9907)
Summary:
This was showing up in the n-dimensional empty tests as flaky because it's reading uninitialized cuda memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9907

Differential Revision: D9021413

Pulled By: gchanan

fbshipit-source-id: 31542b7597919df9afd6e528bb108a4a3e8eaf60
2018-07-27 09:11:44 -07:00
1af1b0c2a5 Remove THTensor::_dim, temporarily remove THTensor_nDimension. (#9895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9895

The primary goal here was to remove THTensor::_dim, which isn't part of the API moving forward.
Instead, we provide 3 options for getting the dimensionality (this is temporary although non-trivial to remove!):
```
nDimension                 corresponds to the "true" ATen dimension. TODO: implement.
nDimensionLegacyNoScalars  corresponds to the ATen dimension, except scalars are viewed as 1-dimensional tensors.
nDimensionLegacyAll        corresponds to the ATen dimension, except scalars are viewed as 1-dimensional tensors
                           and tensors with a dimension of size zero are collapsed to 0-dimensional tensors.
```
So in this patch, nDimension -> nDimensionLegacyNoScalars and _dim/_nDimension goes to nDimensionLegacyAll.
These are just codemods.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9835

Reviewed By: ezyang

Differential Revision: D8999338

Pulled By: gchanan

fbshipit-source-id: a4d676ac728f6f36ca09604a41e888d545ae9311
2018-07-27 08:56:38 -07:00
bc66d98248 Fix narrow on empty tensors after negative size support.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9838

Differential Revision: D9002345

Pulled By: gchanan

fbshipit-source-id: 13f4bacff94d9d0ea31a3b73a75b9b3e774eabf5
2018-07-27 07:55:20 -07:00
7b375ed362 fix ParameterDict doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9918

Differential Revision: D9026402

Pulled By: soumith

fbshipit-source-id: d0459dcda631e8921ab39725b9045e03960da5c9
2018-07-27 01:10:50 -07:00
a709f23225 Fix a small spelling mistake in tensor.py (#9868)
Summary:
Hello! I just found a small spelling mistake while reading this source code. Just PR'd it, thanks!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9868

Reviewed By: gchanan, ezyang

Differential Revision: D9016030

Pulled By: soumith

fbshipit-source-id: fc3877177be080adbdbda99a169e401691292ebb
2018-07-27 00:55:03 -07:00
4a192bcc3d Rename onnx integration tests file to avoid confusion
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9913

Differential Revision: D9026787

Pulled By: bddppq

fbshipit-source-id: a3e7e79973abc4f5fe163f3e86b24382a1efd082
2018-07-26 23:40:41 -07:00
8cb1eef7b9 Unify IR operator representation (stop using attributes in the JIT) (#9807)
Summary:
Based on top of #9763 (first 3 commits belong to that PR). The first commits from this PR are "Stop using attributes ..."

I tried to separate the changes into fairly meaningful commits. I can't split them up into smaller PRs, because everything starts working and all tests pass only after the whole sequence, but hopefully this will make reviewing somewhat easier.

Known issues/regressions/future tasks:
- `aten::lerp` and `aten::clamp` are no longer fusable
- `CreateAutodiffSubgraphs` needs a rewrite
  - It is much more strict now, and will miss a lot of opportunities, especially when viewing ops are involved. Our previous approach was "ignore the assumption on shape availability in gradient formulas to determine differentiability, and hope that shape prop will be robust enough to actually deliver them before we differentiate", which obviously doesn't scale well to more complex cases. We should either work on reducing the size dependency of grad formulas (feasible e.g. for `view`/`reshape`, unfeasible for `squeeze`/`unsqueeze`), or make `CreateAutodiffSubgraphs` integrate some kind of "I could integrate this node into an AD subgraph, but will I be able to infer the shape of its input" reasoning (kind of like a limited shape prop, that doesn't infer anything, and only tells if it *could* infer something).
  - It sometimes creates constant-only (or constants + one node) graphs, which is useless
- Broken `aten::add` in auto-batching, because it gained a non-tensor input. I changed the test for pointwise operations to use `aten::mul` instead, but I needed to disable the LSTM cell test. I'm not sure how scalar constants should be implemented in this case, because I don't fully understand our format. cc: ChunliF
- Graph import does some hacks to recover the type of constants. This code should be removed once we gain the ability to export the IR along with value types.
- There's still a fair amount of dead code that can be removed. I didn't want to make this diff any bigger, and removing it is an easy task.
- Graph fuser could be improved to use signature matching (possibly using `OperatorSet`) instead of relying on node kinds.
- Manual constant propagation for the `ListConstruct` node in `torch/onnx/utils.py` should be replaced with a proper constant propagation pass (or we should ensure that the one we have handles at least this case before we remove this code).

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9807

Reviewed By: ezyang

Differential Revision: D9004285

Pulled By: apaszke

fbshipit-source-id: fe88026a765f6b687354add034c86402362508b7
2018-07-26 22:11:50 -07:00
2c1d9e09b8 Support UINT8 for additional data in ImageInputOp (#9901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9901

Added support for UINT8 datatype for additional data (prefetching and
output) by ImageInputOp

Reviewed By: ashwinb

Differential Revision: D9018964

fbshipit-source-id: f938a8a072c15c0ee521b2f16788c024b08cd37f
2018-07-26 22:11:46 -07:00
aa671ddefa Support production models with predictor benchmark (#9855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9855

Support production models with predictor benchmark
Two new flags are added:
`--update_prod`: pull production data (netdef, input types, input dims) from Hive and store locally
`--use_prod`: run benchmark with local production data with the same workload as in production.
By default, 300 models will be loaded.

production vs benchmark
avg net run time:
(collected by prod: https://fburl.com/scuba/6lb91zfx and bench: https://fburl.com/ngjj1dc8)
**prod: `408us` vs bench: `543us`**
(With prod data distribution, this should be even closer)

framework overhead (as of 2018-07-22):
prod:
```
9.111%    BlackBoxPredictor::Run
4.602%    SimpleNet::Run
2.377%    Operator::Run
1.786%    BlackBoxPredictor::AllocateMemory
1.372%    Observable::StartAllObservers
1.358%    Observable::StartObserver
1.206%    Blob::GetMutable
```

bench:
```
8.577%    BlackBoxPredictor::operator()
3.276%    SimpleNet::Run
1.954%    Operator::Run
1.697%    BlackBoxPredictor::AllocateMemory
1.477%    Tensor::ShareData
1.230%    Blob::GetMutable
1.034%    Observable::StartObserver
```

Reviewed By: yinghai

Differential Revision: D8942996

fbshipit-source-id: 27355d7bb5a9fd8d0a40195261d13a97fa24ce17
2018-07-26 21:39:29 -07:00
eb33887816 Addressed issue identified by static code analysis: potential buffer … (#9889)
Summary:
Addressed a potential buffer overrun identified by static code analysis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9889

Differential Revision: D9026278

Pulled By: soumith

fbshipit-source-id: ee2ee255f34731ddc581261984c3caf56faa0e12
2018-07-26 21:09:51 -07:00
e41eb43327 Remove deprecated masked_copy (#9819)
Summary:
No tests are affected by this removal.

Closes https://github.com/pytorch/pytorch/issues/1885 and closes #9817

While I was at it, I also fixed #9876 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9819
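For reference, a minimal migration sketch; masked_scatter_ carries the semantics that the deprecated masked_copy_ had:

```
import torch

t = torch.zeros(5)
mask = torch.tensor([1, 0, 1, 0, 1], dtype=torch.uint8)  # ByteTensor mask
src = torch.arange(1., 6.)

# Before this removal: t.masked_copy_(mask, src)
t.masked_scatter_(mask, src)  # copies src values into masked positions
print(t)  # tensor([1., 0., 2., 0., 3.])
```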

Differential Revision: D9018126

Pulled By: SsnL

fbshipit-source-id: a9142bf4e2403bef05779a097f61fa8b7db04b71
2018-07-26 20:55:18 -07:00
a841006353 Simplify some code by directly constructing unordered_set from nodes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9675

Differential Revision: D8952196

Pulled By: resistor

fbshipit-source-id: 5ef2308fed9f702021f650cf2d241a83d880d359
2018-07-26 19:54:38 -07:00
dfa0af093d Move predictor into caffe2/caffe2/predictor (#9548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9548

Pull Request resolved: https://github.com/pytorch/translate/pull/157

One part of the predictor refactor. Move all the files into the predictor dir.

Reviewed By: highker

Differential Revision: D8845276

fbshipit-source-id: 1e917464b0c8a042f025128a082c784eaa3b7013
2018-07-26 19:03:40 -07:00
c045e969b6 Use qualified name at::Half in Dispatch.h (#9848)
Summary:
This makes AT_DISPATCH_ALL_TYPES_AND_HALF valid outside of the at
namespace.

See https://github.com/pytorch/extension-cpp/issues/15
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9848

Differential Revision: D9006921

Pulled By: colesbury

fbshipit-source-id: a6e4f097a9d6fb85c921e1c9b9ea25d0f2db06dc
2018-07-26 19:03:24 -07:00
e7ab093d93 Simplify order switch operators (#9581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9581

Mostly to simplify code. Should also improve performance but order switch ops
don't take much time anyway.

Reviewed By: viswanathgs

Differential Revision: D8909766

fbshipit-source-id: 17a302d5bf4aba2755d88223fc01a41fd72c5919
2018-07-26 18:24:29 -07:00
b7b61a8eb4 Change expect, cast on Type to return shared pointers, make isSubtypeOf accept TypePtr (#9786)
Summary:
Follow up task of #9584.

Commit 1:

- change expect/cast to return shared pointers instead of raw pointer
- isSubtypeOf accept TypePtr instead. Use `x->isSubtypeOf(NumberType::get())` rather than `x->isSubtypeOf(*NumberType::get())`

Commit 2:

- to address enable_shared_from_this pitfalls, we make the constructor private and expose the factory method to make sure user can only create it using our factory method.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9786

Reviewed By: zdevito

Differential Revision: D8980441

Pulled By: wanchaol

fbshipit-source-id: e5c923fc57a701014310e77cf29985b43bb25364
2018-07-26 18:09:45 -07:00
9df9c46992 fix loading 1dim tensor from 0.3.* to 0dim tensor (#9781)
Summary:
This PR fixes #9743 .

Adds backward-compatibility support for loading checkpoints from 0.3.* that contain 1-dim tensors which are now 0-dim tensors in 0.4+.
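A small illustration of the underlying 0.3 -> 0.4 change this works around (not code from this PR):

```
import torch

x = torch.arange(3.)
s = x[0]                 # a 0-dim tensor in 0.4+; values like this were
                         # stored as 1-dim tensors in 0.3.* checkpoints
print(s.dim(), s.shape)  # 0 torch.Size([])
print(s.item())          # extract the plain Python number
```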
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9781

Differential Revision: D8988196

Pulled By: ailzhang

fbshipit-source-id: a7a1bc771d597394208430575d5a4d23b9653fef
2018-07-26 17:09:41 -07:00
d65c667f28 Avoid divide-by-zero when hamming_window window length is 0.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9896

Reviewed By: ezyang

Differential Revision: D9018572

Pulled By: gchanan

fbshipit-source-id: fa314687973124165bffb3084932d8ab6d872a93
2018-07-26 15:56:44 -07:00
d1260d26fe Sleep before run (#9891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9891

Add an argument to the benchmark binary specifying how many seconds to sleep before the run and after the warmup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9880

Reviewed By: llyfacebook

Differential Revision: D9014254

Pulled By: sf-wind

fbshipit-source-id: d5566186c8ed768f1e170e9266c5f2d6077391e0
2018-07-26 14:39:17 -07:00
18a6541b82 Create IDEEP fallback operators for ctc decoder ops (#9847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9847

CTCBeamSearchDecoder and CTCGreedyDecoder do not currently support IDEEP
execution. Add fallback operators to allow IDEEP execution of models that use
these operators.

Reviewed By: yinghai

Differential Revision: D9006234

fbshipit-source-id: fc539ba67b07d1f960d28564d8adde0be8690649
2018-07-26 14:09:11 -07:00
969b62f276 Revert D8121878: Remove template parameter from Tensor
Differential Revision:
D8121878

Original commit changeset: 4a5e9a677ba4

fbshipit-source-id: d8e2c0bb145b52fbcca323b22d1d3346f0b3249e
2018-07-26 14:02:04 -07:00
456f41301c Disable unique ops test on rocm (#9892)
Summary:
Somehow we have Unique operator tests in two places: test_unique_ops.py and hypothesis_test.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9892

Reviewed By: houseroad

Differential Revision: D9017631

Pulled By: bddppq

fbshipit-source-id: 1f9e40e4953afca26141ef4581202b9b9fce0ae9
2018-07-26 13:10:23 -07:00
1dc708493e Add html-stable target to docs Makefile (#9884)
Summary:
This makes it easier to build docs for the release. All of the unstable
warnings are removed in `make html-stable`.

cc soumith SsnL

Sample build:
![image](https://user-images.githubusercontent.com/5652049/43277115-05e2f720-90d5-11e8-9977-b0b4a6ee4b8e.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9884

Reviewed By: SsnL

Differential Revision: D9016001

Pulled By: zou3519

fbshipit-source-id: 5cf2dfbf886de993242db28cdac5d0c5fadbdc4d
2018-07-26 12:09:06 -07:00
0c84a5c27e Pass shape infos to ONNX -> Caffe2 C++ conversion backend (#9870)
Summary:
And let the Gemm conversion inspect the input `C` to try converting to FC.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9870

Reviewed By: houseroad

Differential Revision: D9013198

Pulled By: bddppq

fbshipit-source-id: b4c509cfccca238262e1c406b004e66cef256321
2018-07-26 12:00:32 -07:00
e39c8043dc Make GraphExecutors work on Stacks instead of variable_tensor_lists (#9763)
Summary:
This is blocking the IR operator unification, because I need to be able to pass scalars to backward functions.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9763

Reviewed By: zou3519

Differential Revision: D8978457

Pulled By: apaszke

fbshipit-source-id: 570b4c3409322459cb0f2592069730a7d586ab20
2018-07-26 12:00:27 -07:00
6f10944f88 Re-enable rocm tests that have been fixed in rocm 1.8.2 (#9862)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9862

Differential Revision: D9012520

Pulled By: bddppq

fbshipit-source-id: cdcc184e23befa8dbd1bc44d59bd25766aac33d0
2018-07-26 10:54:57 -07:00
716f7d657d Remove Broadcast.py. (#9843)
Summary:
I don't think this file is used anywhere, I guess we'll find out!

(Weirdly this failed lint on one of my PRs even though it shouldn't).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9843

Differential Revision: D9003949

Pulled By: gchanan

fbshipit-source-id: 26d580d1e7cdd30e82e5f4176244e51fd7cd616d
2018-07-26 10:44:24 -07:00
cd5adc7b5f Remove template parameter from Tensor (#13)
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove the template parameter.
This changes the templating at call sites; the core implementations will change later.

Previously, the Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we make it a runtime property (stored inside the tensor) while preserving the same semantics. For example, one still has to specify a device type in order to create a Tensor - there are no uninitialized tensors. More specifically, the changes are:

1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. Tensor(DeviceType type).
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in to enable calling the templated Copy function. Previously it could be a different context than the source and target; now we enforce that, if provided, the context has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter, Blob::GetMutableTensor, that verifies both that the Blob contains a Tensor and that it is of the correct type.
4. The Tensor type is no longer default-constructible (as we don't have unknown-device tensors), so some of the code handling STL containers needs to change.

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: xw285cornell

Differential Revision: D8121878

fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
2018-07-26 10:25:23 -07:00
2c7e7e37a6 Corrected doc in class RNNCell (#9866)
Summary:
fixes #9642
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9866

Differential Revision: D9012131

Pulled By: weiyangfb

fbshipit-source-id: d2849b1a50234dbdb335dffab4835c9de85183c3
2018-07-26 09:27:05 -07:00
bdbbcf068a Temporarily disable test_unique on rocm since it keeps running into segfault (#9872)
Summary:
petrex

https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3758/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3757/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3752/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9872

Reviewed By: ezyang

Differential Revision: D9013335

Pulled By: bddppq

fbshipit-source-id: 80490a0fd4a86aa9c8454378c0edddc57d135c4e
2018-07-26 08:34:00 -07:00
e70fc145a9 MIOpen fixes for Caffe2 (#9842)
Summary:
The PR contains:
- Fixes for running the MIOpen conv operator in a multi-worker scenario, along with a performance fix
- A fix for a typo in the MIOpen pool op, and some extra checks for the MIOpen spatial BN op

bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9842

Differential Revision: D9012512

Pulled By: bddppq

fbshipit-source-id: 270e1323c20fbfbc4b725f9a4ff34cd073ddaaa8
2018-07-26 02:42:26 -07:00
3be8e4db51 Do not run ONNX integration tests in parallel
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9861

Differential Revision: D9011458

Pulled By: bddppq

fbshipit-source-id: 7ab1b1763d56f1290ade7a99682ad461c97f807b
2018-07-25 21:54:29 -07:00
997f46d1e1 Disable "filter too much" health check for fc operator tests (#9865)
Summary:
makes the CI flaky
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9865

Differential Revision: D9011882

Pulled By: bddppq

fbshipit-source-id: 5124ab97d258eed7585734d64fb01e5df98abd0d
2018-07-25 21:41:14 -07:00
ba062e7da9 Update OnnxifiOp according to onnx/onnx#1224
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9844

Reviewed By: yinghai

Differential Revision: D9004222

Pulled By: bddppq

fbshipit-source-id: 1bdcefc0dfbd5e3422217b5254b2462e5a568d2a
2018-07-25 19:29:38 -07:00
5e4de0821a Set ROCm MAX_JOBS=4 (#9856)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9856

Differential Revision: D9009100

Pulled By: ezyang

fbshipit-source-id: 28f34128fcb7c3d6a115884bf28dc2a6bde5aed6
2018-07-25 19:09:41 -07:00
6cd0174ff5 Reimplement localScalar as a native function. (#9762)
Summary:
I split it into two parts, _local_scalar and _local_scalar_dense (unchecked)
so I could reuse the sparse logic in both paths.

_local_scalar became a method on Tensor to work around a circular
include problem.

This is resurrected copy of #9652
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9762

Differential Revision: D8972348

Pulled By: ezyang

fbshipit-source-id: 2232dbfc8e1286b8a4a1c67d285c13a7771aad4c
2018-07-25 19:09:39 -07:00
ad47228020 Test pinning Hypothesis 3.59.0 (#9830)
Summary:
We think this will band-aid some of the new Caffe2 test failures.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9830

Differential Revision: D9008052

Pulled By: ezyang

fbshipit-source-id: 84f1c0faea429d758d760965d6cbfe9e4c72eb19
2018-07-25 18:11:10 -07:00
b84b78a69d Fix the ROCM build, and enable sccache for it
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9841

Differential Revision: D9008030

Pulled By: ezyang

fbshipit-source-id: 51cac3c75fc52658b22a10a6bf8a479bcf803fb2
2018-07-25 17:55:47 -07:00
0b16b03b98 Plumb type annotations through script compilation (new) (#9547)
Summary:
Supersedes https://github.com/pytorch/pytorch/pull/9405
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9547

Reviewed By: zdevito

Differential Revision: D8900327

Pulled By: jamesr66a

fbshipit-source-id: a00a94615af4fbaec98ee3ede0cb54bcfd9108dd
2018-07-25 17:10:14 -07:00
445c17d492 Update CopyMatrix in math (#9792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9792

Update CopyMatrix in math

Reviewed By: houseroad

Differential Revision: D8982421

fbshipit-source-id: da2056306cde3300124b21eba7a6c2d113111002
2018-07-25 16:10:52 -07:00
74ac5265d1 nomnigraph - make use of nodeIterator (#9831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9831

Follow up to D8980903 - replace dataIterator with nodeIterator where the data isn't used.

Reviewed By: pjh5

Differential Revision: D8998351

fbshipit-source-id: c333847ecd8b6d8075352322845839b94a63aecc
2018-07-25 15:40:44 -07:00
302adb7cc8 added torch.rot90() to ATen (#8628)
Summary:
1. fixes #6271
2. implemented torch.rot90() following [numpy.rot90()](6a58e25703/numpy/lib/function_base.py (L54-L138))
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8628
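Usage sketch (values chosen for illustration):

```
import torch

x = torch.arange(4.).reshape(2, 2)
# tensor([[0., 1.],
#         [2., 3.]])

print(torch.rot90(x, 1, (0, 1)))  # rotate once, counter-clockwise
# tensor([[1., 3.],
#         [0., 2.]])
```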

Reviewed By: ezyang

Differential Revision: D8987860

Pulled By: weiyangfb

fbshipit-source-id: 8dac3b2a1f6d3288672977aba8b547706ce97fe9
2018-07-25 15:11:44 -07:00
2f5c0c30cd Make logsumexp work with empty tensors again. (#9825)
Summary:
https://github.com/pytorch/pytorch/pull/9755 broke this, but it was only tested if size zero dims were turned on (it can still happen even if that isn't turned on, because we support size [0] tensors).
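A minimal check of the restored behavior, assuming the post-fix semantics:

```
import torch

x = torch.randn(0, 4)                   # a size-[0, 4] tensor
print(torch.logsumexp(x, dim=1).shape)  # torch.Size([0]), no error
```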
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9825

Differential Revision: D8997303

Pulled By: gchanan

fbshipit-source-id: 911dce112f73fad0f3980a7f4f9423df0f2d923d
2018-07-25 13:41:24 -07:00
4b0098f3ae Add --allow-change-held-packages to make nccl2 install in docker work (#9828)
Summary:
This was used to build Caffe2 Docker version 170.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9828

Differential Revision: D8997808

Pulled By: ezyang

fbshipit-source-id: f48938b2b71bc86578c9d9b46c281ed05478724e
2018-07-25 11:56:40 -07:00
279b836675 Add some user-friendly checks in pack padded symbolic to ensure thing… (#9731)
Summary:
Add some user-friendly checks in the pack padded symbolic to ensure things are the right type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9731

Reviewed By: soumith

Differential Revision: D8958693

Pulled By: jamesr66a

fbshipit-source-id: 7db1f86a85188fd2c84d0edaaaac6a096d64ba52
2018-07-25 11:25:42 -07:00
be163f50a3 Avoid divide-by-zero when bartlett_window size is 0.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9788

Differential Revision: D8980951

Pulled By: gchanan

fbshipit-source-id: 429b341ac687afe4f1429bb141ef070bf315519c
2018-07-25 10:40:39 -07:00
56fbfee872 Remove ifdef __cplusplus from THTensor.h, have cpp self-contained in … (#9775)
Summary:
Remove ifdef __cplusplus from THTensor.h; the cpp parts are now self-contained in THTensor.hpp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9775

Differential Revision: D8977140

Pulled By: gchanan

fbshipit-source-id: d6d2461f7cb0511ee1def52ac1032a86349a7105
2018-07-25 10:25:17 -07:00
a7f183f971 Revert "Fix dataloader hang when it is not completely iterated (#9655)" (#9804)
Summary:
This reverts commit 9ee513365121cd387e11987c66db6599ac53ded7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9804

Reviewed By: ezyang

Differential Revision: D8987780

Pulled By: SsnL

fbshipit-source-id: 75ad70b0b8d672d0b35235fa248b187be64b68e5
2018-07-25 10:10:30 -07:00
c14e17eced Co-distillation with different archs and/or feature set (#9793)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9793

Enable co-distillation with different archs

Reviewed By: pjh5

Differential Revision: D8888479

fbshipit-source-id: eac14d3d9bb6d8e7362bc91e8200bab237d86754
2018-07-25 10:10:27 -07:00
ea67a2bd11 Allows negative index to tensor.narrow (Fixes: #9546)
Summary:
Fixes #9546
Test cases added
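A short sketch of the new behavior; the negative start is interpreted relative to the end of the dimension, as in Python indexing:

```
import torch

x = torch.arange(6.)
print(x.narrow(0, -2, 2))  # tensor([4., 5.]), same as x.narrow(0, 4, 2)
```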

Reviewed By: ezyang

Differential Revision: D8974842

Pulled By: zou3519

fbshipit-source-id: a7707406c2a21e8e14f9c2a8ad4d64c8b08156df
2018-07-25 09:25:45 -07:00
0853d13f86 Move scalar boolean to THTensor, rename scalar in this context to zer… (#9783)
Summary:
Move the scalar boolean to THTensor, and rename "scalar" in this context to "zero dim".

Manifest:
1) The scalar boolean is now in THTensor, although it isn't hooked up at the TH level yet.
2) setScalar is gone, everything now goes through the maybeScalar equivalent (which is renamed)
3) all "scalars" in this context now refer to "zero_dim" in order to differentiate this concept from the "Scalar" class.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9783

Differential Revision: D8978911

Pulled By: gchanan

fbshipit-source-id: f09254be4bebad0e4c510fefe4158b4f7e92efe1
2018-07-25 09:25:41 -07:00
8825e323b5 nomnigraph - Add way to check if a NodeRef is in a graph, and make a graph node iterator (#9790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9790

- Add way to check if a NodeRef is in a graph
- Make a nodeIterator (similar to dataIterator) but only iterate through nodes.

Reviewed By: bwasti

Differential Revision: D8980903

fbshipit-source-id: b20504a46715858752e25242303125a15a709b88
2018-07-25 09:02:13 -07:00
42a4747389 Temporarily need this to prevent sccache from breaking. (#9810)
Summary:
Temporarily need this to prevent sccache from breaking when I move the sccache install to the Dockerfile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9810

Differential Revision: D8991684

Pulled By: Jorghi12

fbshipit-source-id: 14cd0278f53a72372f9bbe27b228980f8d3c1d4a
2018-07-25 09:01:58 -07:00
a74a3fdeb6 typo fix, tutorials url with http protocol is not valid (#9812)
Summary:
The tutorials URL with the http protocol is not valid; replacing it with https.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9812

Differential Revision: D8991344

Pulled By: ezyang

fbshipit-source-id: c12faa57905b50eadc320f9938c39c4139bd093b
2018-07-25 07:54:26 -07:00
3ef521e98a Implement backward for torch.symeig (#8586)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/6890. (backward pass for non-symmetric eigen-decomposition is not implemented in other packages, e.g. autograd, mxnet, tensorflow, presumably because the eigenvalues can be imaginary for the general case, and AFAIK we cannot support complex numbers).

This patch adds a backward function for the symmetric eigen-decomposition function `torch.symeig`. The formula used is taken from [here](http://eprints.maths.ox.ac.uk/1079/1/NA-08-01.pdf). Unit tests are added to verify correctness.

There is still one outstanding issue, which is how to handle the case where the `symeig` is called with `eigenvectors=False`. In this case, the eigenvectors are returned as a zero tensor, but the backward computation for the eigenvalues depends on the eigenvectors. There was a previous attempt to implement this in https://github.com/pytorch/pytorch/pull/2026, where apaszke mentioned that the `eigenvectors` argument should be overridden so that they are saved for the backwards pass. The forward code is autogenerated, though, and it isn't clear to me how that would be done. I'd appreciate any guidance. For now, there is a unit test that will fail until that issue is resolved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8586
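A minimal sketch of differentiating through the decomposition; the input is symmetrized first, since the backward formula assumes a symmetric matrix:

```
import torch

a = torch.randn(4, 4, dtype=torch.float64)
a = (a + a.t()).requires_grad_()           # symmetric input

w, v = torch.symeig(a, eigenvectors=True)  # eigenvectors are needed
w.sum().backward()                         # for the backward pass
print(a.grad.shape)                        # torch.Size([4, 4])
```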

Reviewed By: ezyang

Differential Revision: D8872760

Pulled By: SsnL

fbshipit-source-id: 76614495d0f9c118fec163a428f32e5480b4d115
2018-07-25 07:16:10 -07:00
0262fd0f91 Delete Tensor::typeString() (#9764)
Summary:
The primary use-site of typeString was checked_cast_tensor.
I did a little more than I needed in this patch, to set
the stage for actually deleting the tensor type.

Specifically, I modified checked_cast_tensor to explicitly
take Backend and ScalarType, the idea being that once we
remove the tensor subclasses, we will delete the T template
parameter.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9764

Differential Revision: D8969196

Pulled By: ezyang

fbshipit-source-id: 9de92b974b2c28f12ddad13429917515810f24c6
2018-07-24 22:26:15 -07:00
723a600ebd Update for new incremental build instructions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9773

Differential Revision: D8988285

Pulled By: ezyang

fbshipit-source-id: c2c3b7cefb54e4e18602b180281f22939293a383
2018-07-24 22:26:13 -07:00
bca10ad706 Implementation of Weibull distribution (#9454)
Summary:
This implements the two-parameter Weibull distribution, with scale $\lambda$ and shape $k$ parameters as described on [Wikipedia](https://en.wikipedia.org/wiki/Weibull_distribution).

**Details**
- We implement as a transformed exponential distribution, as described [here](https://en.wikipedia.org/wiki/Weibull_distribution#Related_distributions).
- The `weibull_min` variance function in scipy does not yet support a vector of distributions, so our unit test uses a scalar distribution instead of a vector.

Example of the bug:

```
>>> sp.stats.expon(np.array([0.5, 1, 2])).var() # fine
array([1., 1., 1.])
>>> sp.stats.weibull_min(c=np.array([0.5, 1, 2])).var() # buggy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 490, in var
    return self.dist.var(*self.args, **self.kwds)
  File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1242, in var
    res = self.stats(*args, **kwds)
  File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1038, in stats
    if np.isinf(mu):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9454
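A sketch of the resulting API and the transformed-exponential relation described above (parameter values are illustrative):

```
import torch
from torch.distributions import Exponential, Weibull

scale, k = torch.tensor(2.0), torch.tensor(1.5)
w = Weibull(scale, concentration=k)
print(w.sample((3,)), w.mean, w.variance)

# Same law via the transform: if E ~ Exponential(1),
# then scale * E**(1/k) ~ Weibull(scale, k).
e = Exponential(torch.tensor(1.0)).sample((100000,))
print((scale * e.pow(1.0 / k)).mean())  # close to w.mean
```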

Differential Revision: D8863574

Pulled By: SsnL

fbshipit-source-id: 1ad3e175b469eee2b6af98e7b379ea170d3d9787
2018-07-24 20:40:15 -07:00
4b61760738 Add Adadelta optimizer to caffe2 (#9088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9088

Closes https://github.com/pytorch/pytorch/pull/9088

- Added CPU/GPU implementations of Adadelta and SparseAdadelta.
- Added corresponding Python unittests

Reviewed By: BIT-silence

Differential Revision: D8712169

fbshipit-source-id: 544e99e13b230a919672a7341b3715d64597c0be
2018-07-24 20:09:21 -07:00
620952117e remove unnecessary -Wno= flags
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9608

Differential Revision: D8946664

Pulled By: anderspapitto

fbshipit-source-id: b05f10af58da25b2a2588f7153f393bb3637f29a
2018-07-24 18:40:42 -07:00
9cf76cfb4c Changing conda build script to use current python version
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9780

Reviewed By: ml7

Differential Revision: D8983501

Pulled By: pjh5

fbshipit-source-id: 79208796247433cbe271a2d06f66254587d96f80
2018-07-24 18:40:40 -07:00
f62bc01dfe Remove TORCH_ASSERT (#9575)
Summary:
I got some tensor->variable conversion exceptions from `torch/csrc/autograd/variable.h`, which used the `TORCH_ASSERTM` macros instead of `AT_CHECK`, so they didn't have backtraces. This was such a substantial loss for debuggability that I decided to update the whole codebase to use the backtrace-enabled ATen macros instead of `TORCH_ASSERT` and `JIT_ASSERT`, the latter having been an alias of the former.

ezyang apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9575

Differential Revision: D8924566

Pulled By: goldsborough

fbshipit-source-id: 7a4013b13eec9dbf024cef94cf49fca72f61d441
2018-07-24 18:10:06 -07:00
d2610fb379 Constexpr Type Ids -> 6.5% caffe2 perf improvement (#9603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9603

Using constexpr for some heavily queried type ids gives us a 6.5% perf improvement for caffe2.

Benchmark results: P59829647

Also ran ad canaries (but they don't show a significant difference):
- adfinder:
  - https://our.intern.facebook.com/intern/ads/canary/411346509423301481
  - https://our.intern.facebook.com/intern/ads/canary/411346563021753557
- adindexer:
  - https://our.intern.facebook.com/intern/ads/canary/411346517006038367
  - https://our.intern.facebook.com/intern/ads/canary/411346571387258927
- multifeed_predictor:
  - https://our.intern.facebook.com/intern/ads/canary/411346526631282941
  - https://our.intern.facebook.com/intern/ads/canary/411346583141009531

Reviewed By: dzhulgakov

Differential Revision: D8841577

fbshipit-source-id: 1a0ce7f2bee1ae54b723caefe5bc7f85a20935b4
2018-07-24 17:24:55 -07:00
6c6a353a66 Fix speedbenchmark bug (#9770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9770

Add zero ops to operators that do not have a valid schema

Reviewed By: hlu1

Differential Revision: D8957472

fbshipit-source-id: d8d0a351183e88ace2e050a87c1e1c363af67e33
2018-07-24 17:10:37 -07:00
d7d673b68d Update onnx to latest master (#9782)
Summary:
52d40befa7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9782

Reviewed By: yinghai, houseroad

Differential Revision: D8978668

Pulled By: bddppq

fbshipit-source-id: 238f76a36784c12cc5655a2ee059f7e0169c0bb6
2018-07-24 14:42:01 -07:00
e5fe66d7ea Add support for specifying device_option in Functional (#9619)
Summary:
e.g.
```
Functional.Add(x, y, device_option=DeviceOption(HIP, 0))

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9619

Differential Revision: D8966599

Pulled By: bddppq

fbshipit-source-id: 22235e42f19278e79802642798bf0ee70a1202f6
2018-07-24 14:41:59 -07:00
37fc58f1d3 Use torch::empty before random_ on seed gen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9769

Reviewed By: goldsborough

Differential Revision: D8977636

Pulled By: SsnL

fbshipit-source-id: c2437d5ef53dc74e1b17eb16e728e1d67ae314c7
2018-07-24 14:41:58 -07:00
f393df774b Test case for c10d DDP (#9670)
Summary:
Before I can rewrite portions of the c10d DDP in C++ I need proper tests in place to make sure I am not breaking anything as I port code. There were no tests for the c10d DDP in place so I wrote some.

I refactored the c10d tests to derive some tests cases from a general `MultiGPUTestCase` and followed lots of patterns from `test_distributed.py` w.r.t. how tests are skipped (such that the main process doesn't initialize CUDA, which I found is a super important detail!!!).

I am largely unfamiliar with this code so feel free to scrutinize. The DDP test code itself is also largely taken from `test_distributed.py` but more inlined which I find easier to read.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9670

Differential Revision: D8977724

Pulled By: goldsborough

fbshipit-source-id: 186eab38a72384d7992a2ec5c89f304ad42d5944
2018-07-24 14:10:24 -07:00
e26d584445 Remove isScalar() from TensorImpl.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9765

Differential Revision: D8969474

Pulled By: gchanan

fbshipit-source-id: 42002b129488179affc919dba877de5a4e8f9fb5
2018-07-24 12:55:06 -07:00
7050d83dd7 Make logsumexp_out inplace (#9755)
Summary:
Fixes: #9754

Maybe this could also make its way into 0.4.1; it is a severe debugging headache if you hit this...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9755

Reviewed By: ezyang

Differential Revision: D8967178

Pulled By: zou3519

fbshipit-source-id: 151ed24e3a15a0c67014e411ac808fb893929a42
2018-07-24 12:40:48 -07:00
360c1bbd5b Add multivariate log-gamma (mvlgamma) (#9451)
Summary:
1. Add tests in test_cuda, test_torch
2. Add doc strings

Closes https://github.com/pytorch/pytorch/issues/9378 .
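Usage sketch; for p = 1 the multivariate log-gamma reduces to the ordinary lgamma:

```
import torch

x = torch.tensor([2.5, 3.0])
print(torch.mvlgamma(x, p=2))
print(torch.mvlgamma(x, p=1))  # matches torch.lgamma(x)
print(torch.lgamma(x))
```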

Differential Revision: D8859746

Pulled By: ezyang

fbshipit-source-id: 939c309d90940a7aa08f53004c9e7b3b1c9cf54e
2018-07-24 12:10:10 -07:00
6885b3fd62 Delete dead IsVariable enum. (#9768)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9768

Differential Revision: D8975802

Pulled By: ezyang

fbshipit-source-id: f85844872a1eb13e782aba0c168a3a1c1ac0313d
2018-07-24 11:58:11 -07:00
f9a99d5504 Specify default initialization schemes for modules in docs (#9038)
Summary: This closes #6906 .

Reviewed By: ezyang

Differential Revision: D8698632

Pulled By: weiyangfb

fbshipit-source-id: 259c1dbdc264a8e9f83e196fa72d135babd97d48
2018-07-24 11:58:08 -07:00
2b134c72e6 Add interface to provide blob types to shape&type inference (#9643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9643

Current map interface assumes float data type, which is not always correct.

Reviewed By: kennyhorror

Differential Revision: D8455784

fbshipit-source-id: b94a31267760f7f97c15aa4b03008affc347fd10
2018-07-24 11:58:05 -07:00
7af5883860 Enable python tests on ROCm (#9616)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9616

Differential Revision: D8960623

Pulled By: bddppq

fbshipit-source-id: bde93bda6230094e6bf4badd8ee79f0688ae1993
2018-07-24 11:37:58 -07:00
6ab5e697b9 Small fixups for enabling zero size dims. (#9724)
Summary:
1) Properly test cpu for alpha/beta addmm cases.
2) Unsqueeze on empty no longer throws an exception.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9724

Reviewed By: ezyang

Differential Revision: D8958513

Pulled By: gchanan

fbshipit-source-id: 6ce2ec4a47201f9b225b8c52354144ace43e9e09
2018-07-24 11:11:39 -07:00
675d80841a Small fixups for n-dimensional empty tensors in CUDA non-reduction di… (#9722)
Summary:
Small fixups for n-dimensional empty tensors in CUDA non-reduction dim ops.

Continuation of https://github.com/pytorch/pytorch/pull/9658.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9722

Differential Revision: D8956321

Pulled By: gchanan

fbshipit-source-id: 116fcaa1be5b1373f03217911556a28125cc860d
2018-07-24 11:11:37 -07:00
f6496229a5 Fixes xcode 10 beta 4 compile error (#9748)
Summary:
When building iOS apps with a caffe2 dependency, we were seeing the `caffe2/caffe2/mobile/contrib/ios/mpscnn/mpscnn.mm:33:17: error: method 'copyWithZone:' in protocol 'NSCopying' not implemented [-Werror,-Wprotocol]`. This fixes it by implementing a shallow copy with that method.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9748

Reviewed By: jerryzh168

Differential Revision: D8954332

Pulled By: williamtwilson

fbshipit-source-id: 0cd44408257c0bd3f4ffb80312ea9d13d13e5ff3
2018-07-24 11:11:35 -07:00
1283834600 Devirtualize TensorImpl::toString (#9758)
Summary:
This can hardly be called an improvement (we now print CPUFloatType instead of CPUFloatTensor), but it was the simplest way I could think of to devirtualize this function in the short term. We probably need some sort of native function that gives string information about a tensor.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Approved in #9710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9758

Differential Revision: D8966935

Pulled By: ezyang

fbshipit-source-id: a4641affe0a6153f90cdd9f4f2a1100e46d1a2db
2018-07-24 11:11:33 -07:00
679d397f28 Fix scalar_tensor_test for squeeze/unsqueeze with zero sized dimensions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9766

Differential Revision: D8971173

Pulled By: gchanan

fbshipit-source-id: 50bf7778eee7c60f51e1660ad834e161fa40f563
2018-07-24 10:42:39 -07:00
a7afba7308 Remove duplicated functions (#9601)
Summary:
Found by the linter; the duplication was likely introduced in a previous code sync.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9601

Differential Revision: D8922379

Pulled By: bddppq

fbshipit-source-id: 1f61bd7f539d823e62920615674a532ec0149623
2018-07-24 10:23:46 -07:00
adda789770 Skip maxpool_with_indices onnx tests (#9751)
Summary:
These are not in the same format, so skip them for the moment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9751

Reviewed By: yinghai

Differential Revision: D8965636

Pulled By: houseroad

fbshipit-source-id: 81d39c2f5625c14c0e1ee11408b5f7267b53798f
2018-07-24 10:23:43 -07:00
ba634c11df Move strides to base class. (#9749)
Summary:
Approved in #9644
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9749

Differential Revision: D8965336

Pulled By: ezyang

fbshipit-source-id: d1b0763e592f298395621cfd684715dc0a550cd6
2018-07-23 22:27:48 -07:00
9bf72b2087 Add missing windows exports
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9738

Reviewed By: apaszke

Differential Revision: D8961728

Pulled By: zdevito

fbshipit-source-id: aacba8c03d0d8dfe1e87585d1c2b26703d2ed103
2018-07-23 19:55:19 -07:00
5df3eae89e Add 1x1 specialization for conv with NCHW order (#9671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9671

Add 1x1 specialization for conv with NCHW order

Reviewed By: houseroad

Differential Revision: D8944686

fbshipit-source-id: 94bf44f69498b1934b7dfff4c0e989342c7bb61c
2018-07-23 18:54:58 -07:00
a387331e54 Re-enable test_segfault after recent dataloder changes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9700

Differential Revision: D8953615

Pulled By: SsnL

fbshipit-source-id: c6aa3c07dd2857dd54889d47e537a6b1e9198c60
2018-07-23 18:38:42 -07:00
099b5ba9d1 Tensor merge PRs from July 20 (#9713)
Summary:
Constituent PRs:

- [x] #9553 Remove unnecessary functions from StorageDerived.h (by cpuhrsch, reviewed by ezyang)
- [x] #9588 Use THTensor/Storage for THVoidTensor/Storage (by cpuhrsch , reviewed by gchanan)
- [x] #9627 Delete context from tensor (by ezyang, reviewed by gchanan)
- [x] #9641 Tensor reorganization (by ezyang, reviewed by gchanan )
- [x] #9647 Remove dim_ from THTensor (by cpuhrsch, reviewed by ezyang)
- [x] #9650 Remove context (by cpuhrsch, reviewed by gchanan and ezyang)
- [x] #9715 Fix Windows build in tensor merge PR (by ezyang, reviewed by gchanan and SsnL)

Upcoming PRs which didn't make this cut:

- [x] #9644 Stride move to TensorImpl, and nits (by ezyang, reviewed by gchanan)
- [ ] #9652 Native localScalar  (by ezyang, **UNREVIEWED AND FAILING TESTS**)
- [x] #9710 Devirtualize TensorImpl::toString (by ezyang, reviewed by gchanan)
- [ ] #9654 Use int64_t instead of ptrdiff_t for size / Rename flag to resizable_  (by cpuhrsch, **CHANGES REQUESTED AND FAILING TESTS**)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9713

Reviewed By: gchanan

Differential Revision: D8960882

Pulled By: ezyang

fbshipit-source-id: 99747b2c5462c7ff6809b67aacb4197626408204
2018-07-23 18:00:41 -07:00
e3fb9088d5 Allow multiple ops.def and clean up code gen in general
Summary: Basic cleanup, refactoring out some ops to closed source fb

Reviewed By: yinghai

Differential Revision: D8720722

fbshipit-source-id: 6fdf915c057a5749656d9f34a57fc142de6b076b
2018-07-23 15:44:04 -07:00
5849354aa1 Add operator<< overloads for TensorOptions (#9606)
Summary:
Added `operator<<` overloads for `at::TensorOptions` on request of ebetica

Example output:

```
TensorOptions(dtype=Double, device=cpu, layout=Strided, requires_grad=false)
```

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9606

Differential Revision: D8925191

Pulled By: goldsborough

fbshipit-source-id: 0503bc2851268276e9561d918290bc723e437c9c
2018-07-23 15:11:33 -07:00
d05a8145c5 Change behavior of clone to clone to a device (#9609)
Summary:
ebetica made me aware that `nn::Module::clone()` always clones to the current device (usually CPU) instead of preserving the device of each parameter. This PR changes the signature of `clone` from

`shared_ptr<Module> clone()`

to

`shared_ptr<Module> clone(optional<Device> device = nullopt)`

with semantics of:

1. If a `device` is given, all parameters/buffers are moved to that device,
2. If no `device` is supplied (default), parameters/buffers retain their device.

ezyang apaszke ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9609

Differential Revision: D8957367

Pulled By: goldsborough

fbshipit-source-id: 0d409ae645ed2b8d97d6fc060240de2f3d4bc6c8
2018-07-23 14:55:25 -07:00
31ba2f15e1 Rename embedding variable to weight (#9720)
Summary:
I renamed the variable in the `Embedding` module from `weight` to `table` a few months ago, because it seemed like a more meaningful name. Turns out it's not such a good idea because it deviates from PyTorch, which unnecessarily breaks C++->Python translated code.

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9720

Differential Revision: D8955647

Pulled By: goldsborough

fbshipit-source-id: 77228b07d2b733866e8cdecaa6d0686eef4cc3ea
2018-07-23 14:55:24 -07:00
431415adc4 quick patch for PackPadded removal to propagate the correct size. (#9657)
Summary:
The underlying reason why this is even an issue is that the conversion
into and out of the 'fictional' onnx operators is done in an unhygienic
order. This doesn't address that, but it does fix the one observable
case where this produces an incorrect result, and unblocks some other
work being done.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9657

Differential Revision: D8940824

Pulled By: anderspapitto

fbshipit-source-id: ea827a24c85447fe4ae470336a746329598eee84
2018-07-23 14:25:39 -07:00
a949245a86 Switch interpreter to use IValue's primitive int/floats (#9718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9718

This patch switches the interpreter to use IValue's primitive numbers rather than tensors for computing on integers and floats. In addition to preparing the interpreter for first-class support of other types, this cleans up the handling of primitive numbers, making it possible to just use the normal operator overloading dispatch to find the right implementation for numbers. As a result of this change, a lot of other functionality needed to be updated since it was the first time we use non-tensors in a lot of places in the code base.

Notes:
* Fixes code_template.py so that multi-line strings are indented correctly when used on a standalone line
* Cast operators (`int(x)`) are now functional. Some tests have additional conversions to integers because
we no longer allow implicit tensor -> integer conversions, following the same convention as in python
* prim::ListConstruct/createList has been added to the interpreter for creating lists and this has
replaced aten::stack for integer lists
* gen_jit_dispatch.py has been refactored so that non-tensor types use operators on IValues to extract
the primitives
* IValue gains a .to<T> method that is the equivalent of tensor_as but for IValue instead of at::Tensor
* `constant_as<T>` is switched over to using IValues's `.to<T>` method, to make conversion from constant->IValue->C++ type
more consistent. This functionality combined with `toIValue(Value*)` replaces the `tensor_as` and `as_tensor` family of functions.
* conditional expressions (if, loop) and operators related to them are now computed on integers rather than tensors
* IValue gains constructors for constructing from at::Scalar and converting to it. However, IValue itself will always store
the scalars as a double or int64.
* To align with python 3 syntax, TK_INT, TK_FLOAT, and TK_BOOL have been removed from the parser, and int/float/bool are just treated as special identifiers in the compiler,
along with print. These are represented as special sugared values with a `call` method implemented. For int/float/bool this implements casting behavior.
* Dropped shared_from_this from Type/Module. They were not needed, and they made debugging harder because they internally throw/catch exceptions.
* Shape propagation has been updated to support running nodes that include floating point primitive types, this required some refactoring of internal functions.
* TensorToNum and NumToTensor have actual implementations as operators now
* register_prim_ops now contains implementations of math operators for float/int primitive types, and for mixed (prim <+> tensor) versions. This removes the need for special handling in compiler.cpp
* Primitive math is now entirely handled by letting the compiler choose the right overloads. This removes tons of special casing in the compiler.
* incorporates eellison's change to allow casting from return values. Due to the addition of primitive support, the code needed slight modifications, so I just pre-merged it here.
* stack.h gains generic vararg versions of push/pop that know how to convert to/from C++ types:

```
at::Tensor a;
at::Scalar b;
pop(stack, a, b);
at::Tensor c = a + b;
push(stack, c);
```
apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9584

Reviewed By: apaszke

Differential Revision: D8910546

Pulled By: zdevito

fbshipit-source-id: 0f3e60d4d22217f196a8f606549430e43b7e7e30
2018-07-23 14:11:11 -07:00
a9742e1a27 Add fallback to TensorCPU if there are unsupported types for IDEEP Tensor (#9667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9667

MKL-DNN doesn't support 64-bit integers (cfee61bf81/include/mkldnn_types.h (L62-L75)). So force-converting from `TensorCPU<long>` to an `s32` ideep tensor will cause memory issues. This diff gives an alternative solution, where we just fall back to TensorCPU. The reasoning is that since MKL-DNN doesn't support 64-bit integer tensors, downstream ops have to be in CPUContext, so there is no reason to force-convert to an ideep tensor and back.

Reviewed By: pjh5

Differential Revision: D8943544

fbshipit-source-id: f514903cda27e34b8887271c9df56c8220895116
2018-07-23 13:54:57 -07:00
ee2cc68259 Add ctc_beam_search_decoder op for caffe2 (#9622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9622

Implement a ctc_beam_search_decoder operator based on ctc_greedy_decoder.

Differential Revision: D8903100

fbshipit-source-id: 38973632cb437e5cfcb9ed3a48ed6b901c10efa3
2018-07-23 13:40:24 -07:00
aa8a9fa5fc Extend DispatchStub to support CUDA dispatch (#9664)
Summary:
This is a modification of the strategy from https://github.com/pytorch/pytorch/pull/8919 and https://github.com/pytorch/pytorch/pull/9579.

```
Previously, the CPU architecture-specific kernels self-registered with
the DispatchStub. When linking as part of a static library, this requires
the flag --whole-archive to be passed to the linker to ensure that the
object files for the kernels are included. Caffe2 and TensorFlow use that
strategy.

We ran into some issues with --whole-archive blowing up the binary size
of some downstream projects in Facebook. This PR avoids --whole-archive
for CPU kernels. The downside is that the generic code needs to be aware
of whether kernels are compiled with AVX and with AVX2 (via
HAVE_AVX_CPU_DEFINITION and HAVE_AVX2_CPU_DEFINITION).

The CUDA kernels still self-register with DispatchStub because the CPU
library is not aware of whether the CUDA library will be available at
runtime.

There are a few major changes to DispatchStub

 - The environment variable ATEN_CPU_CAPABILITY overrides the CPU
   capability detection code (previously ATEN_DISABLE_AVX/AVX2)

 - DispatchStub is defined in the generic native code instead of the
   CPU_CAPABILITY_DEFAULT kernel.
```
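
A minimal sketch of the pattern described above, assuming a simplified two-level capability enum (the real DispatchStub differs in detail): the generic code references the kernels directly, so no --whole-archive is needed, and ATEN_CPU_CAPABILITY can force a lower capability.

```cpp
#include <cstdlib>
#include <cstring>

enum class CPUCapability { DEFAULT, AVX2 };

using binary_fn = void (*)(float*, const float*, const float*, long);

// The default kernel is always compiled into the generic library.
void add_kernel_default(float* out, const float* a, const float* b, long n) {
  for (long i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

CPUCapability compute_capability() {
  // The environment variable overrides the cpuid-based detection.
  if (const char* env = std::getenv("ATEN_CPU_CAPABILITY")) {
    if (std::strcmp(env, "default") == 0) return CPUCapability::DEFAULT;
  }
  return CPUCapability::AVX2;  // stand-in for real cpuid detection
}

binary_fn choose_add_kernel() {
#ifdef HAVE_AVX2_CPU_DEFINITION
  // Declared here, defined in the AVX2-compiled translation unit; the
  // direct reference keeps that object file in a static link.
  extern void add_kernel_avx2(float*, const float*, const float*, long);
  if (compute_capability() == CPUCapability::AVX2) return add_kernel_avx2;
#endif
  return add_kernel_default;
}
```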
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9664

Differential Revision: D8943350

Pulled By: colesbury

fbshipit-source-id: 329229b0ee9ff94fc001b960287814bd734096ef
2018-07-23 13:40:23 -07:00
3e9e3ef383 Improving diagnose RF NE with Cali (#9550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9550

as titled

Differential Revision: D8899226

fbshipit-source-id: 3c7cf026e8cbc0e95770e5a35b213a97bebba385
2018-07-23 13:40:21 -07:00
88d6b6e6cd Fix D8722560 (#9717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9717

D8722560 was landed with some build errors, unfortunately the c10 code isn't part of contbuild yet.
Fixing them.

Differential Revision: D8954141

fbshipit-source-id: 2a082fb8041626e45ccd609f37a8ef807f6dad8a
2018-07-23 12:55:20 -07:00
5094684238 Create torch::from_blob for variables (#9605)
Summary:
Need an overload of `at::from_blob` for Variables.

ezyang colesbury ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9605

Differential Revision: D8926226

Pulled By: goldsborough

fbshipit-source-id: e377c0d019d4377f3fc124614c7dcc562aa69990
2018-07-23 12:40:12 -07:00
14d4bdb406 Reformat output data format to make it more general for other binaries (#9555)
Summary:
This is to simplify the data format during benchmarking. After this change, we can use the same benchmarking harness data conversion method to parse data from multiple binaries.

This change should be coordinated with the PR: https://github.com/facebook/FAI-PEP/pull/63
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9555

Reviewed By: pjh5

Differential Revision: D8903024

Pulled By: sf-wind

fbshipit-source-id: 61cabcff99f0873729142ec6cb6dc230c685d13a
2018-07-23 11:11:26 -07:00
029cf1d78a Improve error messages of wrong dimensions (#9694)
Summary:
Updated the error message terms _matrices_ and _vectors_ to _2D tensors_ and _1D tensors_ respectively.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9694

Differential Revision: D8949589

Pulled By: ezyang

fbshipit-source-id: 2cdcd72e0e9a4459f3691c133bb16ef218b5cf3f
2018-07-23 10:10:55 -07:00
9525925119 Low rank multivariate normal (#8635)
Summary:
This pull request implements the low-rank multivariate normal distribution, where the covariance matrix has the form `W @ W.T + D`. Here D is a diagonal matrix and W has shape n x m with m << n. It uses the "matrix determinant lemma" and the "Woodbury matrix identity" to save computational cost.
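
For reference, these are the two identities for the covariance above (with I_m the m x m identity); both reduce an n x n computation to an m x m one:

```latex
\det(W W^\top + D) = \det(I_m + W^\top D^{-1} W)\,\det(D)

(W W^\top + D)^{-1} = D^{-1} - D^{-1} W \,(I_m + W^\top D^{-1} W)^{-1} W^\top D^{-1}
```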

Along the way, I also revised the MultivariateNormal distribution a bit. Here are the other changes:
+ `torch.trtrs` works with CUDA tensors, so I tried to use it instead of `torch.inverse`.
+ Use `torch.matmul` instead of `torch.bmm` in `_batch_mv`. The former is faster and simpler.
+ Use `torch.diagonal` for `_batch_diag`
+ Reimplement `_batch_mahalanobis` based on `_batch_trtrs_lower`.
+ Use trtrs to compute term2 of KL.
+ `variance` relies on `scale_tril` instead of `covariance_matrix`

TODO:
- [x] Resolve the fail at `_gradcheck_log_prob`
- [x] Add test for KL

cc fritzo stepelu apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8635

Differential Revision: D8951893

Pulled By: ezyang

fbshipit-source-id: 488ee3db6071150c33a1fb6624f3cfd9b52760c3
2018-07-23 10:10:53 -07:00
9d6521c3a0 Support n-dimensional empty tensors in CUDA non-reduction dimension f… (#9658)
Summary:
…unctions.

This also unifies the error checking between scatter/scatterAdd on CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9658

Differential Revision: D8941527

Pulled By: gchanan

fbshipit-source-id: 750bbac568f607985088211887c4167b67be11ea
2018-07-23 08:40:12 -07:00
53083b8353 Remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS and fix CUDA 8 build on Windows (#9491) (#9491)
Summary:
Fixes #9092.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9491
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9693

Differential Revision: D8946850

Pulled By: ezyang

fbshipit-source-id: bd816f459ab70f6b4a0983305a1ce341bb633707
2018-07-23 06:40:39 -07:00
9ee5133651 Fix dataloader hang when it is not completely iterated (#9655)
Summary:
second trial of https://github.com/pytorch/pytorch/pull/7140

cc csarofeen Let's see if this works. It passes everything locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9655

Differential Revision: D8940177

Pulled By: SsnL

fbshipit-source-id: 8d6340fc9f7355c71e1e26b262da166402faa158
2018-07-22 20:38:27 -07:00
1afdc57ed8 Hide all other fields in THTensor (#9683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9683

This pops off `refcount_`, `storage_`, `storage_offset_`; there are now no more direct accesses to these fields and we can make them private (with appropriate friending).

Stacked on #9561
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9591

Reviewed By: SsnL

Differential Revision: D8922246

Pulled By: ezyang

fbshipit-source-id: dfae023d790e29ce652e2eab9a1628bbe97b318d
2018-07-22 09:09:34 -07:00
f3d72b2101 Modify barrier net to allow better control over its initialization and execution in DPM (#9665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9665

In data_parallel_model, we isolate synchronizing barrier init net into its own from the param_init_net, so that we could have finer granularity of control over the barrier net.

Reviewed By: andrewwdye

Differential Revision: D8375389

fbshipit-source-id: ce0c8c1c8e4bd82b7078a1b07abaced3f149d578
2018-07-22 00:23:47 -07:00
769cb5a640 Add new ways of matching nodes with schemas in the JIT (#9567)
Summary:
**REVIEW LAST COMMIT ONLY**

As discussed in our yesterday's meeting. Nodes can be now matched to particular overloads using the `matches(...)` function:
```cpp
n->matches("aten::type_as(Tensor self, Tensor other) -> Tensor")
```

This also changes the shape prop and peephole passes to use those functions for matching. This fixes a few bugs, makes them much more robust, and prepares us for removal of attributes.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9567

Reviewed By: zdevito

Differential Revision: D8938482

Pulled By: apaszke

fbshipit-source-id: eb2382eeeae99692aada2d78d5d0c87c8ef1545e
2018-07-21 21:39:07 -07:00
a01d6f01b5 Update channel_shuffle_op and transpose 2d to speed up ShuffleNet (#9525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9525

Update channel_shuffle_op and transpose 2d to speed up ShuffleNet

Reviewed By: houseroad

Differential Revision: D8889361

fbshipit-source-id: 60196e819b6842becc53b4859b62d4419a0e2c6e
2018-07-21 12:54:33 -07:00
3bb8c5eab1 Allow MKLDNN on macOS, and any other OS where CMake is able to detect it.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9638

Reviewed By: soumith

Differential Revision: D8946130

Pulled By: resistor

fbshipit-source-id: 87bd9cb12608467b05bd4998fdb00bfdbd038ca2
2018-07-20 22:27:02 -07:00
b5c8d59451 Add a CUDAContext header include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9662

Differential Revision: D8945581

Pulled By: ezyang

fbshipit-source-id: 2fe0adc96456788579f7d6f1c4513fe45360c030
2018-07-20 20:39:09 -07:00
23ed26a0c3 Guard include of cuda-only header comm.h (#9656)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9656

Reviewed By: colesbury

Differential Revision: D8941361

Pulled By: ezyang

fbshipit-source-id: c18cb0e606ae0608e5892040192b8792ae542b74
2018-07-20 19:46:36 -07:00
5e84403d5f Fix for half conversion for ROCm 1.8.2 (#9663)
Summary:
This PR contains the change for explicit conversion between ushort and __half required for ROCm 1.8.2 support
bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9663

Differential Revision: D8943937

Pulled By: bddppq

fbshipit-source-id: 16102f9dbc68ed4ece2e8fc244825c3992c24901
2018-07-20 17:11:30 -07:00
3efdece9da Support n-dimensional empty tensors in take/put.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9635

Differential Revision: D8935119

Pulled By: gchanan

fbshipit-source-id: 5035583e7322b1a1720d961945dd0eefb4cb28ef
2018-07-20 15:40:49 -07:00
45e5c17ecf ONNXIFI transform (#9569)
Summary:
Cut-off runnable subgraph and off-load to ONNXIFI backend
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9569

Reviewed By: Maratyszcza

Differential Revision: D8930408

Pulled By: yinghai

fbshipit-source-id: 2b494f7f8dc10c00e58cf0fed5c4a9434be6155b
2018-07-20 15:09:59 -07:00
01581037dc Add workspace.RunPlanInBackground (#9637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9637

Adding a method to run a plan in the background. The intended use is to run BlueWhale's data reading & preprocessing net in the background while the GPU is training.
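
A hedged usage sketch (the plan construction is elided, and the background-run semantics are assumed; only the method name comes from this diff):

```py
from caffe2.python import core, workspace

plan = core.Plan("read_and_preprocess")
# ... add execution steps for the data reading & preprocessing net ...

# Kick off the plan on a background thread (assumed semantics), then run
# the training net on the GPU while the next batch is prepared.
workspace.RunPlanInBackground(plan)
```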

Reviewed By: MisterTea

Differential Revision: D8906439

fbshipit-source-id: b1c73ca7327e2d87a8f873924e05ab3d161a3f1e
2018-07-20 14:56:12 -07:00
1003ccfa15 Creates CUDAContext (#9435)
Summary:
ezyang noticed that the CUDAStream files lived under ATen/ despite being CUDA-specific, and suggested porting them to ATen/cuda and exposing them with a new CUDAContext. This PR does that. It also:

- Moves ATen's CUDA-specific exceptions for ATen/cudnn to ATen/cuda for consistency
- Moves getDeviceProperties() and getCurrentCUDASparseHandle() to CUDAContext from CUDAHooks

The separation between CUDAContext and CUDAHooks is straightforward. Files that are in CUDA-only builds should rely on CUDAContext, while CUDAHooks is for runtime dispatch in files that can be included in CPU-only builds. A comment in CUDAContext.h explains this pattern. Acquiring device properties and CUDA-specific handles is something only done in builds with CUDA, for example, so I moved them from CUDAHooks to CUDAContext.

This PR will conflict with #9277 and I will merge with master after #9277 goes in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9435

Reviewed By: soumith

Differential Revision: D8917236

Pulled By: ezyang

fbshipit-source-id: 219718864234fdd21a2baff1dd3932ff289b5751
2018-07-20 12:56:15 -07:00
8a0fe0a588 set_input_record() should always add external input (#9636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9636

Make sure that the blobs are registered to the net

Reviewed By: pjh5

Differential Revision: D8924883

fbshipit-source-id: f09422a2d4d5ba8bf6cfbfd00172097b5ab1fcd6
2018-07-20 11:55:37 -07:00
bae156a481 Support (some) CUDA Lapack on n-dimensional empty tensors.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9631

Reviewed By: ezyang

Differential Revision: D8933202

Pulled By: gchanan

fbshipit-source-id: 1ade4ca439bf26aa921df1da83a827d860f8f48f
2018-07-20 11:40:25 -07:00
d3688861ec Fixed a missing '=' in LPPoolNd repr function (#9629)
Summary:
In the repr function of the LPPoolNd(..) class, there was a missing '='. (`kernel_size{kernel_size}`)

Link to line in the code: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/pooling.py#L694

Original:

       return 'norm_type={norm_type}, kernel_size{kernel_size}, stride={stride}, ' \
              'ceil_mode={ceil_mode}'.format(**self.__dict__)

Fixed:

       return 'norm_type={norm_type}, kernel_size={kernel_size}, stride={stride}, ' \
              'ceil_mode={ceil_mode}'.format(**self.__dict__)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9629

Differential Revision: D8932913

Pulled By: soumith

fbshipit-source-id: 9030dff6b14659b5c7b6992d87ef53ec8891f674
2018-07-20 11:24:42 -07:00
a3a6ab60cd Fix the error in UnpackSegmentsOp when calculating the gradient with "max_length" argument (#9598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9598

The "max_length" should be passed to UnPackSegmentsOp if "max_length" is given when calling PackSegmentsOp.

Reviewed By: jerryzh168

Differential Revision: D8919799

fbshipit-source-id: 8c97aa717b69177b8a5d5d56892817d488853840
2018-07-20 11:09:34 -07:00
1d4d9fc7da Prepare to stop using attributes in the JIT (#9505)
Summary:
This PR adds machinery to cache the schema in an IR node, and allows lookups of (possibly) constant inputs by their names (instead of position). The new methods are:

- `at::optional<T> get<T>(Symbol name)` - if the argument called name is a constant, then casts it to type `T` and returns it. If it's not constant returns `nullopt`. Raises an error if there's no argument with that name.
- `at::optional<IValue> get<T>(Symbol name)` - like above, but packs the result in an IValue
- `Value* getValue(Symbol name)` - retrieves a `Value*` for an argument (no need to know its position).
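
A hypothetical call site based on the signatures above (the symbol name, element type, and surrounding types are illustrative, not taken from the PR):

```cpp
// Look up the "exponent" argument by name rather than by position.
void specialize_pow(Node* node) {
  if (auto exponent = node->get<double>(Symbol::attr("exponent"))) {
    // The argument is a constant; *exponent holds its value.
  } else {
    // Not constant: fall back to the graph Value* for the argument.
    Value* exponent_value = node->getValue(Symbol::attr("exponent"));
    (void)exponent_value;
  }
}
```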

All above functions currently inspect the attributes as well, but that's only so that I could start using them in other places in the JIT without disrupting our current functionality. I wanted this diff to be a preparation that doesn't change the semantics too much, and so both the tracer and script create nodes with attributes. The next PR will put that to a stop, and hopefully the changes we need to make to other components will be simpler thanks to what I did here.

One more thing I'd like to do before we actually switch to creating non-attributed nodes is to have a convenient way of creating a schema programmatically, matching nodes against it, and creating nodes without having to pack inputs into flat argument lists (which is quite error prone).

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9505

Reviewed By: ezyang

Differential Revision: D8915496

Pulled By: apaszke

fbshipit-source-id: 39d14fc9a9d73d8494f128367bf70357dbba83f5
2018-07-20 10:56:00 -07:00
b9e89cf9fd Revert "Extend DispatchStub to support CUDA dispatch (#9579)" (#9614)
Summary:
This reverts commit bcf0bf42a1727c8ee788f733c28579d0e36a387c.

The commit was causing issues for some internal FB projects.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9614

Reviewed By: Yangqing

Differential Revision: D8929552

Pulled By: colesbury

fbshipit-source-id: ae9026ad8762a4c5de401273694b4c878fc241a6
2018-07-20 10:25:11 -07:00
bbb30ad4ab Use THTensor/Storage for THVoidTensor/Storage (#9588)
Summary:
Change akin to change for THVoidStorage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9588

Reviewed By: gchanan

Differential Revision: D8915559

Pulled By: cpuhrsch

fbshipit-source-id: 6cc69df0e29942c62750f990903dfd8e4d344581
2018-07-20 09:54:44 -07:00
f84fdc7866 Remove unnecessary functions from StorageDerived.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9553

Reviewed By: ezyang

Differential Revision: D8915526

Pulled By: cpuhrsch

fbshipit-source-id: 32013d3aa58a1a68637f99ee619d06e27fadaad6
2018-07-20 09:41:36 -07:00
7b9d8916e5 Fix integral type dispatch error message (#9625)
Summary:
This fix will prevent errors like (found in `bincount`)
```
RuntimeError: %s not implemented for '%s'bincounttorch.FloatTensor
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9625

Differential Revision: D8932945

Pulled By: soumith

fbshipit-source-id: 794e3b58d662779402ab318e274661826a5db8b2
2018-07-20 09:24:27 -07:00
2a0018f2a8 Add scatter_add_ doc (#9630)
Summary:
fixes #4176 cc vishwakftw

I didn't do `:math:` and `\neg` because I am using double ticks so they render more similarly with `:attr:`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9630

Differential Revision: D8933022

Pulled By: SsnL

fbshipit-source-id: 31d8551f415b624c2ff66b25d886f20789846508
2018-07-20 08:41:05 -07:00
bfe2aa093e docs fixes (#9607)
Summary:
fixes #9589 #9507 #9502 #9390
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9607

Reviewed By: ezyang, soumith

Differential Revision: D8923575

Pulled By: SsnL

fbshipit-source-id: cb61d990333b700d813ce781040c3d0325999b8c
2018-07-20 07:55:25 -07:00
4028ff6c3a Revert "quick patch for PackPadded removal to propagate the correct s… (#9613)
Summary:
…ize. (#9593)"

This reverts commit 85b28163584380bf4953f2ac2fa21df9715f12d5.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9613

Reviewed By: bddppq

Differential Revision: D8929322

Pulled By: anderspapitto

fbshipit-source-id: 3ae4d320e5407acc1fb63a26b7d1f2ff4059eba9
2018-07-20 00:39:29 -07:00
aa7af94656 Make JIT tracing a thread-local property (#9414)
Summary:
As in the title. Lets us simplify a lot of code.

Depends on #9363, so please review only the last commit.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9414

Reviewed By: zdevito

Differential Revision: D8836496

Pulled By: apaszke

fbshipit-source-id: 9b3c3d1f001a9dc522f8478abc005b6b86cfa3e3
2018-07-19 19:09:39 -07:00
5651b27458 Add CAFFE_STATIC_EVENT to Stats (#9501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9501

Added a new stat value to log static states like CPU and memory usage.

Reviewed By: pjh5

Differential Revision: D8872254

fbshipit-source-id: 469e94cab99029a3da55f8986dddeadac076e2a8
2018-07-19 16:25:59 -07:00
b770156a7a Functional DataParallel (#9234)
Summary:
This PR adds the functional version of `DataParallel` (i.e. `data_parallel`) to the C++ frontend.

For this, I had to:
1. Add "differentiable" versions of scatter and gather, which perform their inverse operation in the backward pass, to C++. I've added them under `torch/csrc/autograd/functions/comm.{h,cpp}`. I had to move some utilities from `VariableType.cpp` into `torch/csrc/autograd/functions/utils.h`, and changed them a bit to fix the `const_cast`s for which there were `TODO`s,
2. Implement the `replicate`, `parallel_apply` and the combining `data_parallel` functions in C++.

`replicate` is implemented based on our existing `clone()` interface, along with the ability to set the current device via `at::OptionsGuard` (so nice).

`parallel_apply` is implemented using `at::parallel_for` (CC cpuhrsch) and [follows the code from PyTorch](https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/parallel_apply.py).

Added lots of tests for these things.
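
A short usage sketch of the functional API (the exact signature and defaults are assumptions based on the description above):

```cpp
#include <torch/torch.h>

torch::Tensor forward_on_all_gpus(torch::Tensor input) {
  auto model = std::make_shared<torch::nn::LinearImpl>(10, 5);
  // Replicates the module across available devices, scatters `input`
  // along dim 0, applies the replicas in parallel, gathers the outputs.
  return torch::nn::parallel::data_parallel(model, input);
}
```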

apaszke ezyang ebetica colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9234

Differential Revision: D8865182

Pulled By: goldsborough

fbshipit-source-id: 4f1fecf2b3f3bc1540c071dfb2d23dd45de433e4
2018-07-19 16:12:04 -07:00
7e78e80d94 Make error message for empty module friendlier (#9565)
Summary:
In our pimpl system, default constructing a module holder default constructs the contained module. This means `Linear linear;` is ill-formed, since `Linear` doesn't have a default constructor. Instead we require `Linear linear = nullptr;` to get the empty state of the `Linear`. This PR makes the error message for the ill-formed case nicer.

I had to change the forwarding constructors of most of our modules for this, but that's a minor adjustment.

E.g.

```
Linear linear;

In file included from /home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/module.h:5:0,
                 from /home/psag/pytorch/pytorch/test/cpp/api/module.cpp:3:
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/pimpl.h: In instantiation of ‘torch::nn::ModuleHolder<Contained>::ModuleHolder() [with Contained = torch::nn::LinearImpl]’:
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/modules/dropout.h:45:1:   required from here
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/pimpl.h:46:5: error: static assertion failed: You are trying to default construct a module which has no default constructor. Use = nullptr to give it the empty state (like an empty std::shared_ptr).
     static_assert(
```

ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9565

Differential Revision: D8903666

Pulled By: goldsborough

fbshipit-source-id: 5e6b788921a27a44359db89afdc2b057facc5cec
2018-07-19 15:56:54 -07:00
bcf0bf42a1 Extend DispatchStub to support CUDA dispatch (#9579)
Summary:
This is a few files taken from https://github.com/pytorch/pytorch/pull/8919. They're unchanged from the latest versions of that PR.

```
This is part of https://github.com/pytorch/pytorch/pull/8919. It's
separated to make it easier to merge the PR in pieces.

There are a few major changes to DispatchStub

 - The environment variable ATEN_CPU_CAPABILITY overrides the CPU
   capability detection code (previously ATEN_DISABLE_AVX/AVX2)

 - DispatchStub is defined in the generic native code instead of the
   CPU_CAPABILITY_DEFAULT kernel.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9579

Differential Revision: D8909000

Pulled By: colesbury

fbshipit-source-id: fdeb606270b06acdab3c01dba97ec9d81584ecc0
2018-07-19 14:25:40 -07:00
a08119afc2 Eliminate direct access to size/strides of THTensor; replace them with std::vector (#9561)
Summary:
* THTensor now stores `sizes_` and `strides_` which is a `std::vector<int64_t>`
* Anywhere a "public" API function made use of an int64_t* of sizes, I opted to just finagle it out of the tensor using THTensor_getSizePtr rather than try to rewrite all of these sites to use ArrayRef. They should use ArrayRef eventually, but not yet.
* There are new utility functions for resizing sizes/strides in one go (THTensor_resizeDim), or replacing sizes and strides with completely new values (THTensor_setSizesAndStrides)
* Anywhere you said `t->size[n] = 0`, we now say `THTensor_setSizeAt(t, n, 0)`, ditto for strides
* Anywhere you said `t->size[n]`, we now say `t->size(n)` (coming soon: ditto for strides)

Previous review of just the `std::vector` change in #9518, but I'm planning to merge this all in one go.

Note for gchanan: review from commit "ci" and after
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9561

Reviewed By: cpuhrsch

Differential Revision: D8901926

Pulled By: ezyang

fbshipit-source-id: 483cf275060ab0a13845cba1ece39dd127142510
2018-07-19 14:10:06 -07:00
f521823b7b Do not always set broadcast argument when exporting new onnx add and sub to caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9597

Reviewed By: colesbury

Differential Revision: D8920575

Pulled By: bddppq

fbshipit-source-id: 97423e1bf6a20559d466d2ac56c9e74e10bfc129
2018-07-19 14:10:05 -07:00
6557856671 Fix l2 normalization when handling zero vector (#9594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9594

When the input vector is a zero vector, the previous GPU code gives NaN in the backward pass. We fix this.
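
For context: L2 normalization computes x / ||x||, so at x = 0 the backward pass divides by a zero norm and produces NaN. The usual guard (assumed here, not quoted from the diff) clamps the denominator:

```latex
\hat{x} = \frac{x}{\max(\lVert x \rVert_2,\ \epsilon)}
```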

Reviewed By: pjh5

Differential Revision: D8849732

fbshipit-source-id: 87b1fb1ee05dfdb0d43bcbe67e36f15896fe1706
2018-07-19 14:10:03 -07:00
85b2816358 quick patch for PackPadded removal to propagate the correct size. (#9593)
Summary:
The underlying reason why this is even an issue is that the conversion
into and out of the 'fictional' onnx operators is done in an unhygienic
order. This doesn't address that, but it does fix the one observable
case where this produces an incorrect result, and unblocks some other
work being done.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9593

Differential Revision: D8919125

Pulled By: anderspapitto

fbshipit-source-id: a88ca979c3b9d439863e223717d3697180c26121
2018-07-19 14:10:02 -07:00
f33cd36c9b Use int64_t for im2col and col2im (#9590)
Summary:
Fixes #9404
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9590

Differential Revision: D8916020

Pulled By: SsnL

fbshipit-source-id: ac6758326bbb09b48642b149f4eb8f466ef7044e
2018-07-19 11:29:24 -07:00
f180373d68 Support n-dimensional empty tensors in CUDA BLAS and fix a btrifact bug. (#9573)
Summary:
This is mainly straightforward, with two exceptions:
1) cublasSgemv and cublasDgemv appear to have a bug where (x,0).mv(0) does not handle beta, whereas cublasSgemm and cublasDgemm do for the case where (x,0).mm((0,y)). This is handled by manually calling zero / mul (see the gemv note after this list).

2) I fixed a bug in btrifact that was broken even when dealing with non-empty tensors.  Basically, if out.stride(0) was 1, because the underlying BLAS call expects column-major matrices, to get a column-major tensor, out.transpose_(0, 1) would be called.  But this is just wrong, as if the batch dimension (0) doesn't match the size of the columns (1), you don't even have a tensor of the correct shape.
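
For reference on (1): BLAS gemv computes the update below; with a zero-size A the alpha*A*x term contributes nothing, but the result should still be beta*y, which is what the manual zero/mul restores.

```latex
y \leftarrow \alpha A x + \beta y
```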
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9573

Reviewed By: ezyang

Differential Revision: D8906144

Pulled By: gchanan

fbshipit-source-id: de44d239a58afdd74d874db02f2022850dea9a56
2018-07-19 09:50:27 -07:00
aee9e90abd Fix TestAutograd.test_as_strided (#9538)
Summary:
0. Fixes #9479
1. rewrites `as_strided` as a native function. This is fine because `set_` does the scalar check.
2. allow using `self` in `python_default_init`. Previously, `python_variable_methods.cpp` had `self` as an input `PyObject *` and used `self_` as the unpacked tensor, but `python_torch_functions.cpp` just uses `self` as the unpacked tensor, making it impossible to use `self` in `python_default_init`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9538

Differential Revision: D8894556

Pulled By: SsnL

fbshipit-source-id: ca7877b488e12557b7fb94e781346dcb55d3b299
2018-07-19 09:11:13 -07:00
e0446fcfa9 Pass dtype to tensor contructor in test_neg (#9558)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/9554.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9558

Differential Revision: D8901085

Pulled By: yf225

fbshipit-source-id: 0edb176fcb18e0c0bcfc6f209343b9097767c9b8
2018-07-19 08:54:39 -07:00
54db14e390 HIP Operators Generator--> HipOpG (#9322)
Summary:
The goal of this PR is to add infrastructure to convert (hipify) CUDA ops into [HIP](https://github.com/ROCm-Developer-Tools/HIP) ops at **compile** time.

Note that HIP ops, which are portable C++ code, can run on both AMD and NVIDIA platforms.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9322

Differential Revision: D8884707

Pulled By: bddppq

fbshipit-source-id: dabc6319546002c308c10528238e6684f7aef0f8
2018-07-19 00:26:06 -07:00
45f0d05202 Adapt OnnxifiOp to removed suffix handling in ONNXIFI loader (#9571)
Summary:
Adapt to changes in onnx/onnx#1203
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9571

Reviewed By: yinghai

Differential Revision: D8907892

Pulled By: bddppq

fbshipit-source-id: 9f88471639dbe9050194e84340f335bece834d5d
2018-07-18 19:26:23 -07:00
604f7e98c3 Expose CAFFE2_USE_OPENCV preprocessor flag (#9509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9509

generate_proposals_op_util_nms.h conditionally requires OpenCV in some cases,
and earlier this was checking just CV_MAJOR_VERSION macro, but that is
undefined unless opencv.hpp is included. Adding `-DCAFFE2_USE_OPENCV` to
TARGETS when opencv is included in external_deps to check for this correctly.
Thanks jinghuang for flagging this issue!

Differential Revision: D8880401

fbshipit-source-id: 65abbcf4ffe3feffc0ee2560882cb8eb0b7476f9
2018-07-18 18:56:49 -07:00
b3e141e84c Add predictor config into Predictor (#9434)
Summary:
This is the first step of refactoring the Predictor. In this diff the config struct
is introduced and the internal data structure of Predictor has been updated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9434

Differential Revision: D8843262

Pulled By: fishbone

fbshipit-source-id: 23f5e4751614e3fedc9a04060d69331bfdecf864
2018-07-18 16:39:56 -07:00
04b33b7231 Add byte_weight_dequant_op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9541

Reviewed By: hlu1

Differential Revision: D8882964

fbshipit-source-id: 06d2e0d227ea6a4a8dc5ef1ea9dd1d449c149b47
2018-07-18 16:27:21 -07:00
c1ee8835b6 Constructors and member functions for THStorage (#9357)
Summary:
Added on top of ezyang's https://github.com/pytorch/pytorch/pull/9278
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9357

Reviewed By: ezyang

Differential Revision: D8863934

Pulled By: cpuhrsch

fbshipit-source-id: a45c955c0b1e9e0866749b3a7e8a36de931bdff1
2018-07-18 15:56:26 -07:00
4c615b1796 Introduce libtorch to setup.py build (#8792)
Summary:
Prior to this diff, there were two ways of compiling the bulk of the torch codebase. There was no interaction between them - you had to pick one or the other.

1) with setup.py. This method
- used the setuptools C extension functionality
- worked on all platforms
- did not build test_jit/test_api binaries
- did not include the C++ api
- always included python functionality
- produced _C.so

2) with cpp_build. This method
- used CMake
- did not support Windows or ROCM
- was capable of building the test binaries
- included the C++ api
- did not build the python functionality
- produced libtorch.so

This diff combines the two.

1) cpp_build/CMakeLists.txt has become torch/CMakeLists.txt. This build
- is CMake-based
- works on all platforms
- builds the test binaries
- includes the C++ api
- does not include the python functionality
- produces libtorch.so

2) the setup.py build
- compiles the python functionality
- calls into the CMake build to build libtorch.so
- produces _C.so, which has a dependency on libtorch.so

In terms of code changes, this mostly means extending the cmake build to support the full variety of environments and platforms. There are also a small number of changes related to the fact that there are now two shared objects - in particular, windows requires annotating some symbols with dllimport/dllexport, and doesn't allow exposing thread_local globals directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8792

Reviewed By: ezyang

Differential Revision: D8764181

Pulled By: anderspapitto

fbshipit-source-id: abec43834f739049da25f4583a0794b38eb0a94f
2018-07-18 14:59:33 -07:00
3b886500a0 Add CUDAGuard to ATen (#9277)
Summary:
THCStream was recently moved to ATen by mruberry: https://github.com/pytorch/pytorch/pull/8997. This PR now introduces a guard class that replaces `AutoStream` from `torch/csrc/` and also uses this new stream interface.

I had to extend the `CUDAStream` interface with unchecked calls, so that we can reset the stream without throwing an exception in the guard's destructor.
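
A hedged sketch of the resulting guard pattern (the header path and constructor form are assumptions):

```cpp
#include <ATen/cuda/CUDAGuard.h>

void launch_on_device_one() {
  at::cuda::CUDAGuard guard(1);  // set the current device for this scope
  // ... enqueue kernels / set streams on device 1 ...
}  // destructor restores the previous device/stream without throwing
```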

colesbury apaszke ezyang

Fixes https://github.com/pytorch/pytorch/issues/7800
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9277

Differential Revision: D8865183

Pulled By: goldsborough

fbshipit-source-id: 67c9bc09629d92fa5660286b5eec08fde9108cd7
2018-07-18 14:40:31 -07:00
8769fec03f Move clamp into ATen (#9506)
Summary:
Glue component of https://github.com/pytorch/pytorch/pull/9319

Important to unblock wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9506

Reviewed By: wanchaol

Differential Revision: D8879437

Pulled By: cpuhrsch

fbshipit-source-id: 16ea8a93f3f5df2695180b3a30a583834b7004f1
2018-07-18 13:40:11 -07:00
c506ff97c8 Disable py2-clang3.8-rocmnightly-ubuntu16.04-test in disabled-configs… (#9543)
Summary:
….txt setting

In the ROCm branches we will experiment with turning this on.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9543

Differential Revision: D8897990

Pulled By: ezyang

fbshipit-source-id: ae9d25d1b79ee421d49436593edf8c7e49b3a4e5
2018-07-18 12:58:56 -07:00
ca3b36aa6a Add implementation for batch_moments_op (#9510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9510

Add implementation for batch_moments_op

Reviewed By: houseroad

Differential Revision: D8587654

fbshipit-source-id: d20f52cc8e900716c1057e68c147258dfda5245b
2018-07-18 11:59:54 -07:00
8c741b7c4f Add transformation from caffe2::resizeop to onnx::upsample
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9511

Reviewed By: hlu1

Differential Revision: D8876692

fbshipit-source-id: 9ba346e225cfbc686d370134fe41a28333b933cc
2018-07-18 11:59:52 -07:00
b6b6e1b39f Fix core.Plan.create_from_proto (#9438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9438

Current implementation of create_from_proto doesn't work as expected: it
duplicates networks and execution steps by copying original PlanDef first and
adding each step one-by-one later.

Reviewed By: pjh5

Differential Revision: D8850316

fbshipit-source-id: 9b02836d6e6ee1c91cfdd3b4c4804f14137dc22b
2018-07-18 10:55:55 -07:00
27455e9c78 Use _six for inf and nan (#9500)
Summary:
Things like `float('inf')` are actually quite expensive.
```py
In [1]: import math

In [2]: %timeit -n 200 math.inf
49.3 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 200 loops each)

In [3]: %timeit -n 200 float('inf')
194 ns ± 39.1 ns per loop (mean ± std. dev. of 7 runs, 200 loops each)
```
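
A hedged sketch of the pattern this PR adopts: import the cached constant once instead of parsing `float('inf')` on every use (that `torch._six` provides `inf`/`nan` is taken from the title):

```py
from torch._six import inf

def clip_norm(total_norm, max_norm):
    # Comparison against a cached module-level constant; no string
    # parsing on the hot path.
    if total_norm == inf:
        return max_norm
    return min(total_norm, max_norm)
```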
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9500

Reviewed By: soumith

Differential Revision: D8876229

Pulled By: SsnL

fbshipit-source-id: 78602b76bb53d5588910b58270930c0bd413d2d7
2018-07-18 10:40:29 -07:00
35f7925aad fix small literals being flushed to 0 by std::to_string
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9478

Differential Revision: D8872083

Pulled By: soumith

fbshipit-source-id: 90083b6047f59466949ace249193094131a30cd5
2018-07-18 09:25:06 -07:00
d6e124e9a5 Dummy CircleCI config. (#9537)
Summary:
The purpose of this config is to make sure that CircleCI builds
don't fail when I turn them on for pytorch/pytorch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9537

Differential Revision: D8894497

Pulled By: ezyang

fbshipit-source-id: 22f43c84a9b8a54cd47a6572ba068f70a73f043a
2018-07-18 09:25:05 -07:00
28954b9e68 Fix RoIAlignOp GPU implementation for RoIs without batch index (#9230)
Summary:
Fix RoIAlignOp GPU implementation for RoIs without batch index
According to https://caffe2.ai/docs/operators-catalogue.html#roialign, RoIs is "2D input of shape (R, 4 or 5)"
Pass the RoIs' 2nd dimension as a kernel parameter and adjust the kernel accordingly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9230

Reviewed By: houseroad

Differential Revision: D8886798

Pulled By: malfet

fbshipit-source-id: 52a8b4df85f7e350e36c842ee4428f3a1cba2588
2018-07-18 08:39:50 -07:00
8fe2622090 Fix gatherTopK template (#9231)
Summary:
Fix gatherTopK template
This change makes it possible to instantiate gatherTopK() with an IndecesType other than caffe2::TIndex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9231

Reviewed By: houseroad

Differential Revision: D8886778

Pulled By: malfet

fbshipit-source-id: d5fb1f8814710cd81bc0cf65e0f96fd9fd8317da
2018-07-18 08:25:23 -07:00
f277645968 Support N-dimensional empty tensors in CPU BLAS and (a selection of) … (#9522)
Summary:
…CPU LAPACK routines.

Note that the LAPACK functions in general require a different approach, because direct calls with size zero dims do not work.
Here I just selected a reasonable subset of LAPACK routines to support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9522

Reviewed By: ezyang

Differential Revision: D8888180

Pulled By: gchanan

fbshipit-source-id: 16b9013937806d375d83d1c406815765fda00602
2018-07-18 08:25:21 -07:00
5eaed750c2 Implementing torch.isfinite (#9487)
Summary:
fixes #9132
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9487

Reviewed By: soumith

Differential Revision: D8875529

Pulled By: SsnL

fbshipit-source-id: d1b8aa825d202cfbdca27897da6a8bc1b714f856
2018-07-18 08:25:20 -07:00
57608214d4 Make squeeze doc consistent with its behaviour (#9529)
Summary:
A 0-dimensional tensor is now returned when squeezing a tensor with a single element.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9529

Differential Revision: D8893103

Pulled By: soumith

fbshipit-source-id: 658189ecfff283b2b7281feb16a397692d6dbd8f
2018-07-18 08:25:18 -07:00
3eb3f03776 ROCm contributions week 28 (#9432)
Summary:
This PR contains the ROCm contributions of last week:
* documentation of the pyHIPIFY data format, originating from ezyang's review comments on #8812
* removal of most patch files from the `amd_build` directory and integration into the code base
* enabling of previously disabled_features that do compile now
* improvement to the static_cast feature in pyHIPIFY (it will only apply static_cast to kernel arguments, not launch arguments)
* addition of two workarounds to pyHIPIFY for ROCm/HIP shortcomings: a) `__forceinline__` does not imply `static`, hence the change to `__inline__`; b) `std::[exp,log,pow]` math functions cannot be selected in device code, so use `::[exp,log,pow]` instead. Both of these workarounds will be removed once the issues are fixed upstream. Neither of these issues has surfaced on the CI, but both were reproduced internally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9432

Differential Revision: D8887441

Pulled By: ezyang

fbshipit-source-id: 71cf5c6b13772a66d10be369a45ebf06e4e268e1
2018-07-18 07:54:58 -07:00
73225e4a1d add docs for using python setup.py clean in developing mode (#9524)
Summary:
This command (suggested by albanD when I raised a related question in the PyTorch Slack) is super useful to me. I have used it several times and it worked like a charm (without it, I had to delete the entire pytorch folder and clone everything again). So I guess it is nice to have in the CONTRIBUTING doc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9524

Differential Revision: D8890126

Pulled By: soumith

fbshipit-source-id: c1798ff1ab2423627fcd8e0662a66c4e85cb2413
2018-07-18 05:23:41 -07:00
89db578e66 Fixed a typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9523

Differential Revision: D8890124

Pulled By: soumith

fbshipit-source-id: dea8d153fc352c36b219298c52f2c97caf9999f4
2018-07-18 05:09:22 -07:00
6de038286a Add random data filler to predictor bench to support production nets (#9520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9520

Add random data filler to predictor bench to support production nets

Reviewed By: salexspb

Differential Revision: D8712757

fbshipit-source-id: 2c732b2ba71ab210f9222adf94d08442ca71dc03
2018-07-18 00:46:02 -07:00
543d4af79f Be strict prototypes clean. (#9516)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9516

Differential Revision: D8886493

Pulled By: ezyang

fbshipit-source-id: fea974fd96c7d81126a129eb5b8b06eb1b028526
2018-07-17 20:25:53 -07:00
aa73348d75 added reminder of args naming rules to readme (#9504)
Summary:
- I ran into this a couple of days ago and thought it might be useful to take note of it here
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9504

Reviewed By: soumith

Differential Revision: D8887396

Pulled By: weiyangfb

fbshipit-source-id: d2061cf379ce140d6e43ef6c18241f7ce00dbab6
2018-07-17 19:40:38 -07:00
004d924807 Give THTensor a constructor, use new/free. (#9496)
Summary:
Stacked on #9495
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9496

Differential Revision: D8875528

Pulled By: ezyang

fbshipit-source-id: 6419d2ffb07aaf49c1462e7b64737019abbb7f61
2018-07-17 19:25:37 -07:00
c33d2c0b04 Thread-safe dispatcher table (#9126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9126

Closes https://github.com/pytorch/pytorch/pull/9126

Allow concurrent read and writes in dispatcher table

Reviewed By: smessmer

Differential Revision: D8722560

fbshipit-source-id: e376bcd59f1b9f6b0e6fd3dd376a55561ea3c9c3
2018-07-17 17:41:53 -07:00
13e0c9295d Add Support for count_include_pad in AveragePool in Caffe2 ONNX Backend (#9458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9458

The goal is to support count_include_pad in Caffe2 ONNX backend. This commit contains the first step - support 4-D tensor cases.
AveragePool with count_include_pad can be expressed as PadImage + AveragePool.

Reviewed By: houseroad

Differential Revision: D8852180

fbshipit-source-id: 4db00e9771be7a000a2d92850dfd066d9c9c38bf
2018-07-17 17:41:52 -07:00
1c3580b6fe Added hash for device (#9246)
Summary:
If this is good, I could write some tests to ensure collision doesn't occur within a given range.

Closes #7228
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9246

Differential Revision: D8872608

Pulled By: ezyang

fbshipit-source-id: 0ed29a73188f4167b42756f59a5c9a3d5cb37326
2018-07-17 17:10:17 -07:00
5c695e3a60 Implement 2D and 3D alpha_dropout (#9073)
Summary:
It implements per-channel alpha_dropout. It also creates corresponding function classes and unifies the process of dropout and alpha_dropout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9073

Differential Revision: D8727008

Pulled By: ezyang

fbshipit-source-id: 9d509f9c5db4e98f7b698cdfc4443505a4d2b331
2018-07-17 17:10:16 -07:00
6116954e97 oss heatmap_max_keypoint_op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9470

Reviewed By: pjh5

Differential Revision: D8826713

fbshipit-source-id: 47674af86b3a5ae0752056faf3b93f0d96e38fc2
2018-07-17 16:55:47 -07:00
0fe980c748 Memory usage measurement -- Caffe2 (#9017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9017

Closes https://github.com/pytorch/pytorch/pull/9017

Added "get_blob_size_bytes" to "pybind_state.cc" in Caffe2 to expose the size of a blob in bytes.

Reviewed By: kuttas

Differential Revision: D8685696

fbshipit-source-id: 9a9d38f207c8c59ef534217181e8ce1514617628
2018-07-17 16:40:23 -07:00
9b0c53ac22 Deduplicate THTensor and THCTensor. (#9495)
Summary:
This is enabled by the allocator patch; previously we could not
deduplicate THStorage_free/THCStorage_free; now we can.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9495

Reviewed By: SsnL

Differential Revision: D8875497

Pulled By: ezyang

fbshipit-source-id: 387198dff446eb9f84d2d6187066fae1d595dea7
2018-07-17 15:41:15 -07:00
2249751422 Add OptimizerBase::add_parameters (#9472)
Summary:
ebetica asked for a way to add parameters to `Optimizer`s after they are created.

ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9472

Differential Revision: D8872176

Pulled By: goldsborough

fbshipit-source-id: 39a4032c519a6d3b458dd3596361b04afea10365
2018-07-17 14:10:22 -07:00
890037eaaf Fix (non-reduction) ops over a dimension for n-dimensional empty tens… (#9482)
Summary:
…ors (CPU).

This includes (mainly) CPU fixes; CUDA fixes are a little more involved because you can't use an empty grid.
This also includes a fix for index_copy, which checked that self.size(dim) == src.size(0), which isn't correct (the same dimension should be compared).
Finally, also includes a fix for CUDA flip (although it's not tested yet), to get the stride using multiplication rather than division to avoid divide-by-0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9482

Reviewed By: ezyang

Differential Revision: D8873047

Pulled By: gchanan

fbshipit-source-id: 86523afd3d50277834f654cd559dfbc7875cdffe
2018-07-17 13:11:04 -07:00
8be4657871 Add ideep copy for TensorCPU<long> in IDEEPFallbackOp (#9480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9480

Ops like Reshape sometimes take a second input tensor of long with the new
shape (can also be specified in arg). If this input tensor is passed in via
external input (which ONNX does sometimes), LoadOp fails with an exception.

Such ops anyway are executed by IDEEPFallbackOp, so this should be fine.

Reviewed By: yinghai

Differential Revision: D8872671

fbshipit-source-id: 659a02416c374e373ce041a7d65a174be828702d
2018-07-17 11:55:23 -07:00
30f849cdc5 Correct model name in caffe2 onnx backend tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9485

Reviewed By: houseroad

Differential Revision: D8873733

Pulled By: bddppq

fbshipit-source-id: 3a3cc351834cbbedce360760504ea16f5fa0ea06
2018-07-17 11:41:01 -07:00
d2d43824cd Delete flag from THTensor. (#9494)
Summary:
It was only used to toggle refcounting, but we ALWAYS
refcount tensors.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9494

Differential Revision: D8875169

Pulled By: ezyang

fbshipit-source-id: 3a8618fb288334e62942bbaf388f3c9e473e7524
2018-07-17 11:25:41 -07:00
e5678794ed Reenable multiprocessing preserve sharing tests on ASAN. (#9498)
Summary:
This issue was fixed in 976f9253a5425918eda7cf865b097cf42b5da8d7

Fixes #5311.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9498

Differential Revision: D8875605

Pulled By: ezyang

fbshipit-source-id: 449ffe975d35c959f92874437ba9be37d4d3a1f2
2018-07-17 11:10:21 -07:00
050a2588b5 change stft to have consistent signature with librosa (#9497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9497

Fixes #7883 by using `rfft`.

It's worth noting that this is BC-breaking. It's also impossible to detect the change, because the two signatures before and after this change support a common subset of calling patterns, e.g., `stft(Tensor, int, int)` (some other calling patterns will raise errors).

soumith and I plan to change the current `stft` interface because it is a bit messy and non-standard. rafaelvalle suggested that `librosa` is a good reference API to align with. After discussing with soumith and ezyang, and given that `stft` has only been out for one release, I decided to go with directly changing the signature. Also, my understanding is that most researchers in this field will welcome this change, as `librosa` seems to be the gold standard here. (It doesn't yet support all `pad_mode` values, but those will become available if added to `F.pad`.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9308

Reviewed By: ezyang

Differential Revision: D8806148

Pulled By: SsnL

fbshipit-source-id: f6e8777d0c34d4a4d7024e638dc9c63242e8bb58
2018-07-17 10:55:43 -07:00
7d2a17876f test_cuda: ensure tests use float and adjust HalfTensor tolerances (#9475)
Summary:
test_cuda.py uses the routine 'number' to prepare many test cases.
number should return a floating-point value for float-type tensor
types, or an integer otherwise. But number's test to classify the type
is incorrect, so it always returns the integer value.
(type(t).__name__ is always 'torch.tensortype', so it never matches
'Double', 'Float', or 'Half'.)

Update number to use the existing is_floating() helper to make the
check.

The change to number causes a few tests to fail for HalfTensor. Relax
the tolerance for those in line with other HalfTensor testcases. The
failing tests--for addcdiv and fill--were not previously relaxed for
HalfTensor so are held to the over-strict 1e-5 default tolerance.

Finally, update a couple other tests for HalfTensor type to use the
existing is_half() helper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9475

Reviewed By: yf225

Differential Revision: D8872112

Pulled By: ezyang

fbshipit-source-id: 016e3e15adb23f6606bd4c08218954c1396699db
2018-07-17 10:25:17 -07:00
52cc073212 Implement reshape_as (#9452)
Summary:
1. Added tests
2. Added doc string
3. Remove view_as redundant definition from tensor.py

Closes #9416
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9452

Differential Revision: D8851794

Pulled By: ezyang

fbshipit-source-id: 0aa0430dd0a174e1a5caddbc50a7e2c9eb7802bc
2018-07-17 08:54:42 -07:00
11fc16dc98 Remove HTML tags from README.md (#9296)
Summary:
This change makes README.md compatible with both Github and VSTS markdown engines. Images can be reduced if necessary
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9296

Differential Revision: D8874931

Pulled By: soumith

fbshipit-source-id: 0c530c1e00b06fc891301644c92c33007060bf27
2018-07-17 07:24:43 -07:00
4ff636a3fd Update onnx to onnx/onnx@b2817a6 (#9476)
Summary:
b2817a682f
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9476

Reviewed By: houseroad

Differential Revision: D8868253

Pulled By: bddppq

fbshipit-source-id: b1f14bab47f020f0bc0239da7e2bbf959a407d6a
2018-07-16 22:17:09 -07:00
ae44a6b5e3 Fix Sequential::clone() (#9372)
Summary:
I noticed that `Sequential::clone()` does not work. This is because `Sequential` does not use `reset()`, which is normally where modules initialize and register their submodules. Further, this is because of the way `Sequential` allows its modules to be passed in the constructor, which doesn't work with `reset()` (since it does "late" initialization).

I've added some better error messages inside `Cloneable::clone()` which makes this kind of mistake clearer for other users, and tests for `Sequential::clone()`.

I also had to give `AnyModule` a deep `clone()` method.

ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9372

Differential Revision: D8865189

Pulled By: goldsborough

fbshipit-source-id: b81586e0d3157cd3c4265b19ac8dd87c5d8dcf94
2018-07-16 21:53:42 -07:00
e8b8c3895e Enable Conv fusion optimizations in optimizeForIdeep (#9255)
Summary:
Enable fusion for IDEEP in optimizeForIdeep
including Conv+ReLU, Conv+Sum, Conv+Sum+ReLU, Conv+BN
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9255

Reviewed By: bddppq

Differential Revision: D8809030

Pulled By: yinghai

fbshipit-source-id: af30bad3b96cb965bd26a4dfa810370faec4bb88
2018-07-16 21:28:50 -07:00
9235ff53f1 Clip horizontal bounding boxes during rotated detection for backward compatibility (#9403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9403

In BBoxTransform and GenerateProposal ops, clip_boxes makes sure the bbox fits
within the images. For rotated boxes, this doesn't always make sense as there
could be multiple ways to clip a rotated box within an image boundary.
Moreover, clipping to a horizontal box means we leave out pixels of interest
potentially. Therefore, we clip only boxes with angle almost equal to 0 (with a
specified `angle_thresh` tolerance).

Reviewed By: pjh5

Differential Revision: D8828588

fbshipit-source-id: 39c1eafdb5d39d383780faa0a47e76149145e50c
2018-07-16 20:24:49 -07:00
ad74006ffa Pass THDRequest as void* pointer to THDRequest_free (#9398)
Summary:
This fixes https://github.com/pytorch/pytorch/issues/9054.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9398

Reviewed By: ezyang

Differential Revision: D8827778

Pulled By: yf225

fbshipit-source-id: 862287802cb69c6ac71ff4df19cadb89b1face1d
2018-07-16 19:25:22 -07:00
c4bff25282 Additional operator information values (#9153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9153

Closes https://github.com/pytorch/pytorch/pull/9153

Modified the values reported by the benchmarking platform to include tensor_shape and op_args. These values have a different naming scheme to values like flops and latency.

Reviewed By: sf-wind

Differential Revision: D8729791

fbshipit-source-id: f050200be01c6d0794bf5faaa6e8cef12a00affe
2018-07-16 17:40:44 -07:00
7df48d0444 Merge .cu and _gpu.cc files
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9473

Reviewed By: houseroad

Differential Revision: D8865754

Pulled By: bddppq

fbshipit-source-id: 406eda6c145f03a0ee35c4643ec1ec0092fbce88
2018-07-16 17:10:18 -07:00
45140368c3 Update onnx-tensorrt module to the latest (#9469)
Summary:
Update onnx-tensorrt to follow up on recent changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9469

Reviewed By: Maratyszcza

Differential Revision: D8866704

Pulled By: yinghai

fbshipit-source-id: 3b96ec2fa28470f0d4b5a7c62ab332eeba4bdb12
2018-07-16 17:10:16 -07:00
5ff686651f move batchop import to init to avoid debugging confusions (#9425)
Summary:
fixes #9409
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9425

Reviewed By: ezyang

Differential Revision: D8842844

Pulled By: wanchaol

fbshipit-source-id: 3c6b26470d59d8d1fc5f79caa70252b9de7290e4
2018-07-16 15:40:28 -07:00
80160f6186 Skip PyTorch ROCm tests in the script. (#9467)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9467

Reviewed By: houseroad

Differential Revision: D8860794

Pulled By: ezyang

fbshipit-source-id: 9b11475d9bb4b3361973865d7f68e562bffbf9d8
2018-07-16 15:40:26 -07:00
976f9253a5 Eliminate storage views. (#9466)
Summary:
Storage views were previously used to implement CUDA IPC sharing,
but they weren't necessary.  The new strategy is described in
Note [CUDA IPC and the caching allocator].

This also fixes an unrelated bug, where we weren't actually using
the Tensor forking pickler, because we didn't register a pickler
for torch.Tensor.

Fixes #9447.  Fixes #46.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CC apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9466

Reviewed By: apaszke

Differential Revision: D8859698

Pulled By: ezyang

fbshipit-source-id: 3362cb92f6ae4aa37084c57d79b31004bd0b4a97
2018-07-16 15:40:24 -07:00
9ed2190bdb Add a tagged union type that replaces tensor in the interpreter. (#9368)
Summary:
IValue is short for interpreter value. It is used frequently so a short name is important.
This will allow us to implement more non-tensor types in an efficient way and remove
many hacks from the compiler.
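
A minimal sketch of what such a tagged union looks like, assuming just two primitive payloads; the real IValue holds more tags (including reference types like Tensor) and differs in detail:

```cpp
#include <cstdint>
#include <stdexcept>

struct IValueSketch {
  enum class Tag { None, Double, Int };

  Tag tag = Tag::None;
  union {
    double as_double;
    int64_t as_int;
  };

  IValueSketch() : as_int(0) {}
  /*implicit*/ IValueSketch(double v) : tag(Tag::Double), as_double(v) {}
  /*implicit*/ IValueSketch(int64_t v) : tag(Tag::Int), as_int(v) {}

  // Checked accessors: primitives live inline, so no heap allocation.
  double toDouble() const {
    if (tag != Tag::Double) throw std::runtime_error("expected Double");
    return as_double;
  }
  int64_t toInt() const {
    if (tag != Tag::Int) throw std::runtime_error("expected Int");
    return as_int;
  }
};
```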

This PR is limited. It only introduces IValue and changes interpreter to use it.
Follow up PRs will:
* Change the way aten_ops consume non-tensor types so that integer lists
  are no longer represented as Tensors.
* Introduce TensorList as a fundamental type and remove all vararg handling in gen_jit_dispatch
* Change the compiler to implement math on primitive numbers rather than converting to tensors.

jamesr66a  apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9368

Reviewed By: ezyang

Differential Revision: D8817598

Pulled By: zdevito

fbshipit-source-id: 29dce80611ce5f6384234de9d12a67861d2b112f
2018-07-16 15:40:22 -07:00
9ae77cc1f5 Implement tensor weak references (#9363)
Summary:
Add `WeakTensor` - a `Tensor` counterpart which doesn't keep the data (or any other expensive resources) alive. They can be `.lock()`ed and return `at::optional<Tensor>` if they're still alive.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9363

Reviewed By: ezyang

Differential Revision: D8815434

Pulled By: apaszke

fbshipit-source-id: 1b3e96503c1285d78ef124c585e65c7630f3253e
2018-07-16 13:10:29 -07:00
9413fabb0b Nuke TestCollectEnv (#9459)
Summary:
The tests were too flaky, and the procedure for legitimately
updating versions of software too onerous, to warrant continually
testing these.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9459

Reviewed By: zou3519

Differential Revision: D8852357

Pulled By: ezyang

fbshipit-source-id: 24e99cd00b4252cdeec2a1d9af92456b4a54912a
2018-07-16 13:10:28 -07:00
b0c5c86492 Add test case for segmentation fault fix in grad_fn (#9457)
Reviewed By: apaszke

Differential Revision: D8863572

Pulled By: ezyang

fbshipit-source-id: 13749f51320a4e403644674b0335aed4987fa887
2018-07-16 13:10:26 -07:00
66fe3b5c06 Add peephole optimization for type_as operators. (#9316)
Summary:
If the type_as operator takes in two values with the same type, remove that operator.
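For illustration, a minimal eager-mode sketch of the identity this pass exploits (the removal itself happens on the JIT graph, not in eager mode):
```python
import torch

x = torch.randn(3)
y = torch.randn(3)
# x and y already share dtype and device, so type_as is a no-op here;
# the peephole pass deletes the corresponding node from the graph.
z = x.type_as(y)
assert torch.equal(z, x)
```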
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9316

Reviewed By: zdevito

Differential Revision: D8808355

fbshipit-source-id: 2d5710a6380b22f4568fc38a439061b5340c4eb1
2018-07-16 10:26:56 -07:00
52abcdd0dc Fix out-of-range error for test_neg (#9431)
Summary:
`test_neg` sometimes fails internally because `random_()` can generate an out-of-range value for CharTensor. This PR fixes it.
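For illustration, a sketch of staying in range for an int8 tensor (`random_`'s upper bound is exclusive; the concrete fix in the test may differ):
```python
import torch

t = torch.empty(10, dtype=torch.int8)
# int8 holds [-128, 127]; bounding random_ keeps generated values
# representable instead of overflowing the CharTensor range.
t.random_(-128, 128)
```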
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9431

Reviewed By: SsnL

Differential Revision: D8843284

Pulled By: yf225

fbshipit-source-id: bf516cceb8f780e133fa54f7364c77821eb7c013
2018-07-16 10:26:54 -07:00
e7f49d1444 add depthwise conv support for mkldnn (#8782)
Summary:
Change-Id: I3836dacc63afc1b5e31b1d706bba6bb13699ba41

Beneficial for depthwise convolution on CPU, such as MobileNet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8782

Reviewed By: SsnL

Differential Revision: D8790869

Pulled By: ezyang

fbshipit-source-id: 29f410763ce403c2438fc527aa354ff02e1829bf
2018-07-15 17:40:55 -07:00
8766daeec9 Refactor _log_sum_exp (#9173)
Summary:
This PR removes `distributions.utils._log_sum_exp` in favor of `torch.logsumexp`. Also fixes some warnings with the `reduce` argument in `binary_cross_entropy_with_logits`.
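For illustration, a minimal sketch of the replacement (`torch.logsumexp` factors out the per-row maximum before exponentiating, so it stays finite where the naive formula can overflow):
```python
import torch

x = torch.randn(4, 5)
stable = torch.logsumexp(x, dim=1)
naive = x.exp().sum(dim=1).log()  # overflows for large entries
assert torch.allclose(stable, naive)
```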
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9173

Reviewed By: SsnL

Differential Revision: D8764174

Pulled By: ezyang

fbshipit-source-id: b9c4136dbf0182e8ae77082e6448d23a430d5cb6
2018-07-15 17:40:53 -07:00
97008a64a1 Add ModuleDict and ParameterDict containers (#8463)
Summary:
Addresses:

https://github.com/pytorch/pytorch/issues/4048 and https://github.com/pytorch/pytorch/pull/5297#issuecomment-394924139
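For illustration, a minimal sketch of the two new containers (the layer and parameter names here are made up):
```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Values are registered as submodules/parameters, so they show up
        # in net.parameters() and move with net.to(device).
        self.layers = nn.ModuleDict({
            'conv': nn.Conv2d(3, 8, 3),
            'pool': nn.MaxPool2d(2),
        })
        self.gains = nn.ParameterDict({
            'scale': nn.Parameter(torch.ones(8)),
        })

    def forward(self, x, key):
        return self.layers[key](x)
```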
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8463

Reviewed By: SsnL

Differential Revision: D8689291

Pulled By: ezyang

fbshipit-source-id: 47e67d9bae1b64ec10771a2c00c56229463b1598
2018-07-15 17:40:52 -07:00
cffca2926b Introduce SupervisedPtr, delete THAllocator and THCDeviceAllocator (#9358)
Summary:
See Note [Supervisor deleter] for how SupervisedPtr works.
This design is not the obvious one, but there were a lot of
constraints feeding into it:

- It must support the reallocation usage-pattern, where, given
  an existing Storage, we allocate a new region of memory,
  copy the existing data to it, and then deallocate the old
  region of memory.

- Creation of a deleter for memory MUST avoid dynamic allocations
  in the common case.  We've done some benchmarking in Caffe2
  where dynamic allocation for deleters is ruinously expensive,
  and it's really hard to avoid these performance tarpits in
  very general function wrappers like std::function or
  folly::Function (while benchmarking this, we discovered that
  folly::Function's move constructor was way more expensive
  than it should be).

- We need to be able to deallocate data that comes from external
  sources, e.g., dlpack and numpy tensors.  Most notably,
  you often cannot deallocate these with merely the void*
  data pointer; you need some extra, out-of-band information
  (e.g., the managing struct) to deallocate it.  Sometimes,
  you may even want to resize data living in an external source!

- The "core" allocators need to support being wrapped in a Thrust
  allocator, so you need to be implement the following two functions:

    char* allocate(size_t);
    void deallocate(char*, size_t);

- We need to support tensors which contain non-POD, non-trivially
  copyable data; specifically tensors of std::string.  This is
  an upcoming requirement from Caffe2.  It's dirty AF, but
  it's really useful.

- It should use C++ standard library types like std::unique_ptr
  (which is hugely problematic because std::unique_ptr doesn't
  call the deleter when the pointer is null.)

Here is the billing of changes:

- Built-in support for realloc() has been DROPPED ENTIRELY.
  Instead, you're expected to allocate and then copy from
  the old memory to the new memory if you want to do a
  reallocation.  This is what you'd generally have expected
  to occur; and axing realloc() from the design lets us avoid
  some tricky correctness issues with std::realloc(), namely
  the fact that we must refuse the realloc if the type of the
  elements are not trivially copyable.  If it really matters,
  we can add this back, but there really needs to be a good
  explanation WHY you need fast resizing reallocations (by and
  large, people don't resize their storages, and it should
  be acceptable to have a performance degradation when they
  do).

- TH_STORAGE_FREEMEM is no more; instead, if you want a
  storage which doesn't free its result, you just give it
  an empty deleter.

- What we used to call an "allocator" (really, a combined
  object for allocating/deleting) has been split into two
  concepts, an allocator, and a smart pointer (SupervisedPtr)
  which knows how to delete data.

    - Unlike previously, where THAllocator/THCDeviceAllocator
      could have a per-tensor context storing extra information
      (e.g., a pointer to the metadata you need to actually
      free the tensor), there is no context in the allocator or
      the deleter of the smart pointer; instead, the smart
      pointer directly holds an owning reference to the
      metadata necessary to free the data.  This metadata
      is *freshly manufactured* upon every allocation, which
      permits us to resize tensors even in the absence of
      built-in support for realloc().

    - By default, allocators don't support "raw" allocations
      and deallocations with raw pointers.  This is because
      some allocations may return a different context every
      time, in which case you need to reconstruct the context
      at delete time (because all you got was a void*, not
      a unique_ptr that carries the deleter).

- The diff between at::Allocator and THCDeviceAllocator is a
  bit larger:

    - It used to return a cudaError_t.  Now, allocators
      are expected to check the error status immediately and throw
      an exception if there was an error.  It turns out that this
      is what was immediately done after all occurrences of
      allocate/release, so it wasn't a big deal (although some
      subsidiary interfaces had to themselves be converted to
      not return cudaError_t).

      There is one notable exception to this, and it is how
      we handle CUDA OOM: if this occurs, we attempt to return
      unused memory to the system and try again.  This is now
      handled by a catch-all try-catch block.  The cost of
      catching the exception is probably the least of your worries
      if you're about to OOM.

    - It used to take the CUDA stream to perform the allocation
      on as an argument.  However, it turned out that all call
      sites, this stream was the stream for the current device.
      So we can push this into the allocator (and the choice,
      in the future, could be made explicitly by twiddling
      thread local state.)

    - It held two extra methods, emptyCache and cacheInfo, specifically
      for interacting with some state in THCCachingAllocator.
      But this "generality" was a lie, since THCCachingAllocator
      was the only allocator that actually implemented these
      methods, and there is actually a bunch of code in THC
      which assumes that it is the caching allocator that is
      the underlying allocator for CUDA allocations.  So I
      folded these two methods into this interface as
      THCCachingAllocator_emptyCache and THCCachingAllocator_cacheInfo.

    - It held its context directly inside the THCDeviceAllocator
      struct.  This context has been moved out into whatever
      is holding the at::Allocator*.

- The APIs for getting at allocators/deleters is now a little different.

    - Previously there were a bunch of static variables you could get
      the address of (e.g., &THDefaultAllocator); now there is a
      function getTHDefaultAllocator().

    - Some "allocators" didn't actually know how to allocate (e.g.,
      the IPC "allocator").  These have been deleted; instead, you
      can wrap the produced pointers into SupervisedPtr using
      an appropriate makeSupervisedPtr() static method.

- Storage sharing was a lot of work to wrangle, but I think I've
  tamed the beast.

    - THMapAllocator and its "subclasses" have been refactored to
      be proper, honest to goodness C++ classes.  I used the enum
      argument trick to get "named" constructors.  We use inheritance
      to add refcounting and management (in libshm).  What we previously
      called the "Context" class (Context has been dropped from the name)
      is now the supervisor for the data.

    - Sometimes, we need to pull out the file descriptor from a
      tensor.  Previously, it was pulled out of the allocator context.
      Now, we pull it out of the supervisor of the SupervisedPtr,
      using the static method fromSupervisedPtr(), which uses the
      deleter as the typeid, and refines the type if it matches.

- I renamed the std::function deleter into
  InefficientStdFunctionSupervisor, to emphasize the fact that it does
  a dynamic allocation to save the std::function deleter.

TODO:

- Windows libshm is in shambles and needs to be fixed.

Perhaps for the future:

- newFromFd is now unconditionally calling cudaPointerGetAttributes
  even though this is unnecessary, because we know what the device
  is from higher up in the callstack.  We can fix this by making
  newWithDataAndAllocator also take an explicit device argument.

- Consider statically distinguishing between allocators that
  support raw_allocate/raw_deallocate, and those which don't.
  The Thrust constraint applies only to the CUDA device allocator;
  you never need to allocate CPU memory this way

- Really want to get rid of storage views. Ugh.

Nontrivial bugs I noticed when preparing this patch:

- I forgot to placement-new unique pointers and attempted to
  assign them directly on uninitialized memory; very bad!  Sam
  Gross has encouraged me to replace this with a proper constructor
  but I keep putting it off, because once everything goes in
  StorageImpl there really will be a proper constructor.

- I rewrote a number of APIs to use newWithDataAndAllocator
  instead of newWithAllocator, calling the allocator at the
  call site (because they required "allocation context" which
  we no longer give to "allocators").  When I did this, I forgot
  to insert the multiplication with sizeof(real) to scale from
  numels to number of bytes.

- The implementation of swap on storages was missing it for
  scalarType and backend.  It was benign (because the only case
  we call swap is when these are the same), but I fixed it anyway.

- I accidentally returned a nullptr unique_ptr with no deleter,
  even though there was a legitimate one.  This matters, because
  some code still shoves its hands in the deleter context to
  get extra metadata about the function.

- I used std::move() on a unique_ptr, and then did a boolean
  test on the pointer afterwards (always false!)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9358

Reviewed By: SsnL

Differential Revision: D8811822

Pulled By: ezyang

fbshipit-source-id: 4befe2d12c3e7fd62bad819ff52b054a9bf47c75
2018-07-15 15:11:18 -07:00
5eb9d40cc6 Introducing IsInf (#9169)
Summary:
torch.isinf - checks element-wise for +/- inf; implements #9132
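For illustration, a minimal sketch of the new op:
```python
import torch

t = torch.tensor([1.0, float('inf'), -2.0, float('-inf')])
mask = torch.isinf(t)  # element-wise mask marking both +inf and -inf
```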
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9169

Reviewed By: SsnL

Differential Revision: D8768614

Pulled By: zou3519

fbshipit-source-id: dd1b5f6c976deb421d626e22cdd25500ec04d796
2018-07-15 07:55:09 -07:00
fda03406cf add device to CUDAEvent (#9415)
Summary:
This PR adds a device_ member to CUDAEvent. This is necessary because if we create a cudaEvent on one device but destroy it from another, an additional context is created on the destroying device. The stored device information lets us guard the cudaEventDestroy call. (cc: ngimel is this expected behavior? I can provide a simple cu script to repro this).

c10d tests are probably not in CI yet, please let me know how the test are run and I could double check.

Thanks pietern apaszke for help debugging!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9415

Reviewed By: apaszke

Differential Revision: D8839688

Pulled By: ailzhang

fbshipit-source-id: b950ba37d57b9e3c5fe71726ec92f6a9601c4d0e
2018-07-14 13:38:41 -07:00
a4f63576b6 Make localScalar error message more intuitive (#9443)
Summary:
Fixes: #9419

This assumes that anyone who knows localScalar can also grep for the
error message or get a traceback.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9443

Reviewed By: soumith

Differential Revision: D8850718

Pulled By: ezyang

fbshipit-source-id: a106fee718fef97064e861810a49ca05f536f27e
2018-07-14 12:24:56 -07:00
8444e1660b Only accept contiguous tensors in TopK for CUDA (#9441)
Summary:
Fixes: #9421

I don't think it is easy to deal with non-contiguous arrays in CUDA topk, so I'm adding a check.
The argument number is a bit confusing when it shows up in PyTorch, but it is consistent with the other checks. (Not sure whether it would make sense to eliminate argument numbers from the TH/THC error messages, given that they're probably off more than once...)

Do we need a test that it indeed refuses non-contiguous?
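For illustration, the workaround callers can use (a sketch, assuming a CUDA device is available):
```python
import torch

a = torch.randn(64, 128, device='cuda')
b = a.t()  # a non-contiguous view
# CUDA topk now rejects non-contiguous input, so materialize it first:
values, indices = b.contiguous().topk(5, dim=1)
```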
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9441

Reviewed By: soumith

Differential Revision: D8850719

Pulled By: ezyang

fbshipit-source-id: d50561bb37ed50ab97aeaf54d8e3fc6c765bdc7c
2018-07-14 12:24:52 -07:00
88146484b4 Add support for .norm() pytorch onnx export and ReduceL1/ReduceL2 caffe2 operators (#9299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9299

Onnx has ReduceL1 and ReduceL2 operators that would facilitate this, so allow pytorch to export those and allow caffe2 to run them.

I only implemented this on CPU so far.

Reviewed By: pjh5

Differential Revision: D8757381

fbshipit-source-id: 68afc9e2f90042a70929b73ace05a499b5c670c7
2018-07-14 10:54:13 -07:00
7160846c81 Only view() rhs of index_put if we need to (#9424)
Summary:
During tracing (and export) we are now introducing an unnecessary hard-coded view on the RHS of indexed assignments such as `tensor[idxs] = rhs`. This caused a regression in the PyTorch translate models because these expressions appear with variable sizes in the RHS. This change makes it so we only call view() if we indeed need to strip leading 1-dimensions.
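For illustration, the shape of the expression in question (the sizes here are made up):
```python
import torch

tensor = torch.zeros(5, 3)
idxs = torch.tensor([0, 2])
rhs = torch.randn(2, 3)
# Indexed assignment; during tracing, a view() is now inserted on the
# RHS only when leading 1-dimensions actually have to be stripped.
tensor[idxs] = rhs
```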
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9424

Reviewed By: colesbury

Differential Revision: D8838881

Pulled By: jamesr66a

fbshipit-source-id: 399e5daa7d021f4f59f6f92b9fae581f92bfc538
2018-07-14 00:10:21 -07:00
5ac8a80f8b Add BatchBucketizeOp in caffe2 (#9385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9385

The operator transforms dense features to sparse features by bucketizing. Only the features listed in the indices tensor are transformed and output.
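For illustration, a conceptual NumPy sketch of bucketization (the names are ours, not the operator's schema; the real operator additionally selects only the columns named by the indices tensor):
```python
import numpy as np

dense = np.array([[0.2, 1.7],
                  [3.4, 0.9]])
boundaries = np.array([0.5, 1.0, 2.0])
# Each dense value maps to the index of the bucket it falls into.
buckets = np.searchsorted(boundaries, dense)  # [[0, 2], [3, 1]]
```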

Reviewed By: bddppq

Differential Revision: D8820351

fbshipit-source-id: a66cae546b870c6b2982ac20641f198334f2e853
2018-07-13 20:39:30 -07:00
099a6d5e08 Implementation of Wngrad optimizer caffe2 python wrapper and unit test on least square regression (#9001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9001

Closes https://github.com/pytorch/pytorch/pull/9001

We added a Caffe2 Python wrapper and unit test for the Wngrad C++ operator.

Reviewed By: chocjy

Differential Revision: D8655724

fbshipit-source-id: fb259afd6fd50231691bd75c52852b20a1e1aec8
2018-07-13 18:54:52 -07:00
9e2f2cab94 Implementation and operator test for Wngrad optimizer (#8999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8999

Closes https://github.com/pytorch/pytorch/pull/8999

Implemented the Wngrad optimizer operator for the dense case (the base case as well as the case with additional outputs for effective learning rate and update value) and the sparse case.

Reviewed By: pjh5

Differential Revision: D8627933

fbshipit-source-id: a63cde46c04bcc6b428ab5f77a4b3b2beb66c046
2018-07-13 18:11:41 -07:00
86eeeab758 Fix segmentation fault in grad_fn (#9292)
Summary: Fixes #8774 .

Reviewed By: soumith

Differential Revision: D8836478

Pulled By: apaszke

fbshipit-source-id: f113bf47fe493be9f095a5a5490caf08dbb44e38
2018-07-13 14:46:13 -07:00
bcd20f96e0 update docs (#9423)
Summary:
minor modification: fixed the incorrect comment format for ```split_size_or_sections``` (https://pytorch.org/docs/master/torch.html#torch.split)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9423

Differential Revision: D8841367

Pulled By: soumith

fbshipit-source-id: 2d09a38ce8d278ac29b3864e8d09a91cd296196c
2018-07-13 13:55:35 -07:00
fd25a2a86c Remove virtual+override anti-pattern (#9335)
Summary:
I'm cramming through clang-tidy warnings. This PR addresses the `hicpp-use-override` check, which warns that `virtual` + `override` is redundant, since `override` already signifies that a function is overriding and thus virtual.

Where there was `virtual` + `override` I removed the `virtual`, where there was `virtual` and no `override` I removed `virtual` and added `override`.

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9335

Differential Revision: D8807082

Pulled By: goldsborough

fbshipit-source-id: e0a261053f6540a22cc56ec160a24aa285af6319
2018-07-13 11:25:01 -07:00
c6376cf999 A reasonable way to detect Python include dirs and library
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9361

Reviewed By: ml7

Differential Revision: D8837706

Pulled By: pjh5

fbshipit-source-id: 6979f9f37709c23e72b9169531787a60f3b37254
2018-07-13 11:25:00 -07:00
cc9dcdff16 Improving THCReduce.cuh's performance on latency-bound non-contiguous reductions (#9214)
Summary:
This PR improves performance of (formerly) latency-bound non-contig-dim reduction kernels by up to 20X, while maintaining determinism.

Currently, reducing across a non-contiguous dimension uses the parallelism exposed across the number of output elements.  This means that performance suffers if the number of output elements is small.  Example:
```
a = torch.cuda.FloatTensor(32768, 32)
a.sum(dim=0)
```
Before this PR, `a.sum`'s kernel (kernelReduceNoncontigDim_shared) took 138 microseconds on my machine.  The speed-of-light estimate (based on a bandwidth of 700 GB/s) should be around 6 microseconds.  After this PR's changes, `a.sum(dim=0)`'s kernel takes 6.9 microseconds on my machine.

Christian implemented some nice logic to squeeze out better performance for cases like `a.sum` using intra-block and instruction-level parallelism across the dimension being reduced, but his kernel still only launched one block for every 32 output elements.  This was insufficient to saturate the device in many cases, like `a.sum` here (where only one block is launched).

My PR adds block cooperation across the dimension being reduced.  Many blocks, instead of one block, help to reduce into each 32 output elements.  Internally, each block leverages all of Christian's nice logic to compute a partial reduction into a per-block staging buffer, then the last block to finish combines the results to compute the final output.

Block cooperation does require THCudaMalloc-ing staging and semaphore buffers, so it's not always worthwhile.  I included a set of rough heuristics to decide when the kernel should choose to use block cooperation.  These heuristics are based on Python-side timings of calling sum() many times in a loop, and comparing to the old implementation.

I tested a wide range of sizes (to determine heuristics) and as long as the number of output elements is greater than 16ish, I don't think there are any remaining pathological sizes where users will encounter unexpectedly poor performance.
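For illustration, a conceptual NumPy model of the block-cooperative scheme (not the kernel itself): each block reduces its slab into a staging buffer, and a final step combines the partials in a fixed order, which is what preserves determinism.
```python
import numpy as np

a = np.random.rand(32768, 32).astype(np.float32)
num_blocks = 8
# Per-block partial reductions into a staging buffer...
partials = np.stack([s.sum(axis=0) for s in np.array_split(a, num_blocks)])
# ...then the "last block" combines the partials.
out = partials.sum(axis=0)
assert np.allclose(out, a.sum(axis=0), rtol=1e-4)
```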
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9214

Reviewed By: gchanan

Differential Revision: D8808127

Pulled By: colesbury

fbshipit-source-id: 139f310fc6ea6d187a7c983128f8eb8e1c9b4be3
2018-07-13 11:10:51 -07:00
06e47d88b5 Remove ScalarConvert and cast_wrapper in favor of static_cast (#9401)
Summary:
While talking to mruberry, I noticed a few places that use
special cast wrappers that are no longer necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9401

Differential Revision: D8828874

Pulled By: colesbury

fbshipit-source-id: 2b7fe7ac3af3b71be26b43a9ad3949f8065a7bc9
2018-07-13 10:25:05 -07:00
57a05983be Move non-dimension reduction var/std to native wrappers. (#9400)
Summary:
This is to unify the handling of empty tensors in std/var between the dimension reduce and all reduce cases.
Also to avoid triggering ubsan errors around divide by 0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9400

Reviewed By: ezyang

Differential Revision: D8828879

Pulled By: gchanan

fbshipit-source-id: 6b9306805c94251eec28bd12e234618338bff4e3
2018-07-13 08:25:41 -07:00
f09828ee0e Support n-dimensional empty tensors in TensorShape methods. (#9362)
Summary:
This includes either bug fixes or NumPy semantics changes for the following methods:
chunk, diagonal, unfold, repeat, flatten, reshape, split, unsqueeze.

The n-dimensional empty tensor feature is still hidden behind a feature flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9362

Reviewed By: ezyang

Differential Revision: D8817002

Pulled By: gchanan

fbshipit-source-id: 6ff704ec96375f00b4dd39ebcd976efac0607fb4
2018-07-13 08:25:40 -07:00
3799b10c44 various documentation formatting (#9359)
Summary:
This is a grab-bag of documentation formatting fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9359

Differential Revision: D8831400

Pulled By: soumith

fbshipit-source-id: 8dac02303168b2ea365e23938ee528d8e8c9f9b7
2018-07-13 02:48:25 -07:00
bb9ff58c6d Add cudnn activation ops (#9379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9379

Add cudnn activation ops

Reviewed By: houseroad

Differential Revision: D8818013

fbshipit-source-id: d3881c634a46578b9331da07f9fdf7e1f31d7e8a
2018-07-12 23:18:56 -07:00
b15a7d05ce Inference benchmark: NUMA-awareness + multi-model support
Summary:
A purely experimental addition to guide us on delivering this
into real production systems and their threadpools. The biggest limitation
right now is that we need to turn off the BlackBoxPredictor activation
deallocation logic to get sane performance.

Reviewed By: highker

Differential Revision: D8798029

fbshipit-source-id: ec7962689d605fba62b2c9e0904309df567a25a4
2018-07-12 20:09:19 -07:00
cd3e067e46 Add reversed(torch.Tensor) (#9216)
Summary:
Closes https://github.com/pytorch/pytorch/issues/3376
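For illustration, a minimal sketch of the new behavior:
```python
import torch

t = torch.arange(6).reshape(2, 3)
r = reversed(t)  # flips along the first dimension, like t.flip(0)
```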
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9216

Differential Revision: D8753933

Pulled By: soumith

fbshipit-source-id: 5dac9b8b11ff34a205b6478db99b02fda8bd9cce
2018-07-12 19:42:07 -07:00
04fce5eca6 Remove dummy c10 folder (#9367)
Summary:
This was previously meant to be used for c10 code but that plan since changed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9367

Reviewed By: orionr

Differential Revision: D8814361

Pulled By: smessmer

fbshipit-source-id: 8e35fa74e160343a2bb8432013847677aa73695a
2018-07-12 19:14:55 -07:00
117a5c3cc0 fix the annotation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9380

Differential Revision: D8821294

Pulled By: zou3519

fbshipit-source-id: b375cd0de9042bcaef1d22de104966fb704bd43e
2018-07-12 18:53:59 -07:00
4a796e4430 Initialization functions (#9295)
Summary:
To allow our C++ customers to use our initialization methods as well, this PR moves some of the code from `torch.nn.init` to ATen, calls it from Python, and adds equivalent code to the C++ frontend.

Notes:
1. Happy to hear thoughts on whether it's ok to have e.g. `torch.nn.init.dirac_` *and* `torch.dirac_` (the former has a `no_grad` guard). We have this for `ones_` and stuff too, so I don't mind it.
2. I left the exception checking in Python because those checks throw `ValueError`s while ATen errors surface as `RuntimeError`s. I imagine changing this would break users' error handling if someone had a `try`-`except` handler for `ValueError` (or maybe that's far-fetched)

EDIT: After discussions with zdevito, the PR now simply duplicates the code in C++ exclusively for the C++ API, and we leave the Python code as-is (to make it easier for people to read/modify).
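For illustration, the Python-side API that the C++ frontend now mirrors:
```python
import torch
import torch.nn as nn

w = torch.empty(3, 5)
nn.init.xavier_uniform_(w)  # in-place, runs under no_grad, returns w
nn.init.dirac_(torch.empty(8, 8, 3))  # requires a 3-, 4-, or 5-dim tensor
```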

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9295

Differential Revision: D8813793

Pulled By: goldsborough

fbshipit-source-id: 4b969f3f75952c1be4e837e19e23b8098e5fbd4b
2018-07-12 18:53:57 -07:00
e90860780b Migrate PriorCorrectionCalibration to Dper3
Summary:
Migrated PriorCorrectionCalibration from Dper2 layer to Dper3 module.

A few notes:
1. Calibration operators need dynamic linking;
2. All calibration implementation and tests are located in /modules/calibration/
3. Added a type inference function in operator_schema.h/operator_schema.cc

Reviewed By: idning

Differential Revision: D8756832

fbshipit-source-id: 7e6300a3bb3d3feaaf3b82340ece2f35d71493fc
2018-07-12 18:40:07 -07:00
2ead3b0e54 Update include paths to use c10d prefix everywhere
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9397

Reviewed By: goldsborough

Differential Revision: D8825909

Pulled By: pietern

fbshipit-source-id: 25af272819e04eacbb6bd69e3f1c03c78f091d13
2018-07-12 17:55:22 -07:00
34554d6adb Enable standalone build of ATen (#9377)
Summary:
This PR changes the ATen `CMakeLists.txt` slightly, to enable a standalone build of ATen inside PyTorch. Currently, the tests in ATen get linked to `libcaffe.so libcaffe2.so`. As a result, ATen can't be built standalone without building from the root pytorch directory. I know that there is a big merge happening between caffe2 and pytorch and hence, the purpose of this PR is to really start a conversation on what would be the proper way of migrating the CMakeLists to enable clean builds. We should also follow up on this PR: https://github.com/pytorch/pytorch/pull/7275. For your reference, that PR has the explanation for why `-Wl --no-as-need` is needed. Moreover, without `set(ATen_CUDA_SRCS ${all_cuda_cpp})`, the standalone build will throw unresolved references.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9377

Reviewed By: smessmer

Differential Revision: D8825921

Pulled By: orionr

fbshipit-source-id: c521159b4885639fc7990a9819202051455d07db
2018-07-12 14:25:00 -07:00
43103af7a7 Use at::DeviceGuard everywhere (#9396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9396

The custom and local CUDADevice RAII wrapper has been superseded by at::DeviceGuard so it doesn't make sense to keep it around.

Reviewed By: ailzhang

Differential Revision: D8824200

fbshipit-source-id: 39fa00ffab4f495606c8001446e976bbf603e866
2018-07-12 13:43:47 -07:00
99dbcd0451 set CMAKE_HIP_ARCHIVE_APPEND (#9394)
Summary:
petrex

To make `-DBUILD_SHARED_LIBS=OFF` work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9394

Reviewed By: mingzhe09088

Differential Revision: D8822947

Pulled By: bddppq

fbshipit-source-id: 4fb213c723138804fb0fdb3b381e32623cf14468
2018-07-12 12:24:49 -07:00
feaee21968 Plotting embeddings norm being slow in distributed training. (#9325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9325

as title. Fixing by calculating norm on same device.

Reviewed By: chocjy

Differential Revision: D8668136

fbshipit-source-id: 6671a1858da4b0a6f766f067b7fa648a072cd219
2018-07-12 11:51:23 -07:00
374fee4804 Minor cleanup to scripts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9354

Reviewed By: orionr

Differential Revision: D8810415

Pulled By: pjh5

fbshipit-source-id: 792b0dc6f6a4fabde38e2ad4475963526204914c
2018-07-12 10:54:44 -07:00
d017e1798f add erfc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9366

Differential Revision: D8816768

Pulled By: soumith

fbshipit-source-id: 7d709f932cf156a2e7ec71c710837beb7f647d66
2018-07-12 08:32:02 -07:00
b154761547 Guard nullptrs around memcpy.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9370

Reviewed By: ezyang

Differential Revision: D8816996

Pulled By: gchanan

fbshipit-source-id: 8cad41a5259774d86e94807eb4a7f43f66fdf47f
2018-07-12 08:32:00 -07:00
483ae8cb5d Replaces const ref with && for apply (#9175)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/5011
Tested with python test/test_autograd.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9175

Reviewed By: zdevito

Differential Revision: D8736377

Pulled By: marymcbreen

fbshipit-source-id: ff86f427f7b2cf0cab5912e7f32812bd0f49a712
2018-07-12 08:31:59 -07:00
e1863778e3 Guard gloo algorithm creation with DeviceGuard (#9371)
Summary:
Let us avoid creating a context on GPU0 unnecessarily.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9371

Reviewed By: pietern

Differential Revision: D8817343

Pulled By: apaszke

fbshipit-source-id: a6cc91a1dd127840486a42c64f97f117475b0d5f
2018-07-11 23:08:31 -07:00
aeccec755d In Gloo backend use ring reduction by default (#9309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9309

This is faster when you're dealing with a small number of processes.
Around the 16-process mark, the halving/doubling algorithm becomes faster.

Reviewed By: apaszke

Differential Revision: D8785364

fbshipit-source-id: 4a03326266e473026d943787186e149d0cc489f0
2018-07-11 21:40:01 -07:00
00b4b4703e fix unsqueeze doc (#9374)
Summary:
fixes #9348
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9374

Differential Revision: D8817215

Pulled By: SsnL

fbshipit-source-id: 047661ae4556bb19e4cd125b01a3fd75ed6642f3
2018-07-11 21:25:44 -07:00
7f38ea4555 Remove unused feature: num PS tuning
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9293

Reviewed By: huitseeker

Differential Revision: D8778499

fbshipit-source-id: 0cf59e02cb37b3fe22885c1b5e10b5d2e7585382
2018-07-11 18:54:45 -07:00
a487b08c2e AutoBatching - IR transformation (basic operators) (#9198)
Summary:
Use the decorator `torch.jit.batch` to implement auto-batching (call the `to_batch` pass to do the IR transformation).
- `to_batch` pass: "to_batch.h/cpp" in csrc/jit/passes to transform a graph into a new batched graph.
- Write several basic operators for BatchTensor (add, mul, sigmoid, tanh, mm, matmul, select).
- Register the operators in a lookup table `<std::string, std::shared_ptr<Graph>>`. (use the Graph to replace the original node in IR graph)

Move BatchTensor in python from torch.BatchTensor to torch.jit.BatchTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9198

Reviewed By: zdevito

Differential Revision: D8744466

Pulled By: ChunliF

fbshipit-source-id: 9ea56a30f55cb870f13a2069a47cc635419763ff
2018-07-11 18:25:07 -07:00
e30ff68410 Add Hardtanh Export (#8804)
Summary:
Added Hardtanh CPU/GPU implementations and backend tests to Caffe2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8804

Reviewed By: bddppq

Differential Revision: D8813987

Pulled By: houseroad

fbshipit-source-id: 2480296eab3373425b9e1734a10c009b4f5d3e26
2018-07-11 18:09:51 -07:00
1a8e826ed4 Skip the count_include_pad in average pool for now (#9365)
Summary:
Will create a bootcamp task.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9365

Reviewed By: bddppq

Differential Revision: D8813889

Pulled By: houseroad

fbshipit-source-id: bce1eaafd0efb3c27c0f71fcc40a8313e2b1c7b8
2018-07-11 18:09:50 -07:00
153e2e96d4 Make Sequential ref-counted (#9151)
Summary:
In the C++ API, `Sequential` was not refcounted itself, but stored `shared_ptr<AnyModule>` to get reference semantics. This is unfortunate because most modules in the API are accessed via `->`, e.g. `Linear l(1, 2); l->forward(...);`. `Sequential` was different in that it had value semantics itself, and thus was accessed via `.`.

This PR makes `Sequential` store `AnyModule` (without extra indirection), and uses the same pImpl mechanism we use for all other modules to make `Sequential` have reference semantics itself. This makes it consistent with the rest of the library. It also removes one level of indirection inside of `Sequential`, which is cool.

One thing I had to change was that the `ModuleHolder` with which the whole pImpl thing is implemented previously did some tricks to make `Linear(3, 4)` actually construct `Linear(LinearOptions(3, 4))`. This doesn't work well with `Sequential` since it takes a variadic parameter pack. Instead, I made `ModuleHolder` forward all arguments to the underlying module, and then further pushed the trick to forward parameters to modules' options types into the actual Modules. This adds one constructor per Module in the library. This is not something user modules have to do (unless they want this nice forwarding themselves). It makes the code simpler overall.

ezyang ebetica apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9151

Reviewed By: ezyang

Differential Revision: D8809298

Pulled By: goldsborough

fbshipit-source-id: da68452c3de912fbc67af330ba93b5220de6909f
2018-07-11 17:24:59 -07:00
94bc4c6091 Ensure pending tasks are finished in case of failure (#9290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9290

Ensure pending tasks (e.g. network ops) are finished when net fails

Reviewed By: heslami

Differential Revision: D8777230

fbshipit-source-id: e57fcf1df6aa0ed8847923391502b666edb43674
2018-07-11 15:39:46 -07:00
8253947256 Make error message more informative (#9352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9352

I am debugging a failed workflow f61490672, and found the original error message to be uninformative.

Differential Revision: D8808181

fbshipit-source-id: 3f524ca092881186a492c5c0456124ce31d54751
2018-07-11 15:09:46 -07:00
7f33ec55b2 Fix Eigen issue on OS X with CUDA and nvcc compile (#9350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9350

Re-apply #9270

Breaking this out of #8338

This takes care of the Eigen failure we saw on Mac CUDA builds when BUILD_CAFFE2 and BUILD_ATEN were removed. Fix is to isolate Eigen from headers included by cu files and processed by nvcc. This was worked on with smessmer.

Reviewed By: mingzhe09088

Differential Revision: D8794431

fbshipit-source-id: de656334af46c697802073f8e8d9a6aeb9ca65a7
2018-07-11 14:00:05 -07:00
cbcf45274b Move tanh function to math (#9328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9328

Move tanh function to math

Reviewed By: houseroad

Differential Revision: D8794745

fbshipit-source-id: ea525dedde6f53592b06c2caffd6426688dea5fc
2018-07-11 13:59:50 -07:00
7d8b532c1f Fix CUDA build failures (#9347)
Summary:
Breaking this out of #8338

This fixes some CUDA related build and runtime issues after BUILD_CAFFE2 and BUILD_ATEN are removed.

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9347

Reviewed By: orionr

Differential Revision: D8806954

Pulled By: mingzhe09088

fbshipit-source-id: 9f8e3feee06478d1ac2deb30796939453352d388
2018-07-11 13:39:59 -07:00
80380f637c Fix to make ONNXIFI flow work (#9340)
Summary:
Small step to have Relu test work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9340

Reviewed By: bddppq

Differential Revision: D8807018

Pulled By: yinghai

fbshipit-source-id: 429f3185e12afb12aaecfea8dd9595fdf838d356
2018-07-11 13:09:41 -07:00
18a975210d Add explicit to conversions (#9336)
Summary:
Another code-mod for clang-tidy: Conversion operators should be marked explicit so that they don't cause unwanted implicit conversions. This is especially important for `operator bool()`, see https://stackoverflow.com/questions/39995573/when-can-i-use-explicit-operator-bool-without-a-cast

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9336

Reviewed By: apaszke

Differential Revision: D8807065

Pulled By: goldsborough

fbshipit-source-id: 0e9f4ebd0048a2a510c0d05fa410695d7e977eb1
2018-07-11 12:10:30 -07:00
c2dd90c40e Add angle normalization for rotated boxes (#9056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9056

Closes https://github.com/pytorch/pytorch/pull/9056

Updates bbox_transform for rotated boxes with angle info to normalize the
predicted angle to be within [angle_bound_lo, angle_bound_hi] range.

Reviewed By: pjh5

Differential Revision: D8706240

fbshipit-source-id: f3ee834cf362736136e285f0f8f0c063af94a879
2018-07-11 11:25:54 -07:00
9126f95ac3 GenerateProposals and BoxWithNMSLimit ops: Add support for rotated boxes (#8953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8953

Closes https://github.com/pytorch/pytorch/pull/8953

Based on RRPN paper: https://arxiv.org/abs/1703.01086

Reviewed By: pjh5

Differential Revision: D8655687

fbshipit-source-id: 4985739e585c07dd406b9386dc7f46ad93576798
2018-07-11 11:25:52 -07:00
491f317b24 NMS util for rotated boxes (#8954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8954

Closes https://github.com/pytorch/pytorch/pull/8954

Based on RRPN paper: https://arxiv.org/abs/1703.01086

Reviewed By: pjh5

Differential Revision: D8618673

fbshipit-source-id: 4c54297e3b3bf614de4d7c0146176a419518790a
2018-07-11 11:25:49 -07:00
8da936ab52 Fix the build break for python3.7 PyUnicode_AsUTF8AndSize() prototype changing (#9259)
Summary:
https://docs.python.org/3.7/c-api/unicode.html#c.PyUnicode_AsUTF8AndSize
The return type changes from "char*" to "const char*".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9259

Reviewed By: orionr

Differential Revision: D8776219

Pulled By: pjh5

fbshipit-source-id: e5eadf71264002ba57cfb68dd39686a7ec074092
2018-07-11 10:39:43 -07:00
b9f575fc33 Remove legacy code from the JIT (#9323)
Summary:
In particular, get rid of backward tracing and CppOp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9323

Reviewed By: ezyang

Differential Revision: D8795935

Pulled By: apaszke

fbshipit-source-id: fb7a7eeee41902da35f2a8efd77262ca60fd6bbe
2018-07-11 10:25:38 -07:00
05559b4071 Accumulate MSELoss reduce=True into accreal instead of real (#9287)
Summary:
THNN was accumulating the result of reduction loss functions
into real instead of accreal. This was causing precision issues with
MSELoss.

This patch only fixes MSELoss. Some of the other losses exhibit bad precision as well (because they accumulate into real instead of accreal) and require more investigation. I will open an issue for those (#9286)

Fixes #8710

cc li-roy SsnL
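For illustration, a NumPy sketch of why the accumulator width matters (real is the tensor's own scalar type, accreal a wider type such as double):
```python
import numpy as np

# float32 cannot represent 2**24 + 1, so a float32 accumulator stalls
# once it reaches 16777216; a double accumulator keeps counting.
print(np.float32(2**24) + np.float32(1.0) == 2**24)  # True: lost update
print(np.float64(2**24) + np.float64(1.0) == 2**24)  # False
```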
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9287

Reviewed By: SsnL

Differential Revision: D8775708

Pulled By: zou3519

fbshipit-source-id: d1a1f159deee0cb90fd8e81e63b246115eea8e9e
2018-07-11 10:25:36 -07:00
748a90d05b BBoxTransform op: Add support for rotated boxes (#8952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8952

Closes https://github.com/pytorch/pytorch/pull/8952

Based on RRPN paper: https://arxiv.org/abs/1703.01086

Reviewed By: pjh5

Differential Revision: D8598547

fbshipit-source-id: 3699379df9bf45ed5bdd395175a0e26a77e079f7
2018-07-11 10:25:34 -07:00
01cffaa7e8 fix extra output in generate_code.py (#9339)
Summary:
operator.cpp is not generated. Removing the line prevents generate_code.py from always thinking it is out of date and running.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9339

Reviewed By: ezyang

Differential Revision: D8798689

Pulled By: zdevito

fbshipit-source-id: f25a2e215fec29aa51571e6a31771f0f91e7a213
2018-07-11 10:25:31 -07:00
b2a74d17ad document torch.utils.dlpack (#9343)
Summary:
dlpacks deserve documentation. :)

I wonder whether it might make sense to merge the various small torch.utils pages (and include a link for the larger ones, e.g. data) to enhance the structure in the docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9343

Differential Revision: D8801227

Pulled By: soumith

fbshipit-source-id: 2980d271971743b86f052bec5a2cb4d146a90d9b
2018-07-11 07:46:09 -07:00
04a7fc1dc4 Add Upsample support in C2 onnx backend for opset 1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9327

Reviewed By: ailzhang

Differential Revision: D8798462

Pulled By: houseroad

fbshipit-source-id: d7d1127a853de6a7bb8fdef146f283487e1e5569
2018-07-10 22:43:25 -07:00
fb9f9c9ba2 Implement Sinh and Cosh (#9213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9213

Closes https://github.com/pytorch/pytorch/pull/9213

Added hyperbolic trig functions Sinh and Cosh

Reviewed By: BIT-silence

Differential Revision: D8752566

fbshipit-source-id: 5a58336a5153ec804404b9ac7b10b5662ede3cb7
2018-07-10 18:55:31 -07:00
00aeb0b84b Privatize values for vec256 (#9321)
Summary:
Helps prevent calling functions of the base case on float/double/int subclasses that aren't supported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9321

Reviewed By: colesbury

Differential Revision: D8793627

Pulled By: cpuhrsch

fbshipit-source-id: 7fde779ecd4b890dda406f3d1306b58bab40efe2
2018-07-10 18:11:16 -07:00
b4c66459c5 Add pyHIPIFY scripts needed for ROCm transpilation to PyTorch (#8812)
Summary:
As discussed on the call, this will allow us to keep this integral part of the effort to run PyTorch on ROCm in sync with the main code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8812

Reviewed By: ezyang

Differential Revision: D8796245

Pulled By: bddppq

fbshipit-source-id: 8e12c2acf6a7e0740f31b21e50be74e10ed8b12c
2018-07-10 18:02:43 -07:00
a47a30b9ce Implement grid_sampler in aten (#8929)
Summary:
Partially addresses #8928.

Maybe #7273?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8929

Reviewed By: ezyang

Differential Revision: D8668919

Pulled By: li-roy

fbshipit-source-id: 8ad07b224d2ab211c274c4c10f042501efaae32c
2018-07-10 15:10:24 -07:00
ea1869244f Change depthwise convolution bandwidth formula (#9317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9317

Change depthwise convolution bandwidth formula

Reviewed By: hlu1

Differential Revision: D8786684

fbshipit-source-id: ba76fea94a6d2fda8d87f40dd626b3dfd90770ed
2018-07-10 14:24:10 -07:00
0a679105ff Fix missing accept file changes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9313

Reviewed By: ezyang

Differential Revision: D8789043

Pulled By: zdevito

fbshipit-source-id: 283607116c49a4f3a82658d9b4d45f5df3ae283b
2018-07-10 13:39:24 -07:00
e9e47ce8f1 Vectorize sigmoid (#8612)
Summary:
This PR ports the vectorization of sigmoid to also enable better performance for non-contiguous arrays. Detailed timings will follow shortly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8612

Reviewed By: ezyang

Differential Revision: D8712298

Pulled By: cpuhrsch

fbshipit-source-id: 01a3d06af8d04513edd024ab1d01a6b753fc6f6a
2018-07-10 12:40:39 -07:00
efefd1d7cf Unify aten_dispatch and aten_schema into a single operator abstraction with human-readable schema. (#8885)
Summary:
This is a series of two commits that should probably be read separately. They are stacked on top of #9018 since the second commit requires it for correctness.

Commit 1
=======

This commit is the first in a series that will clean up how we handle declaring operators and intrinsics in the JIT to make it more modular and readable. This introduces readable declarations that can be used to register operators and switches gen_jit_dispatch to generate this schema. A follow up PR will remove the dispatch keys like "add-3" and resolve ops directly based on the registered schema, further simplifying the generation process.

* Switches schema over to parsed declarations, in the future this will allow something like:

```
  registry.register_intrinsic("foo(Tensor a, Tensor b) -> Tensor", [](Stack& stack) {
    ...
  })
```

This will allow the scalable registration of intrinsics for lists, tuples, and other ops, along with metadata for these ops (e.g. derivatives and size-propagation routines).

The declarations resemble those used by PythonArgParser but have been significantly cleaned up to minimize the number of types that can appear in the declaration. We should strive to get the other parts of PyTorch switched over to this restricted declaration set when possible, but it is too much to do in a single PR. My hope is that eventually we will use a very similar language to describe declarations in C10, and this can serve as a guide for that.

Parsing is done using the script lexer, so it is very robust to whitespace and extensible for future types.

This removes the other way we encoded schema, and makes it easier to see what schema are registered.

Current generated declarations: https://gist.github.com/zdevito/a96a17766fb3a098d69a91ee00abaaf6

* Switches how we handle attempting to use an integer in the place of a fixed-sized int list, such as in conv (e.g. 'int[3] stride=1'). Now that we can statically distinguish between int and Tensor, we handle the expansion as an implicit conversion in the compiler. This allows us to simplify the interpreter since it no longer needs to handle the conversion itself.

* Schema declarations have been changed so that they match the type system in the IR exactly. In particular, attribute_info which was used by liftConstantAttributes has been dropped and constant attributes are lifted purely based on the type of the input. Type conversions in compiler have been simplified due to this change.

* Error highlighting in ErrorReport now only reports at most 20 lines of code, to make reading where an error occurred easier.

Commit 2
=======

This commit unifies aten_dispatch and aten_schema into a single Operator object that both contains schema and implementation information. In the future we can use this object to also contain functionality like shape prop and autodiff needed by all operators. Operators are registered globally, and dispatch logic uses the schema information to figure out which variant to use. Descriptor keys, a frequent source of inscrutable debug errors, have been removed.

* Introduce Operator, to replace TensorOp. Unlike TensorOp, we use Operator for all op implementations, including primitives that may occur in the graphs. The only exceptions are ops that are only known to the interpreter like jumps, and GraphExecutors where we need to record additional debug info.

* Adds a global registry for Operator implementations. aten_dispatch.cpp turns into register_aten_ops.cpp, which registers all the Operators for aten with the operator registry. register_prim_ops.cpp now contains the implementations for primitive operators that used to be in the interpreter. This means that it is now safe to use `getOperation(node)` to lookup the true interpreter function for the node, which will simplify const-propagation passes.

* Remove addInterpreterOpHandler in favor of global operator registry.

* Instead of descriptors, we match Node arguments directly against FunctionSchema describing expected inputs in `matchSchema`. `matchSchema` knows how parse both attributes and positional inputs from a node and match it to the appropriate registered operator. Debug error messages when we try to run an invalid operator are significantly improved: they now automatically display the schema for the op with the same name that are registered.

* Merge aten_schema into register_aten_ops. Each Operator takes a string schema which is parsed to determine when to dispatch to that op.

* Cleans up gen_jit_dispatch.py now that we do not need to write out descriptors.  In particular, skip_scalar_overloads can be removed since Richard's code sorts declarations to put Tensor, Tensor declarations first.

* remove matchSchemaAndLiftConstantAttributes and use emitBuiltinCall instead to remove code duplication

* refactor stack manipulation functions into a separate header file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8885

Reviewed By: jamesr66a

Differential Revision: D8751048

Pulled By: zdevito

fbshipit-source-id: 312aabfbf88307c5f6ab947b6caf691468b94557
2018-07-10 10:24:48 -07:00
d867757649 Fix CUDA 8 build for Windows (#9300)
Summary:
Replacement of #9023.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9300

Differential Revision: D8781492

Pulled By: soumith

fbshipit-source-id: 6c0994da46d3112c24769f92366836c397891d93
2018-07-10 10:24:46 -07:00
8e6e8098ce Revert D8768025: [pytorch][PR] Fix Eigen issue on OS X with CUDA and nvcc compile
Differential Revision:
D8768025

Original commit changeset: 5b34017aeb67

fbshipit-source-id: 6ec892ff483bb9d966eb7138eadc77443972c8f8
2018-07-10 10:24:43 -07:00
bbeae24145 Fix Eigen issue on OS X with CUDA and nvcc compile (#9270)
Summary:
Breaking this out of #8338

This takes care of the Eigen failure we saw on Mac CUDA builds when BUILD_CAFFE2 and BUILD_ATEN were removed. Fix is to isolate Eigen from headers included by cu files and processed by nvcc. This was worked on with smessmer.

cc mingzhe09088 smessmer BIT-silence Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9270

Reviewed By: mingzhe09088

Differential Revision: D8768025

Pulled By: orionr

fbshipit-source-id: 5b34017aeb67e35a1b5938d962181ccd4cd37591
2018-07-10 09:25:42 -07:00
3254bcaed8 Call deleter when destroying unconsumed DLPack PyCapsules (#9297)
Summary:
Usually the DLPack consumer is expected to call the DLManagedTensor's
deleter to signal that it no longer needs the contents.
This patch calls the deleter when freeing unconsumed
DLPack capsules created by PyTorch.

Test script:
```
import torch
import torch.utils.dlpack
import gc
for i in range(10000):
    a = torch.randn(1000,1000, dtype=torch.float32, device='cuda')
    b = torch.utils.dlpack.to_dlpack(a)
    gc.collect()
```
Before the patch: consumes all GPU RAM.
After the patch: constant GPU RAM consumption.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9297

Differential Revision: D8781571

Pulled By: soumith

fbshipit-source-id: 2ebadec6c857646220d632ca64110af430dbd52f
2018-07-10 07:56:59 -07:00
89c2b50a15 Grad clip for parameters on different devices (#9302)
Summary:
I'm trying to write a multi-GPU network by pipelining some layers onto different GPUs. However, the current gradient clip requires all the parameters to be located on the same device.

CUDA launch overhead is reduced since the scalar calculation is performed on the CPU, but this introduces extra data transfers.

No performance regression is observed by running the following snippet:
```python
import time

import torch

module = torch.nn.Sequential(
    torch.nn.LSTM(1024, 1024),
    torch.nn.LSTM(256, 256),
    torch.nn.Linear(100, 10000),
).cuda()

torch.nn.utils.clip_grad_norm_(module.parameters(), 1)
torch.cuda.synchronize()
start = time.time()
for _ in range(1000):
    torch.nn.utils.clip_grad_norm_(module.parameters(), 1)
torch.cuda.synchronize()
time_elapse = time.time() - start
print('{} ms per clip'.format(time_elapse))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9302

Differential Revision: D8781551

Pulled By: soumith

fbshipit-source-id: 9d76d01fe0531927f770a16b9523872a7e08e927
2018-07-10 07:56:55 -07:00
1597fc594d 3d conv should use int64_t (#9274)
Summary:
Fixes #9264.

There can be so many elements in the output of `vol2col` that the count overflows the `int` range! This PR changes 3d conv to use `int64_t` in most places.
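For illustration, rough size arithmetic for an im2col/vol2col-style buffer (channels * kT*kH*kW * oT*oH*oW elements; the sizes below are made up):
```python
elements = 64 * (3 * 3 * 3) * (64 * 128 * 128)
print(elements)                  # 1811939328, already close to INT_MAX
print(elements * 2 > 2**31 - 1)  # True: doubling any factor overflows int32
```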

Also fixes some unused var warning (cc goldsborough )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9274

Differential Revision: D8770682

Pulled By: SsnL

fbshipit-source-id: f6e37f1aa56fe1009dd4c9bcbc042244e47252db
2018-07-10 06:39:45 -07:00
d0d1820814 Add weak pointer and finalizer support directly to THStorage. (#9148)
Summary:
The underlying use-case is the file descriptor to storage cache in
torch.multiprocessing.reductions.  Previously, this was implemented by wrapping
an existing allocator with a "weak ref" allocator which also knew to null out
the weak reference when the storage died.  This is terribly oblique, and
prevents us from refactoring the allocators to get rid of per-storage allocator
state.

So instead of going through this fiasco, we instead directly implement weak
pointers and finalizers in THStorage.  Weak pointers to THStorage retain the
THStorage struct, but not the data_ptr.  When all strong references die,
data_ptr dies and the finalizers get invoked.

There is one major hazard in this patch, which is what happens if you
repeatedly call _weak_ref on a storage.  For cleanliness, we no longer
shove our grubby fingers into the finalizer struct to see if there is already
a Python object for the weak reference and return it; we just create a new one
(no one is checking these Python objects for identity).  This means if you
keep calling it, we'll keep piling on finalizers.  That's bad! But I am
not going to fix it until it is actually a problem for someone, because
then we need to add another caching layer.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9148

Differential Revision: D8729106

Pulled By: ezyang

fbshipit-source-id: 69710ca3b7c7e05069090e1b263f8b6b9f1cf72f
2018-07-10 06:25:33 -07:00
e06abab264 Fix Upsample ONNX Symbolic (#9288)
Summary:
Adjust the change to changes in ATen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9288

Reviewed By: ailzhang

Differential Revision: D8779078

Pulled By: houseroad

fbshipit-source-id: 7f387eeb35ae1f5a1494afc6287853a87a6173b4
2018-07-09 23:25:26 -07:00
181d2a5e60 Add support of is_compatible for old version of onnx (#9284)
Summary:
Fix the problem when Caffe2 works with an old version of ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9284

Reviewed By: yinghai

Differential Revision: D8773894

Pulled By: houseroad

fbshipit-source-id: 99b5a962099f854edc85a2ea815cb88c82a6e175
2018-07-09 21:09:14 -07:00
7ace3a99ec Fix TensorRT tests (#9285)
Summary:
ONNX-TensorRT is still using an old opset (<7). Patch it for now.

A future fix would be to expose versioning in the ONNX exporter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9285

Reviewed By: houseroad

Differential Revision: D8775268

Pulled By: yinghai

fbshipit-source-id: c272073f80cce35ebd971e44ec9472e3c8fd4b9e
2018-07-09 20:40:19 -07:00
4498fb962b Add space around operator (#9294)
Summary:
Fixes lint failure on master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9294

Differential Revision: D8779010

Pulled By: goldsborough

fbshipit-source-id: da1ea2604189fd704c22fa8a5770bd92845cea91
2018-07-09 20:24:21 -07:00
f92edf7ef4 N-dimensional empty tensors: indexing, factories, reductions. (#9209)
Summary:
This PR implements and tests N-dimensional empty tensors for indexing, factories, and reductions if compiled with -DUSE_TH_SIZE_ZERO_DIM.
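For illustration, a sketch of the semantics the flag enables (any dimension may be zero-sized, with NumPy-style reductions):
```python
import torch

e = torch.empty(0, 3)
print(e.shape)       # torch.Size([0, 3])
print(e.sum(dim=0))  # tensor([0., 0., 0.])
```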

Still remaining to add:
1) TensorShape functions
2) Simple linear algebra functions (matrix multiply variants)
3) Other functions that operate over a dimension (but don't reduce).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9209

Reviewed By: ezyang

Differential Revision: D8751257

Pulled By: gchanan

fbshipit-source-id: 2113374dc7af6caf31a99bf67b3893f130a29e23
2018-07-09 19:40:01 -07:00
19ecb5f8ad Fix docs for Windows CUDA 8 builds (#9254)
Summary:
Fixes #9200.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9254

Differential Revision: D8778011

Pulled By: soumith

fbshipit-source-id: 0a2c2863ac1bc515397fc446039db64d1d4e236d
2018-07-09 18:55:03 -07:00
99ab082366 Making setup.py install work for Caffe2 (#8509)
Summary:
Tested on my mac on a pretty clean anaconda3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8509

Reviewed By: orionr

Differential Revision: D8702257

Pulled By: pjh5

fbshipit-source-id: eda03ef9732da9fc56b31d909af5c0e39520d689
2018-07-09 18:10:58 -07:00
342dbcc35a Remove legacy redundant codes (#9252)
Summary:
Fix #9167
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9252

Differential Revision: D8774644

Pulled By: soumith

fbshipit-source-id: 0b004f497026bca3b101c577e78aec22bdc3df51
2018-07-09 16:55:28 -07:00
2b8aea3ada add more logging messages to dimension checks of FCGradient (#9203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9203

Closes https://github.com/pytorch/pytorch/pull/9203

Added extra logging for FCGradient input dimension checks

Reviewed By: yinghai

Differential Revision: D8738549

fbshipit-source-id: d4f26572d86f3d44f40c9dca62d4f241ba15aead
2018-07-09 16:55:26 -07:00
c67ade26a7 Add onnx support for clamp_min clamp_max (#9224)
Summary:
Add support for clamp as required by https://github.com/onnx/onnx/issues/1168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9224

Reviewed By: yinghai

Differential Revision: D8758945

Pulled By: houseroad

fbshipit-source-id: fad724d273c59f4527e96481ee6b2d14bfba205d
2018-07-09 16:25:44 -07:00
01a7ca3d64 Fix Pytorch Mac build issues (#9283)
Summary:
Breaking this out of #8338

This fixes Mac build issues after BUILD_CAFFE2 and BUILD_ATEN are removed.

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9283

Reviewed By: orionr

Differential Revision: D8773459

Pulled By: mingzhe09088

fbshipit-source-id: 71942e8e6891a625e6b1a7dc0160e87444c64209
2018-07-09 15:40:46 -07:00
29b1c2cfce Install typing for Mac (#9271)
Summary:
Breaking this out of #8338

When BUILD_CAFFE2 and BUILD_ATEN are removed, we need to install typing on Mac.

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9271

Reviewed By: orionr

Differential Revision: D8768701

Pulled By: mingzhe09088

fbshipit-source-id: 052b96e90e64b01e6b5dd48b91c0fb12fb96b54a
2018-07-09 14:58:50 -07:00
a70a90b28f Fix pytorch linux build issues (#9273)
Summary:
Breaking out of #8338

This fixes the build issues with pytorch on linux machines after BUILD_CAFFE2 and BUILD_ATEN are removed.

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9273

Reviewed By: orionr

Differential Revision: D8768869

Pulled By: mingzhe09088

fbshipit-source-id: 2730426ed1bed398eb5dc804c7348aeeb27c93d3
2018-07-09 14:41:36 -07:00
d0ad696f9d Warn about THPObjectPtr needing GIL. (#9265)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9265

Differential Revision: D8767687

Pulled By: ezyang

fbshipit-source-id: 900b37f2749112cafc5b48e7b444a256df18186a
2018-07-09 13:55:22 -07:00
b19b38c427 Fix Mac CUDA issues (#9269)
Summary:
Breaking this out of #8338

This takes care of failures we saw on Mac CUDA builds when BUILD_CAFFE2 and BUILD_ATEN were removed. Specifically, smessmer fixed `std::hash` being handled in a weird way by nvcc and I fixed an nvcc template issue by moving `SparseNormalizeOp::RunOnDevice` implementation into the cc file.

cc mingzhe09088 smessmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9269

Reviewed By: mingzhe09088

Differential Revision: D8767984

Pulled By: orionr

fbshipit-source-id: 550686bfcef6d331f16d593859c99169216c5c2e
2018-07-09 12:40:40 -07:00
744cd90074 Fix Android build issue (#9275)
Summary:
Breaking this out of #8338

This fixes an Android build issue after BUILD_CAFFE2 and BUILD_ATEN are removed.

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9275

Reviewed By: orionr

Differential Revision: D8769913

Pulled By: mingzhe09088

fbshipit-source-id: afce52a12697757a0b2103c7c343e19ab158a9f7
2018-07-09 12:40:37 -07:00
cb98c5020a Normalize IDEEP spatial bn op test (#9276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9276

Use `checkDevice` instead of rolling our own.

Reviewed By: orionr

Differential Revision: D8769401

fbshipit-source-id: bd47ec2b2501552c2da1cee2eb9ad96a215602b4
2018-07-09 11:55:41 -07:00
936f47f271 Make roi_align_rotated_op_test not rely on 1.12.0 numpy.rot90 (#9267)
Summary:
Breaking this out of https://github.com/pytorch/pytorch/pull/8338

Use a local version of `np.rot90` with an `axes` argument, since we don't have NumPy 1.12.0 in all of the test environments. Caffe2 conda2-ubuntu16.04, for example, fails. Generally, it seems better to not require a NumPy bump just for this test.

cc mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9267

Reviewed By: mingzhe09088

Differential Revision: D8767819

Pulled By: orionr

fbshipit-source-id: c51a6295d58366eba06e4e55e3f1ffaa8af96975
2018-07-09 11:55:39 -07:00
768a0e3298 Some more changes to support USE_CUDNN=OFF (#9268)
Summary:
Breaking this out of #8338

More changes required to support USE_CUDNN=OFF. We should be able to land some of our fixes before the big BUILD_CAFFE2 and BUILD_ATEN removal lands.

cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9268

Reviewed By: mingzhe09088

Differential Revision: D8767981

Pulled By: orionr

fbshipit-source-id: 0607ca2773253b685209c274a3adf70180d8ce58
2018-07-09 11:55:38 -07:00
1483bb7246 Remove unused functions (#9223)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9223

TSIA

Reviewed By: ezyang

Differential Revision: D8755761

fbshipit-source-id: 284fa03397df5626bd56de150b90ba61ae3b8c6e
2018-07-09 10:09:47 -07:00
e8536c08a1 Update extension docs, fix Fold/Unfold docs (#9239)
Summary:
Commits:
1. In the extension docs, get rid of all references to `Variable`s (closes #6947)
    + also add minor improvements
    + also added a section with links to cpp extension :) goldsborough
    + removed mentions of `autograd.Function.requires_grad` as it's not used anywhere and is hardcoded to return `Py_True`.
2. Fix several sphinx warnings
3. Change `*` in equations in `module/conv.py` to `\times`
4. Fix docs for `Fold` and `Unfold`.
    + Added better shape check for `Fold` (it previously may give bogus result when there are not enough blocks). Added test for the checks.
5. Fix doc saying `trtrs` not available for CUDA (#9247 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9239

Reviewed By: soumith

Differential Revision: D8762492

Pulled By: SsnL

fbshipit-source-id: 13cd91128981a94493d5efdf250c40465f84346a
2018-07-08 19:09:39 -07:00
f48e15624e Unique cuda support (#8899)
Summary:
Add cuda support for unique.

There is a simple test below on a tensor containing 1M integer values; the GPU version is roughly 3x faster.

```
Performance
cpu: 0.05040597915649414 s
x: tensor([1, 3, 1,  ..., 4, 9, 4])
x output: tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
x inverse: tensor([0, 2, 0,  ..., 3, 8, 3])

gpu: 0.015192985534667969 s
y: tensor([1, 3, 1,  ..., 4, 9, 4], device='cuda:0')
y output: tensor([1, 2, 3, 4, 5, 6, 7, 8, 9], device='cuda:0')
y inverse: tensor([0, 2, 0,  ..., 3, 8, 3], device='cuda:0')
```

```python
# Code
import torch
import time
x=torch.randint(1,10,(1000000,),dtype=torch.long)
device = torch.device("cuda")
y=x.to(device)
start = time.time();
output,inverse = x.unique(sorted=True,return_inverse=True)
stop = time.time();
print('cpu:',stop-start,'s')
print('x:',x)
print('x output:',output)
print('x inverse:',inverse)

start = time.time();
output1,inverse1 = y.unique(sorted=True,return_inverse=True)
torch.cuda.synchronize();
stop = time.time();
print('gpu:',stop-start,'s')
print('y:',y)
print('y output:',output1)
print('y inverse:',inverse1)
```
Closes https://github.com/pytorch/pytorch/pull/8899

Reviewed By: SsnL

Differential Revision: D8677655

Pulled By: ezyang

fbshipit-source-id: 09df3f0602f235c5d36c7a6e7e1d89dbf82570bb
2018-07-08 17:09:26 -07:00
819815d9c0 Fix missing compile_commands.json for aten (#9227)
Summary:
When we moved the libaten build into libcaffe2, we changed the location where it generated compile_commands.json such that it was no longer being picked up by the build script. This fixes it so it is still found.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9227

Reviewed By: goldsborough

Differential Revision: D8757984

Pulled By: zdevito

fbshipit-source-id: 73df26bf08d98f18ac841d6c0db7e332fd328ab6
2018-07-08 16:54:34 -07:00
a615baa51f move unbind to ATen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/8587

Differential Revision: D8764086

Pulled By: soumith

fbshipit-source-id: 7f311cf13c341040e1f2cf4a8f05723e32d38947
2018-07-08 16:46:35 -07:00
66dc97e51c #8714 Improve Error Messages for module re-assignment (#9212)
Summary:
Here's an improved error message.  Let me know if this change makes the errors a little clearer.
Closes https://github.com/pytorch/pytorch/pull/9212

Reviewed By: soumith

Differential Revision: D8752896

Pulled By: jramseyer

fbshipit-source-id: d2bd8462c3ddf14acd3de56a4c1aeb75a9bc4067
2018-07-08 16:46:33 -07:00
d6f21fc663 Ports Streams to ATen (#8997)
Summary:
This PR moves the THCStream logic (from both the THCStream and THCState APIs) to ATen. In particular, it:

+ Creates a new (THC free) at::CUDAStream class and API
+ Extends the at::Context API to expose it
+ Stubs the current THCStream and THCState APIs to use it
+ Updates THC to no longer violate stream encapsulation (stream.hpp is dead)
+ Adds an ATen cpp test of the API
+ Bonus: Removes some debug spew in test_nn.py

The new API has several advantages over the old one:

(1) It comes with an easy-to-use RAII class, the CUDAStream. CUDAStreams have the expected copy and move semantics and are implicitly convertible to cudaStream_t.
(2) It does not depend on THCState, THCThreadLocal, or CUDA (thanks to goldsborough for suggesting the dynamic registration technique)
(3) It provides one consistent API/place for all stream operations, instead of having them split between THCStream and THCState
(4) The internals are completely encapsulated, unlike the historic THCStream
(5) It has getAndRetain semantics, which are safer than the historic gets (which allowed a gap between acquisition and retention)

There are a couple things this PR does not do, however, which are left for future work:

- It leaves the c10d:CUDAStream class as a THCStream wrapper (which now really wraps an at::CUDAStream).
- It leaves historic users of THCStream mostly untouched, except where they violated encapsulation (by using stream.hpp). A couple forward declarations were also changed.

I hope this PR allows easy usage of streams from ATen and is a useful pattern for porting more of the THCState API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8997

Differential Revision: D8683375

Pulled By: soumith

fbshipit-source-id: 2e48ad85f1f9c8817684fe63a267938e80eafdcf
2018-07-08 16:25:09 -07:00
75919b4e18 Expose generic device copy algorithm (#9009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9009

Closes https://github.com/pytorch/pytorch/pull/9009

Nice little helper for the related stacked diff

github_tests_pass

Reviewed By: hyuen

Differential Revision: D8688509

fbshipit-source-id: 22de241d69932210d161df1e29d9c41eb50a8133
2018-07-08 15:40:36 -07:00
4ad6e53557 fix the deprecate argument in bce with logits (#9162)
Summary:
As title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9162

Differential Revision: D8753892

Pulled By: SsnL

fbshipit-source-id: 7ce9ac16571a550a3fa7b86d68eb5c077a5956fb
2018-07-07 10:26:35 -07:00
f40ed548d8 Bump onnx submodule (#9215)
Summary:
To include new onnx backend test cases
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9215

Reviewed By: houseroad

Differential Revision: D8754785

Pulled By: bddppq

fbshipit-source-id: 2c113a7155c537c4ec5ddb021661d68acb775879
2018-07-06 15:42:22 -07:00
067b270717 Optimize LeakyReLU and PReLU 'forward' functions on the CPU (#9206)
Summary:
This looks like a totally cosmetic change, but for some reason it reduces the runtime by ~50% running in a single CPU thread.

```
import os
os.environ['OMP_NUM_THREADS']='1'  #Use one CPU thread
import torch, torch.nn as nn, time
def test_net(net,offset):
    net.eval()
    total=0
    with torch.no_grad():
        for _ in range(100):
            x = torch.randn(100,100,100)+offset
            start_time = time.time()
            y = net(x)
            total+=time.time()-start_time
    print(net, total*10, 'ms')

for offset in [-1,0,+1]:
    test_net(nn.LeakyReLU(),offset)
    test_net(nn.PReLU(),offset)
```
Closes https://github.com/pytorch/pytorch/pull/9206

Reviewed By: yf225

Differential Revision: D8749491

Pulled By: btgraham

fbshipit-source-id: 3db8049dd151c0ba9ae1dd5c05bcc58bcab97e9a
2018-07-06 15:42:19 -07:00
227c8f2654 Implement nn.functional.interpolate based on upsample. (#8591)
Summary:
This PR addresses #5823.

* Fix docstring: upsample doesn't support LongTensor.

* Enable float-scale up- and downsampling for the linear/bilinear/trilinear modes (following SsnL's commit).

* Enable float-scale up- and downsampling for the nearest mode. Note that our implementation is slightly different from TF in that there's actually no "align_corners" concept in this mode.

* Add a new interpolate function API to replace upsample, and add a deprecation warning for upsample (see the usage sketch after this list).

* Add an "area" mode, which is essentially adaptive average pooling, to resize_images.

* Add test cases for interpolate in test_nn.py.

* Add a few comments to help understand the *linear interpolation code.

* Only the "*cubic" mode, which is pretty useful in practice, is missing from the resize_images API; it's labeled as a hackamonth here: #1552. I discussed with SsnL that we probably want to implement all new ops in ATen instead of THNN/THCUNN. Depending on the priority, I could either put it in my queue or leave it for a HAMer.

* After the change, the files named *Upsampling*.c work for both up- and downsampling. I could rename the files if needed.
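
For illustration only (not part of the original PR description), a minimal usage sketch of the new API, assuming the scale factors shown:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)  # NCHW batch of images

# Non-integer up- and downsampling, now supported for these modes
up = F.interpolate(x, scale_factor=1.5, mode='bilinear', align_corners=False)
down = F.interpolate(x, scale_factor=0.5, mode='area')  # adaptive avg pooling

print(up.shape)    # torch.Size([1, 3, 48, 48])
print(down.shape)  # torch.Size([1, 3, 16, 16])
```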

Differential Revision: D8729635

Pulled By: ailzhang

fbshipit-source-id: a98dc5e1f587fce17606b5764db695366a6bb56b
2018-07-06 15:28:11 -07:00
766fa1fc96 Fix IDEEP CMakefile (#9217)
Summary:
The reason is that we are referencing `__ideep_looked_for` here: 77484d91db/cmake/Modules/FindMKL.cmake (L350)

This was first flushed out in https://github.com/pytorch/pytorch/pull/8105 and probably can help with https://github.com/pytorch/pytorch/issues/9024
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9217

Reviewed By: houseroad

Differential Revision: D8754491

Pulled By: yinghai

fbshipit-source-id: 70aecc2d60684b9ea522403dc98a0a1a2c3db7e6
2018-07-06 15:28:07 -07:00
af107c4d16 Fix shape inference bug (#9199)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9199

The input shapes are not logged correctly in production because `PerfNetObserver::Stop()` only gets called after inference is done for the net, and in mobile models it's common practice to reuse blobs as much as possible to save memory, so the shapes of the blobs keep changing during inference. By the time you query `InputTensorShapes()` in `PerfNetObserver::Stop()`, you only get the final shape of the blobs.

To fix this bug, I moved the 'InputTensorShapes()' query from `PerfNetObserver::Stop()` to `PerfOperatorObserver::Stop()`. The latter gets called at the end of operator->run() whereas `PerfNetObserver::Stop()` gets called at the end of net->run().

Also remove `PerfOperatorObserver::getAnalyticalCost()` since it's now done on the server side and no longer needed on mobile

Reviewed By: Maratyszcza

Differential Revision: D8743346

fbshipit-source-id: 5d2d0132e3f5e084be7d0173863e695e62a6b4a0
2018-07-06 15:15:17 -07:00
f87499a8f3 Modify the original PackSegments operator by adding "max_length" argument (#9048)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9048

max_length argument helps fix the shape of the output to be N * max_length * D, where N is the batch_size, D is the feature_dim.

Reviewed By: bddppq

Differential Revision: D8702782

fbshipit-source-id: e30555608fee1c4a61cc95922f4a71c7f54903af
2018-07-06 14:33:59 -07:00
4e5369349f Add FTRL Optimizer with Group Lasso regularizer (#9074)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9074

Implement an optimizer based on the FTRL optimizer which supports the Group
Lasso regularizer.

The relevant paper list for this optimizer:
1. About the FTRL Optimizer: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf,
2. About the group lasso regularizer solver: http://www.cse.cuhk.edu.hk/~king/PUB/ICML2010-Yang-473.pdf

Differential Revision: D8623146

fbshipit-source-id: 40e08aa6319d1ad7aa95e8716e3de83b9cfb8452
2018-07-06 13:41:00 -07:00
c0bfe2a6ed Clean up conversion registration
Summary:
[x] get registry working
[x] move all current ops to registry

Reviewed By: yinghai

Differential Revision: D8706115

fbshipit-source-id: 8dfce79039b57dea1c15e8e291cdd74f39766ade
2018-07-06 13:40:56 -07:00
da39c24971 Add GroupL1Norm regularizer (#9115)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9115

As desc

Reviewed By: hlu1

Differential Revision: D8718011

fbshipit-source-id: c9d750662064dd6e6362b6b13d9d0175e93e60e4
2018-07-06 13:26:09 -07:00
f1ce15b50c Move nccl scatter and gather to C++ (#9117)
Summary:
As I try to replicate DP in C++, I need to move some functions into C++ from Python. This PR ports the scatter and gather primitives from Python in torch/cuda/comm.py to C++ in torch/csrc/cuda/comm.cpp. The basic infrastructure was already there, since apaszke had rewritten broadcast in C++ already.

I'm not very familiar with this code, so let me know if I'm doing something wrong. I largely just literally translated the code.

I don't know how "public" `torch.cuda.comm` is, but I feel like the `destination_index` parameter for `gather` should be changed from -1 indicating CPU to `None` indicating CPU, and `-1` indicating the default CUDA device. That would make the code clearer IMO.

apaszke colesbury teng-li pietern
Closes https://github.com/pytorch/pytorch/pull/9117

Differential Revision: D8721729

Pulled By: goldsborough

fbshipit-source-id: 1844a488079d21fa209b32e2c73e48632cbe9e68
2018-07-06 11:10:33 -07:00
d863391871 nn::Module::as (#9149)
Summary:
Added a way to `dynamic_cast` an `nn::Module` and get a pointer to it. `nn::Module::is<T>` just checked if the return value of the `dynamic_cast` was nullptr, so I got rid of `is<T>` since it's equivalent to `as<T> != nullptr`(or just `as<T>` due to boolean conversion).

We're now at

```
if (auto* conv = module.as<nn::Conv2d>()) {
  conv->weight.data().normal_(0.0, 0.02);
} else if (auto* bn = module.as<nn::BatchNorm>()) {
  bn->weight.data().normal_(1.0, 0.02);
  bn->bias.data().fill_(0);
}
```

ezyang apaszke ebetica
Closes https://github.com/pytorch/pytorch/pull/9149

Differential Revision: D8735954

Pulled By: goldsborough

fbshipit-source-id: e2b8f6f0cea16a621f8bc0807a33cc7651d25154
2018-07-06 11:10:29 -07:00
9aded4351e Allow arbitrary namespaces for Symbols (#9018)
Summary:
Context: I am updating jit::FunctionSchema to use `Symbol name;` rather than `std::string name`. Sometimes the name refers to a builtin thing like `prim::UnpackTuple`, sometimes to an aten operator like `aten::add`, and sometimes just to a raw string, like `my_method_foo`, that really doesn't belong in any namespace and should be printed to the user in that form. For this last case, I want the ability to create a raw Symbol again, as was previously possible, that just represents an interned string. This PR enables that use, keeps the other functionality still possible, and simplifies interned_string's implementation a bit.

This changes how Symbol is implemented. Now the namespace of a symbol
is optional and the namespaces themselves are Symbols.
This allows Symbol to be used with arbitrary namespaces, and allows
you to use Symbol as a simple interned string via fromQualString
and toQualString without :: in the string. This also simplifies the
implementation. Like with string conversion, builtin primitives go
through a fast path for namespace lookup while registered symbols require
holding a lock and reading an array entry to lookup the namespace.

Note: alexnet expect file update is from a previous commit. It doesn't run in CI because pytorch vision is not installed.
Closes https://github.com/pytorch/pytorch/pull/9018

Reviewed By: SsnL

Differential Revision: D8690449

Pulled By: zdevito

fbshipit-source-id: b65ee57704641d7294fe115c5470cf55d406458f
2018-07-06 10:11:15 -07:00
84884dc2d3 Allow passing '0' to ASAN/UBSAN flags (#9202)
Summary:
Similar to https://github.com/pytorch/pytorch/pull/9187, This PR makes setting the `PYTORCH_TEST_WITH_ASAN` and `PYTORCH_TEST_WITH_UBSAN` flags easier internally, by allowing the flags to be set to `0`.
Closes https://github.com/pytorch/pytorch/pull/9202

Differential Revision: D8745533

Pulled By: yf225

fbshipit-source-id: 6293f52f2e8b1c3ef150becfdc2dd7ded56d5d80
2018-07-06 08:40:37 -07:00
168a29f497 Create native wrappers around dimension reduction functions. (#9197)
Summary:
This is necessary for n-dimensional empty tensors, which have special native handling.
Closes https://github.com/pytorch/pytorch/pull/9197

Differential Revision: D8744083

Pulled By: gchanan

fbshipit-source-id: 3cc692a1d62cbeb169681b7c40e3df50e12953b7
2018-07-06 08:11:23 -07:00
1f1fb813a6 Use a static random_device in StorageSharing (#9080)
Summary:
I've been cleaning up my email notifications, and noticed that this PR used a stack-allocated `random_device`. This is generally a bad idea due to this sentence from the C++ reference (emphasis mine):

> `std::random_device` may be implemented in terms of an implementation-defined pseudo-random number engine if a non-deterministic source (e.g. a hardware device) is not available to the implementation. **In this case each `std::random_device` object may generate the same number sequence.**

If this is how this object is implemented, then this `rd()` call will give the same result at every call.

cc yf225
Closes https://github.com/pytorch/pytorch/pull/9080

Differential Revision: D8748342

Pulled By: soumith

fbshipit-source-id: 22987befee61ff7faacda5ecc10138c2ac5d26ff
2018-07-06 07:39:53 -07:00
eadc5071e8 Use torch.save in _StorageBase.__reduce__ (#9184)
Summary:
Previously this used the ``.tolist`` method, which converted the
storage object into a list of Python objects, and then sent those to
pickle.  For storage objects of non-trivial size, this was very slow.

Now we reuse the logic of the ``torch.save`` function to efficiently
turn the Storage object into bytes, and send those instead.  This
reduces the semantic information (it's harder to interpret the bytes)
but should be orders of magnitude more efficient when serializing data
with the pickle protocol or with copy.

For future work it would be nice to develop a mechanism to get a buffer
of bytes out of a Storage object, and use that alongside the current
``from_buffer`` method.

See #9168 for context
Closes https://github.com/pytorch/pytorch/pull/9184
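
A rough sketch of the idea (hypothetical helper names; torch.save and torch.load accept file-like objects):

```python
import io
import torch

def _reduce_storage(storage):
    # Turn the storage into raw bytes with torch.save instead of
    # expanding it into a Python list via .tolist()
    buffer = io.BytesIO()
    torch.save(storage, buffer)
    return (_restore_storage, (buffer.getvalue(),))

def _restore_storage(data):
    return torch.load(io.BytesIO(data))

s = torch.FloatStorage(1000)
fn, args = _reduce_storage(s)
restored = fn(*args)  # round-trips without materializing Python objects
```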

Differential Revision: D8747794

Pulled By: soumith

fbshipit-source-id: ac598e660c043788ed1ffab3d0303812886edf79
2018-07-06 07:24:53 -07:00
7b25cbbef9 Test nn.Module on non-contiguous inputs (#9114)
Summary:
1. Let `ModuleTest` raise when they fail on non-contiguous inputs. Fix legacy modules.
2. Fix BN (both THNN and cuDNN) not working on non-contiguous inputs.
3. Fix CUDA EmbeddingBag not working on non-contiguous inputs. To avoid calling `.contiguous()` in both `forward` and `backward`,
  a. prefix all current `embedding_bag*` functions with `_`, indicating that they require input to be contiguous (there is a check in each function).
  b. create `embedding_bag`, which makes input arguments `.contiguous()` and calls `_embedding_bag`.
4. Make many ATen `embedding*` functions work on non-contiguous inputs so we don't need to call `input = input.contiguous()` in Python `nn.functional.embedding`.
5. Fix dense-sparse addition when the sparse input is not coalesced and the indices or values tensor is not contiguous. This came up in the test cases of Embedding modules with `sparse=True`. Added tests.
6. Update `TensorUtils.cpp` to use `AT_*` macros.

Request:
review from cpuhrsch on the `Embedding*` changes.
review from ezyang on ATen sparse & BN changes.
Closes https://github.com/pytorch/pytorch/pull/9114

Differential Revision: D8717299

Pulled By: SsnL

fbshipit-source-id: 0acc6f1c9522b5b605361e75112c16bbe1e98527
2018-07-05 21:09:34 -07:00
a769fae91d Fix TestAutograd.test_pinverse not actually testing (#9192)
Summary:
cc vishwakftw

Also added a check if none of the input tensors in `gradcheck` have `requires_grad=True`.
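
For illustration, the intended use pattern (a minimal sketch; gradcheck needs double-precision inputs for stable numerical Jacobians):

```python
import torch
from torch.autograd import gradcheck

# gradcheck only exercises inputs with requires_grad=True; the new check
# catches the silent case where no input requires grad at all.
x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
assert gradcheck(torch.pinverse, (x,), eps=1e-6, atol=1e-4)
```
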
Closes https://github.com/pytorch/pytorch/pull/9192

Differential Revision: D8739401

Pulled By: SsnL

fbshipit-source-id: 81bb3aa0b5c04eb209b137a4bd978e040e76cbcd
2018-07-05 18:55:00 -07:00
ff501c30af Turn on UBSAN in the OSS build (#8813)
Summary:
Copy of https://github.com/pytorch/pytorch/pull/8802
Closes https://github.com/pytorch/pytorch/pull/8813

Differential Revision: D8707364

Pulled By: yf225

fbshipit-source-id: bc201980b50e9fb44c42a17f898b50d3558fc417
2018-07-05 15:55:49 -07:00
21c420c32c Remove unused RowwiseArgMaxOp (#9119)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9119

Remove unused RowwiseArgMaxOp

Reviewed By: houseroad

Differential Revision: D8719826

fbshipit-source-id: 57d78c8b93bc94a4634d806c7c2041f8c18678a5
2018-07-05 15:25:28 -07:00
f45dfbccef Add support for ArgMax and ArgMin in C2 onnx backend and frontend (#9050)
Summary:
Pass the end to end test cases in https://github.com/onnx/onnx/pull/1049
Closes https://github.com/pytorch/pytorch/pull/9050

Reviewed By: hlu1

Differential Revision: D8703757

Pulled By: houseroad

fbshipit-source-id: 63308202e349dfc02d532e87f49495ba1aab085b
2018-07-05 14:26:08 -07:00
213540cd85 Add meshgrid to PyTorch (#8581)
Summary:
Part of this issue https://github.com/pytorch/pytorch/issues/7580
Closes https://github.com/pytorch/pytorch/pull/8581
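
For illustration, a small usage example (matrix-style 'ij' indexing):

```python
import torch

xs = torch.arange(3)
ys = torch.arange(2)
grid_x, grid_y = torch.meshgrid(xs, ys)
print(grid_x)  # tensor([[0, 0], [1, 1], [2, 2]])
print(grid_y)  # tensor([[0, 1], [0, 1], [0, 1]])
```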

Differential Revision: D8661660

Pulled By: soumith

fbshipit-source-id: 4a72fb5152ed6eb4d57f14de691bf09a2a2e5b0c
2018-07-05 11:25:27 -07:00
1c9073b43a Allow passing '0' to NO_MULTIPROCESSING_SPAWN (#9187)
Summary:
This PR makes setting the `NO_MULTIPROCESSING_SPAWN` easier internally, by allowing the flag to be set to `0`.
Closes https://github.com/pytorch/pytorch/pull/9187

Differential Revision: D8736206

Pulled By: yf225

fbshipit-source-id: b8a34cb9a747b13bc9428777a3ed766ce441cfe1
2018-07-05 11:10:46 -07:00
14cbd9adb8 Implement torch.pinverse : Pseudo-inverse (#9052)
Summary:
1. Used SVD to compute.
2. Tests in test_autograd, test_cuda and test_torch
3. Doc strings in _torch_docs.py and _tensor_docs.py

Closes #6187
Closes https://github.com/pytorch/pytorch/pull/9052
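
A quick usage example, for illustration:

```python
import torch

A = torch.randn(5, 3)
A_pinv = torch.pinverse(A)  # Moore-Penrose pseudo-inverse via SVD

# Defining property: A @ A+ @ A == A, up to numerical error
print(torch.allclose(A @ A_pinv @ A, A, atol=1e-5))  # True
```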

Reviewed By: soumith

Differential Revision: D8714628

Pulled By: SsnL

fbshipit-source-id: 7e006c9d138b9f49e703bd0ffdabe6253be78dd9
2018-07-05 09:11:24 -07:00
f6027bb15d Install hpp headers for CPP Extensions (#9182)
Summary:
With the Cppzation of a few files in `TH`/`THC`, the CPP extensions got broken whenever the user uses feature from `THC` in their files, when pytorch is installed via `python setup.py install`.

This addresses issues such as
```
/home/me/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/include/THC/THCDeviceTensorUtils.cuh:5:25: fatal error: THCTensor.hpp: No such file or directory
```
Closes https://github.com/pytorch/pytorch/pull/9182

Reviewed By: soumith

Differential Revision: D8734581

Pulled By: fmassa

fbshipit-source-id: 2a1138f208592eaccb01fcdb805a6b369d7a497a
2018-07-05 07:55:25 -07:00
08daed40f7 Fix bug in flip() (#9156)
Summary:
Closes #9147
Added a test to prevent regression in test_torch
Added entries in docs

cc ezyang weiyangfb
Closes https://github.com/pytorch/pytorch/pull/9156
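
For reference, flipping several dimensions in one call:

```python
import torch

x = torch.arange(8).view(2, 2, 2)
y = torch.flip(x, dims=[0, 2])  # reverse along dims 0 and 2
print(y[0, 0])  # tensor([5, 4]) -- x[1, 0] with its last axis reversed
```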

Differential Revision: D8732095

Pulled By: soumith

fbshipit-source-id: 7a6892853cfc0ccb0142b4fd25015818849adf61
2018-07-04 07:24:01 -07:00
4b2b690792 Install THC/THCGeneral.hpp (#9159)
Summary:
This file was added in #9107 but wasn't installed. The libraries in
./torch/lib use the headers from Caffe2/ATen from their temporary
install path at torch/lib/tmp_install, and c10d was not able to find
THC/THCGeneral.hpp before this fix.
Closes https://github.com/pytorch/pytorch/pull/9159

Reviewed By: Yangqing

Differential Revision: D8731107

Pulled By: pietern

fbshipit-source-id: d6009f6f6e8e6e0f37dea24cc4c3570736943ab1
2018-07-03 21:40:44 -07:00
49f88ac956 Add grid lines for activation images, fixes #9130 (#9134)
Summary:
1. Add dashed light blue line for asymptotes.
2. RReLU was missing the activation image.
3. make clean in docs will remove the activation images too.

Sample image:
![image](https://user-images.githubusercontent.com/23639302/42224142-5d66bd0a-7ea7-11e8-8b0a-26918df12f7c.png)
Closes https://github.com/pytorch/pytorch/pull/9134

Differential Revision: D8726880

Pulled By: ezyang

fbshipit-source-id: 35f00ee08a34864ec15ffd6228097a9efbc8dd62
2018-07-03 19:10:00 -07:00
e3dbdb2a17 Fix the comments: code and comments dimensions mis-match (#9070)
Summary:
This will resolve the code and comments mis-match issue.
Closes https://github.com/pytorch/pytorch/pull/9070

Differential Revision: D8712261

Pulled By: ezyang

fbshipit-source-id: a8a7d8af890a41ec246e11c2a62b0bde297be9c1
2018-07-03 14:39:57 -07:00
b479494ed4 loss plugin: Fix indexing into a scalar (#9143)
Summary:
The loss plugin was using the old-style loss[0] access, which in PyTorch 0.4 and
later is an attempt to index into a scalar, generating a warning.
Replaced that with loss.item().
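
A minimal sketch of the fix (hypothetical variable names):

```python
import torch
import torch.nn.functional as F

loss = F.mse_loss(torch.randn(4), torch.randn(4))
# Old style: loss[0] -- warns in PyTorch 0.4+, since loss is a 0-dim tensor
total = loss.item()  # extract the Python number instead
```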

This fixes
https://github.com/pytorch/pytorch/issues/9142
Closes https://github.com/pytorch/pytorch/pull/9143

Differential Revision: D8726403

Pulled By: ezyang

fbshipit-source-id: 6c496b140a74d22c8423f511db901b18615fd6fa
2018-07-03 14:25:44 -07:00
b432837a9d Add some missing error checks in sparse. (#9140)
Summary:
- There were missing error messages for AT_CHECK in SparseTensorImpl::set_indices_and_values
- We have to check that the backends of all our inputs line up,
  since native does not do it for us.
- Some math operations were missing shape tests.

Fixes #9110

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Closes https://github.com/pytorch/pytorch/pull/9140

Differential Revision: D8724349

Pulled By: ezyang

fbshipit-source-id: 3c75104187aca97cbe92bb0ec24f6ded07b2c3d6
2018-07-03 13:11:12 -07:00
f17b9e4cde Fix boolean indexing. (#8920)
Summary:
Boolean indexing was special-cased to handle a single boolean value, but didn't generally work given multiple booleans.
This PR unifies the behavior with slicing.  Note that only 'True' and torch.tensor(True) behave like NumPy due to the lack of n-dimensional empty tensors.
The corresponding tests for false values have been added, but are guarded behind a flag until we add n-dimensional empty tensors.
Closes https://github.com/pytorch/pytorch/pull/8920

Reviewed By: ezyang

Differential Revision: D8661876

Pulled By: gchanan

fbshipit-source-id: 0dc8a45a303aa41f729d04ab8908cfaf2e3ce3d7
2018-07-03 10:24:12 -07:00
4f89777d29 Removing extraneous main function to fix buck test detection (#9121)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9121

This main function causes 'buck test caffe2_test_cpu' to run 0 tests

Reviewed By: orionr

Differential Revision: D8719343

fbshipit-source-id: dc1cf76b0355637eaae193be2159f5746873b9f9
2018-07-03 09:25:12 -07:00
e09d993d8b Move easy THStorage/THCStorage functions out of generic (#9136)
Summary:
Some functions are exactly implemented in THStorage_; in that case,
we called those functions directly.

Stacked on #9135
Closes https://github.com/pytorch/pytorch/pull/9136

Reviewed By: Yangqing

Differential Revision: D8723998

Pulled By: ezyang

fbshipit-source-id: 653d23a5e1db4b9bdda50641fa97730894cc8ed5
2018-07-03 09:11:51 -07:00
9b0cece9b0 Enable the general usage of _download_url_to_file (#9090)
Summary:
A requirement for the fix on https://github.com/pytorch/examples/issues/378.
Closes https://github.com/pytorch/pytorch/pull/9090

Reviewed By: goldsborough

Differential Revision: D8712254

Pulled By: ezyang

fbshipit-source-id: b28765f24d891890e9d88757ee4ec704e38e6af7
2018-07-02 19:55:39 -07:00
97b9712aed Create Sequential::extend (#9116)
Summary:
There is no built-in way to concatenate two `Sequential`s in Python either, but there it's easy to do in an immutable fashion by just writing `Sequential(first.modules() + second.modules())`. Concatenating vectors isn't as easy in C++, so I think it's fair to save users some for loops by giving them `Sequential::extend()`.

apaszke ebetica ezyang

CC jamespinkerton
Closes https://github.com/pytorch/pytorch/pull/9116

Reviewed By: ezyang

Differential Revision: D8719630

Pulled By: goldsborough

fbshipit-source-id: 840d7ac70755350e6202b493c531e30ecbb6546f
2018-07-02 19:42:03 -07:00
16570ef0d5 Update onnx submodule to include the protobuf fix for windows
Summary: Closes https://github.com/pytorch/pytorch/pull/9113

Reviewed By: houseroad

Differential Revision: D8717259

Pulled By: bddppq

fbshipit-source-id: c99a4390b764707affea7db765abef789230f497
2018-07-02 19:42:01 -07:00
21c786071b update nn loss tests to use new reduction arg (#9118)
Summary:
The tests were using the old args, which caused them to emit a lot of deprecation warnings.

closes #9103.

Reviewed By: ezyang

Differential Revision: D8720581

Pulled By: li-roy

fbshipit-source-id: 3b79527f6fe862fb48b99a6394e8d7b89fc7a8c8
2018-07-02 19:41:57 -07:00
4d57a1750c Unify THStorage and THCStorage structs. (#9107)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9107

Some details about how this was done:

- For now, the allocators for CPU and CUDA are different (unifying
  the allocators is a bigger change to make, I'll contribute this in
  a later patch).  To smooth this over, the allocator field now
  stores a void* instead of THAllocator* or THCDeviceAllocator*; to
  make this clear the field is renamed to allocatorVoidPtr.

- Some THStorage functions which were generated per-scalar are now
  generalized, and thus moved out of the generic/ library.  This way
  they can be called directly from a non-code-generated at::Storage

- THCState is moved into a C++ header.  This is actually not really
  related to this particular diff, but I'll need it soon to replace
  THAllocator/THCDeviceAllocator with at::Allocator (C++, so I can't
  mention it in a C header file.)

- THPPointer needs to be adjusted, since there is no more type refinement
  between THStorage/THCStorage for it to template match over.  This
  is a little tricky, because I can't refer to THCStorage_free unless
  we actually compile with CUDA.  So there's two copies of the function
  now: one for the CPU build, one for the CUDA build.  If we ever split
  CUDA/non-CUDA Python builds, you will have to indirect this through some
  dynamic dispatch.

I want to soon replace the THCDeviceAllocator pointers in
THCState with at::Allocator, but I can't reference a C++ namespaced type
from C code, so THCState needs to move.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Closes https://github.com/pytorch/pytorch/pull/9087

Reviewed By: orionr

Differential Revision: D8712072

Pulled By: ezyang

fbshipit-source-id: c6e1ea236cd1df017b42a7fffb2dbff20d50a284
2018-07-02 17:09:52 -07:00
5d474e1812 Make all module members public (#9111)
Summary:
Having circulated the C++ API a bit, I found that it would be easier for folks to access module parameters directly rather than through the `parameters()` map. So here I make all variables/submodules and also the configuration options for every module public.

For RNNs, I also updated the names of parameters to match PyTorch, e.g. `hhw` -> `w_hh`. This should make it easier to transition from Python.

apaszke ebetica
Closes https://github.com/pytorch/pytorch/pull/9111

Differential Revision: D8717112

Pulled By: goldsborough

fbshipit-source-id: 3d36d5e161f7a86f44db7136c9c2fa53067abe1c
2018-07-02 16:09:57 -07:00
cb1bfe91af Deprecated several functions at torch.nn.functional (#8748)
Summary:
1. fixes #6245
2. deprecated tanh, sigmoid
Closes https://github.com/pytorch/pytorch/pull/8748
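
The replacements, for reference:

```python
import torch

x = torch.randn(4)
# Deprecated: torch.nn.functional.sigmoid(x) and torch.nn.functional.tanh(x)
y = torch.sigmoid(x)
z = torch.tanh(x)
```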

Differential Revision: D8697975

Pulled By: weiyangfb

fbshipit-source-id: f30714aa0611a1fe870040692f3dbcc8238aece9
2018-07-02 15:54:46 -07:00
50392cc554 Store OperatorDef by copy (#9108)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9108

OperatorDef ownership was given to the net in the past; we no longer want to do that.

Reviewed By: pjh5

Differential Revision: D8705347

fbshipit-source-id: 34976de202a7a7a71b935dd13c1bc8e9c73552e0
2018-07-02 15:42:18 -07:00
b79e8f79d8 Make SumElementsGradient use copy (#9039)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9039

att

Reviewed By: ezyang

Differential Revision: D8696455

fbshipit-source-id: 945e49a4c294fa39f847576d44ca0e6a32ecaf18
2018-07-02 13:25:12 -07:00
e977485449 detach spectral norm calculated weight in eval mode (#9020)
Summary:
Since we leave the weight as the last computed weight in eval mode, we need to detach it from the computation so that backward can be used.
The typical use case is in GANs when the discriminator has spectral norm, is in eval mode and we want to backprop through the discriminator to get weight gradients for the generator.
Closes https://github.com/pytorch/pytorch/pull/9020

Reviewed By: ezyang

Differential Revision: D8694054

Pulled By: SsnL

fbshipit-source-id: 09ee5843687cac3ed4c40759ac577a14c5371730
2018-07-02 10:39:47 -07:00
553c41f082 Adds serialization path (#9035)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9035

This diff builds on the structure in the stacked diff to add serialization/deserialization. It supports the old format and a new suggested format.

Reviewed By: ilia-cher

Differential Revision: D8415115

fbshipit-source-id: acaacce2b015f4c6ac0ae22625455290a3f30262
2018-07-02 09:09:39 -07:00
623ae0c07c Fix loading 0.4 BN checkpoints (#9004)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/8481
Closes https://github.com/pytorch/pytorch/pull/9004

Reviewed By: soumith

Differential Revision: D8684017

Pulled By: SsnL

fbshipit-source-id: 57820ad5f6b60795358c9447409a364a93ffa1d9
2018-07-01 22:24:21 -07:00
179807a8c7 Fix MAGMA svd and eig (#9082)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/9079

There is room for speed-up for both functions (see https://github.com/pytorch/pytorch/issues/9083), but let's get this in to unblock #9052 .
Closes https://github.com/pytorch/pytorch/pull/9082

Reviewed By: ezyang

Differential Revision: D8711687

Pulled By: SsnL

fbshipit-source-id: f043a9bf55cb6aec5126c3331d35761f7aa3f8e3
2018-07-01 22:24:17 -07:00
474fdd7e2d minor pybind for jit (#8890)
Summary:
add two small bindings to recently added attributes.

Also want to leave a reference gist here: https://gist.github.com/soumith/8102ef39530bac09070912b1a5401d0f

It showcases:

- traced a module
- symbolically differentiated the forward graph, to get a forward, backward graph
- executed the subsequent forward + backward graphs correctly
- compared the jit vs non-jit results
Closes https://github.com/pytorch/pytorch/pull/8890

Reviewed By: ezyang

Differential Revision: D8677663

Pulled By: soumith

fbshipit-source-id: a29919c05baad997cd7fb7df718f933a83035118
2018-07-01 21:39:29 -07:00
8364470e5c fix empty batch for softmax (#9075)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9075

as title

Reviewed By: QueryConnectionException

Differential Revision: D8710616

fbshipit-source-id: ca505e1a733cc24db9e2ab83a5395c64fa8360c4
2018-07-01 16:40:14 -07:00
04f2708265 Fix build script for Windows (#9060)
Summary:
1. Escape quotes
2. Use file exist logic to determine build success/failure
Closes https://github.com/pytorch/pytorch/pull/9060

Differential Revision: D8707290

Pulled By: soumith

fbshipit-source-id: a34265f46725eaaf9489bc38546200aeae75e8a9
2018-07-01 07:10:06 -07:00
c61f0217a5 combine size_average and reduce args in loss functions (#8018)
Summary:
closes #7929
Closes https://github.com/pytorch/pytorch/pull/8018
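
A before/after sketch (assuming the 'sum' reduction value; 'none' and the mean-style default are the other options):

```python
import torch
import torch.nn.functional as F

inp = torch.randn(3, 5, requires_grad=True)
tgt = torch.randn(3, 5)

# Old, deprecated: two interacting boolean flags
# loss = F.mse_loss(inp, tgt, size_average=False, reduce=True)

# New: a single string-valued argument
loss = F.mse_loss(inp, tgt, reduction='sum')
```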

Differential Revision: D8682540

Pulled By: li-roy

fbshipit-source-id: 649170dd1a7f373151c1d4e949838bd1c5651936
2018-07-01 05:39:00 -07:00
03e7953a98 Use FixedDivisor in Reduce and Broadcast CUDA kernels (#9072)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9072

Use FixedDivisor in Reduce and Broadcast CUDA kernels

Reviewed By: houseroad

Differential Revision: D8710243

fbshipit-source-id: 6f1da12234898594a1be8c979d942aa515832aeb
2018-07-01 00:25:34 -07:00
90fd4df695 Add flag for disabling tests with multiprocessing spawn start method (#9061)
Summary:
This will resolve some of the timeout issues in CPU and GPU tests internally.
Closes https://github.com/pytorch/pytorch/pull/9061

Reviewed By: ezyang

Differential Revision: D8707471

Pulled By: yf225

fbshipit-source-id: 9dc82a2c9da0c540ae015442f74b9b2b1a67a246
2018-06-30 14:39:11 -07:00
2c6c53f5ce Ensure that domain starts with domain_prefix before extracting substring (#9053)
Summary:
Fixes #9049.
When provided with a domain string that lacks the proper prefix, i.e. `org.pytorch.`, an exception is thrown.
Closes https://github.com/pytorch/pytorch/pull/9053

Differential Revision: D8708264

Pulled By: ezyang

fbshipit-source-id: e2593d8d36a17d3bb26fc0b239a61b84f1c38ecb
2018-06-30 10:39:40 -07:00
0515664c42 Make _C depend on csrc-no-python (#9057)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9057

Make the `_C` target depend on the `csrc-no-python` target. Also removes the `csrc` target and the with-python version of autogradpp (which is not used). Let me know if we should pick better names here.

I also ran into a nasty linker issue with only one symbol being undefined. It turns out it had been given inline linkage in the `.cpp` file, which I believe is an error.

Reviewed By: orionr

Differential Revision: D8705750

fbshipit-source-id: 8de083e371dbf5e9f12c15572d88e1c595dfa087
2018-06-29 20:39:24 -07:00
b07ea04e23 empty batch for spatialBN (#8933)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8933

The spatialBN implementation cannot deal with an empty batch; this diff enables the zero-batch setting:

during training, when batch_size = 0:
in forward, output's saved_mean and saved_var are zeros.
in backward, the gradient for SCALE_GRAD and BIAS_GRAD are zeros.

Reviewed By: pjh5

Differential Revision: D8644699

fbshipit-source-id: 599ea687329d68699c987e05f56f409f4e729d1c
2018-06-29 18:40:41 -07:00
d7487bfe9e Speed-up multidim sum (#8992)
Summary:
1. Instead of using the non-`_out` variant, we allocate a buffer and use the `_out` variant to write the intermediate results into it.
2. Reduce dimensions in order of decreasing sizes.

Benchmark:
Sum a randn tensor of shape `[200, 1, 30, 40, 20, 1, 50]` along dimensions `[4, 6, 3, 0, 2, 5]`. Averaged across 1000 times:
```
before patch:
CPU: 0.0441 s
CUDA: 0.0273 s

after patch:
CPU: 0.0234 s
CUDA: 0.0047 s
```
Closes https://github.com/pytorch/pytorch/pull/8992
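
The benchmark's core operation, for reference (shapes taken from the description above):

```python
import torch

x = torch.randn(200, 1, 30, 40, 20, 1, 50)
y = x.sum(dim=[4, 6, 3, 0, 2, 5])  # reduce several dims in one call
print(y.shape)  # torch.Size([1]) -- only dim 1 survives
```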

Differential Revision: D8681069

Pulled By: SsnL

fbshipit-source-id: 2c5d5af5c5a284f2e945181f2b24ee8c78becd50
2018-06-29 18:40:39 -07:00
9ce15173fb Move _cudnn_init_dropout_state to TensorOptions and enable cuDNN dropout in C++ API RNNs (#9012)
Summary:
The goal of this PR was to add support for dropout descriptors in the C++ API's RNN class.
The end result is a 4x-5x speedup for our RNN integration tests since they can now use cuDNN instead of autograd when dropout is set.

To achieve this, I had to move `_cudnn_init_dropout_state` to the `TensorOptions` API.

I also fixed a bug around `RNN::cuda()` not flattening parameters for cuDNN.

ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/9012

Reviewed By: pjh5

Differential Revision: D8689786

Pulled By: goldsborough

fbshipit-source-id: 44fb191f5a38e41c4ded5417306b5bbc012cd56c
2018-06-29 17:25:23 -07:00
863754c722 Update the ONNX op coverage in C2
Summary: Closes https://github.com/pytorch/pytorch/pull/9051

Reviewed By: pjh5

Differential Revision: D8704583

Pulled By: houseroad

fbshipit-source-id: 186e8b62378ab4f7cdef5fa77dc08c6b9ddc9cc0
2018-06-29 17:25:19 -07:00
d793473e60 add note to avoid memory surge on GPU (#9019)
Summary:
Addresses #7415 . Adding a note first, will do the API change if there's a need in the future.
Closes https://github.com/pytorch/pytorch/pull/9019

Differential Revision: D8694056

Pulled By: ailzhang

fbshipit-source-id: 0b6fa43fa62ac55deff3b3b099d1bc9fee74a5f9
2018-06-29 16:55:17 -07:00
67b21117b7 Add BatchTensor class (#8922)
Summary:
Add BatchTensor class
- construct from data, mask, dims or construct from list of tensors
- can return a list of tensors from a BatchTensor class

next step: do IR level transformation and operators
Closes https://github.com/pytorch/pytorch/pull/8922

Differential Revision: D8668986

Pulled By: ChunliF

fbshipit-source-id: 8b24d2a9f46a3b42dbb397e99e9e059dfb2b326e
2018-06-29 15:57:27 -07:00
3a71cf2e54 Disable verbose printing for time sequence prediction test
Summary: Closes https://github.com/pytorch/pytorch/pull/9040

Reviewed By: soumith, wanchaol

Differential Revision: D8697870

Pulled By: jamesr66a

fbshipit-source-id: 212fe14aaf9c60c4c9c6d383b202395b1d0ec680
2018-06-29 12:40:18 -07:00
7a1081b310 Re-enable passing operator-level tests (#9044)
Summary:
Just tried these and they work now
Closes https://github.com/pytorch/pytorch/pull/9044

Reviewed By: soumith

Differential Revision: D8698819

Pulled By: jamesr66a

fbshipit-source-id: 1d5574de1819aa31fc36ad245186c7aa68587178
2018-06-29 12:25:28 -07:00
b3fe200704 Fix TestJit.test_alexnet expect file
Summary: Closes https://github.com/pytorch/pytorch/pull/9041

Reviewed By: soumith

Differential Revision: D8698147

Pulled By: jamesr66a

fbshipit-source-id: 63eb1bc96562b6f972aeba8748454efb9c889d5c
2018-06-29 12:25:25 -07:00
f6cfd83a80 Find unused port for test dynamically (#9037)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9037

Fixes flaky test failures due to port in use.

Reviewed By: soumith

Differential Revision: D8696779

fbshipit-source-id: a05412d1eb1dcb9a4b35023dead371aa33d62c39
2018-06-29 12:25:23 -07:00
b75490414c Bump up the C2 onnx frontend opset to 8 (#9006)
Summary:
Now ONNX master has bump up to opset 8.
Closes https://github.com/pytorch/pytorch/pull/9006

Reviewed By: yinghai

Differential Revision: D8685417

Pulled By: houseroad

fbshipit-source-id: f0c0a3682417b8803a856e232c2740cf3e68e554
2018-06-29 11:56:11 -07:00
4efbd2e22c Improve DataLoader worker fail error message (#9007)
Summary:
Tell people to re-run with num_workers=0 when a DataLoader worker fails
Closes https://github.com/pytorch/pytorch/pull/9007

Differential Revision: D8686005

Pulled By: SsnL

fbshipit-source-id: bf872267f609c7b86e943061caab953149507bfe
2018-06-29 11:09:55 -07:00
a2bf55f9eb Fix select backward when wrap dim (#9033)
Summary:
The previous backward was broken when `index=-1` because slicing `[-1:0]` gives an empty tensor/list/array.
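
In plain Python, for reference:

```python
xs = [1, 2, 3]
print(xs[-1:0])  # [] -- empty, since start -1 resolves to index 2, which is >= stop 0
print(xs[-1:])   # [3] -- omitting the stop selects through the end
```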

Added a test.

cc goldsborough
Closes https://github.com/pytorch/pytorch/pull/9033

Differential Revision: D8694300

Pulled By: SsnL

fbshipit-source-id: 8377b043896f8d0b1da173cc0077ace0bea5e862
2018-06-29 10:40:13 -07:00
2507e273dc Fix CUDA 8 for Windows (#9023)
Summary:
Fix missing functions for MSVC 2015
Inspired by https://github.com/tensorflow/tensorflow/pull/13525
Closes https://github.com/pytorch/pytorch/pull/9023

Reviewed By: soumith

Differential Revision: D8694046

Pulled By: ezyang

fbshipit-source-id: 92cb7b9efd76d97a264c12a1521be550176f58d5
2018-06-29 09:40:48 -07:00
c2a89b69b9 Support to ONNXIFI op (#8749)
Summary:
This PR adds basic support for the ONNXIFI op.
Closes https://github.com/pytorch/pytorch/pull/8749

Reviewed By: Maratyszcza

Differential Revision: D8665739

Pulled By: yinghai

fbshipit-source-id: 961916f9e1a4a26390b73c4b648d177883143a22
2018-06-29 09:10:26 -07:00
37e526e1a8 Better print of nn Containers (#8939)
Summary:
Fix https://github.com/pytorch/pytorch/issues/8900

Waiting on https://github.com/pytorch/pytorch/pull/8463

1. Remove extra Line
2. ...
Closes https://github.com/pytorch/pytorch/pull/8939

Reviewed By: soumith

Differential Revision: D8687730

Pulled By: ezyang

fbshipit-source-id: 81c57a03683875704d537cb4585b11838f70df56
2018-06-29 08:24:09 -07:00
512c49e831 Correct link flag order for GNU ld in utils.cpp_extension.load (#9021)
Summary:
Any flags linking libraries only take effect on inputs preceding them,
so we have to call `$cxx $in $ldflags -o $out` instead of the other way
around.

This was probably not detected so far since the torch libraries are
already loaded when loading JIT-compiled extensions, so this only has an
effect on third-party libraries.

This also matches our behavior on windows.
Closes https://github.com/pytorch/pytorch/pull/9021

Reviewed By: soumith

Differential Revision: D8694049

Pulled By: ezyang

fbshipit-source-id: e35745fc3b89bf39c14f07ce90d6bd18e6a3d7cc
2018-06-29 08:24:07 -07:00
6a1e801071 add second variant to Tensor.add, Tensor.add_ docstring (fixes: #8690) (#9027)
Summary:
fixes: #8690
Closes https://github.com/pytorch/pytorch/pull/9027
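
The second variant computes out = input + value * other; for illustration, assuming the 0.4-era positional overload x.add(value, other):

```python
import torch

x = torch.ones(3)
y = torch.full((3,), 2.0)
z = x.add(10, y)  # x + 10 * y
print(z)          # tensor([21., 21., 21.])
```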

Reviewed By: soumith

Differential Revision: D8694042

Pulled By: ezyang

fbshipit-source-id: bc3b1112b41f959231854366cdcf9292b3699779
2018-06-29 08:24:06 -07:00
b795620442 Fix x.pow(0) gradient when x contains 0 (#8945)
Summary:
This closes https://github.com/pytorch/pytorch/issues/8940 .
Closes https://github.com/pytorch/pytorch/pull/8945

Differential Revision: D8668853

Pulled By: ezyang

fbshipit-source-id: 80a629352ee2f506c38a05647b769281579a5af7
2018-06-29 06:53:42 -07:00
00b5d397ae Fix resolution callback for @script_method (#8912)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/8715. This was peeking too few frames up when we instantiate the callback.
Closes https://github.com/pytorch/pytorch/pull/8912

Reviewed By: ezyang

Differential Revision: D8684972

Pulled By: jamesr66a

fbshipit-source-id: 11dbb919ae7273f92cbe25fe21f7946b9fa28aeb
2018-06-28 22:56:17 -07:00
4643269eb5 Document get_device, fixes #8857 (#8859)
Differential Revision: D8677690

Pulled By: ezyang

fbshipit-source-id: 0167672d1d2659d9fc7d68530760639ba35ed7d8
2018-06-28 22:11:08 -07:00
bf65df5310 Get rid of possible ODR violation with const char*. (#8962)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Closes https://github.com/pytorch/pytorch/pull/8962

Differential Revision: D8668580

Pulled By: ezyang

fbshipit-source-id: 466a5940f175f1f339dc826a1ab32bf3e42e64fd
2018-06-28 17:53:55 -07:00
5b7951057d Distributed Data Parallel Module Implementation (#8584)
Summary:
This is an initial implementation of Distributed Data Parallel module for c10d GLOO and NCCL backend.

Have done performance testing and made sure that both single GPU / process and multi-GPU / process are able to overlap communication with BW computation

The idea is, DDP will bucket parameters and do all reduce in the reverse order of the bucket. Since all C10D ops are async ops, no more dedicated thread is needed and we simply queue the all-reduce kernels once the bucket is ready following the deterministic reduction order.

Tested with 8 nodes 64 GPUs, ResNet 50, hit the required accuracy within 90 epochs
Closes https://github.com/pytorch/pytorch/pull/8584

Reviewed By: goldsborough

Differential Revision: D8678696

Pulled By: teng-li

fbshipit-source-id: 440341b804befc6762e92acece2759ba47157cea
2018-06-28 17:25:40 -07:00
30549a1293 Deal with more threads than necessary (#8961)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/8949
Closes https://github.com/pytorch/pytorch/pull/8961

Reviewed By: colesbury

Differential Revision: D8669829

Pulled By: cpuhrsch

fbshipit-source-id: 368f76a2c6602a62fb7609d404af9753c87dc605
2018-06-28 16:44:23 -07:00
2e23bc1a20 Switch to emitting ScriptModule for scripted and traced functions (#8876)
Summary:
Solves https://github.com/pytorch/pytorch/issues/8716 and closes https://github.com/pytorch/pytorch/issues/8867

This makes it so that all of {script, traced} {module, function} create ScriptModules and implements proper inlining between them. This also greatly simplifies things and makes clear that tracing is a way to convert regular Python into a ScriptModule
Closes https://github.com/pytorch/pytorch/pull/8876

Differential Revision: D8675996

Pulled By: jamesr66a

fbshipit-source-id: 3b12ad4b758324f558074c27c1f1a9fb616b170a
2018-06-28 16:44:21 -07:00
0bd9e96b08 Enable script for time-sequence prediction (#8862)
Summary:
Enable script for time-sequence prediction. I did a bunch of hacks to make script mode work, and the issues discovered while enabling it are all noted in #8452.

Shall we merge this PR and iteratively fix those issues thereafter?
Closes https://github.com/pytorch/pytorch/pull/8862

Differential Revision: D8677683

Pulled By: wanchaol

fbshipit-source-id: 02319cd56c87de523be898f0e6c541dd15e57cac
2018-06-28 16:10:10 -07:00
f0772c0ab2 Replace max_pool with max_pool_with_indices (#8946)
Summary:
Re-push from https://github.com/pytorch/pytorch/pull/8892
Closes https://github.com/pytorch/pytorch/pull/8946

Differential Revision: D8666862

Pulled By: goldsborough

fbshipit-source-id: 44cd3d63d347316818a7b0f5f89fce8ff7486736
2018-06-28 16:10:08 -07:00
66465f1e17 Create nn::Module::is (#8970)
Summary:
When initializing weights for my C++ model, I had to write

```cpp
void initialize_weights(nn::Module& module) {
  if (module.name().find("Conv2d") != std::string::npos) {
    module.parameters()["weight"].data().normal_(0.0, 0.02);
  } else if (module.name().find("BatchNorm") != std::string::npos) {
    auto parameters = module.parameters();
    parameters["weight"].data().normal_(1.0, 0.02);
    parameters["bias"].data().fill_(0);
  }
}
```

The string-based module determination is not very nice, and not very C++-y. So I created `nn::Module::is<T>` which does a `dynamic_cast` inside. It also handles the `ModuleHolder` vs. `Module` distinction.

It now becomes

```cpp
if (module.is<nn::Conv2d>()) {
    module.parameters()["weight"].data().normal_(0.0, 0.02);
  } else if (module.is<nn::BatchNorm>()) {
    auto parameters = module.parameters();
    parameters["weight"].data().normal_(1.0, 0.02);
    parameters["bias"].data().fill_(0);
  }
```

ebetica ezyang apaszke
Closes https://github.com/pytorch/pytorch/pull/8970

Differential Revision: D8677476

Pulled By: goldsborough

fbshipit-source-id: 053294e19b6a58cce868167596c89639f7de91c2
2018-06-28 16:10:04 -07:00
15a75208ee Use std::random_device for generating storage handle (#8971)
Summary:
Currently the `test_RNG_after_pickle` in the PR would fail because pickling a tensor changes the RNG state. This PR aims to fix it.
Closes https://github.com/pytorch/pytorch/pull/8971

Reviewed By: ezyang

Differential Revision: D8677474

Pulled By: yf225

fbshipit-source-id: 1713d9611699ad288b66d92dbb29ce9feb34b8cf
2018-06-28 15:10:27 -07:00
838fdd6f99 Add Cube and Cbrt Ops (#8991)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8991

Add Cube and Cbrt Ops

Reviewed By: houseroad

Differential Revision: D8678848

fbshipit-source-id: 051dd475e45ad9f1d11a8b32ae3acd1f7459b930
2018-06-28 14:55:30 -07:00
61ca0ba222 Add log1p for sparse tensor (#8969)
Summary:
- fixes log1p (#8853)
- added log1p for sparse tensors in ATen
- made log1p of a sparse tensor non-differentiable so it raises an error, because the local derivative of log1p at a zero element is 1 / (0 + 1) = 1, which would make the gradient tensor dense
Closes https://github.com/pytorch/pytorch/pull/8969
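
A small sketch of the resulting behavior (assuming the ATen op is reachable via torch.log1p on a coalesced COO tensor):

```python
import torch

i = torch.tensor([[0, 2]])            # indices, shape (ndim, nnz)
v = torch.tensor([3.0, 4.0])          # values
s = torch.sparse_coo_tensor(i, v, (5,)).coalesce()

out = torch.log1p(s)                  # elementwise log1p over stored values
print(out.to_dense())                 # ~[1.3863, 0, 1.6094, 0, 0]
```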

Reviewed By: ezyang

Differential Revision: D8677491

fbshipit-source-id: 8363a613519de4bc75eda087ccd20a3eb2d18126
2018-06-28 13:10:11 -07:00
8d384600b8 Add ShapeTypeInference for Conditional operator (#8924)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8924

Closes https://github.com/pytorch/pytorch/pull/8915

As desc

Reviewed By: ezyang

Differential Revision: D8649582

fbshipit-source-id: d08a456b9861dd7edd19ed18e16d4778b4240c90
2018-06-28 12:10:24 -07:00
7310229426 Fix TestCollectEnv flakiness (#8983)
Summary:
The problem was a bad regex; the version hash match used to match 6
wildcards. This PR changes it to match \w+, which is sufficient for the
test because the version hash is always followed by either whitespace or
a right-paren.

Fixes #8981
Closes https://github.com/pytorch/pytorch/pull/8983

Differential Revision: D8677771

Pulled By: zou3519

fbshipit-source-id: dfdde98669bcd682335145cba98c82530a815afa
2018-06-28 11:45:37 -07:00
93cc7d1923 Add in_place test for binary ops
Summary: Closes https://github.com/pytorch/pytorch/pull/8973

Reviewed By: houseroad

Differential Revision: D8674216

Pulled By: BIT-silence

fbshipit-source-id: bde1ff7b47dbc8a48d1ff72b345c767af698a09b
2018-06-28 11:45:35 -07:00
ccc14071f4 Fix Module::zero_grad (#8964)
Summary:
`nn::Module::zero_grad` did not respect undefined `grad()` variables. This is fixed (the code now replicates PyTorch).

ebetica ezyang apaszke
Closes https://github.com/pytorch/pytorch/pull/8964

Reviewed By: ezyang

Differential Revision: D8677529

Pulled By: goldsborough

fbshipit-source-id: afdc4ba00dbf5012c37d1f794c731937ee5e422e
2018-06-28 10:26:52 -07:00
63233f98ad Bump up opset version to 7 in Caffe2 ONNX exporter (#8854)
Summary:
Will bump up to opset 8 in another PR to match the current opset version.

Already tested through generating the models in current model zoo.
Closes https://github.com/pytorch/pytorch/pull/8854

Reviewed By: ezyang

Differential Revision: D8666437

Pulled By: houseroad

fbshipit-source-id: feffdf704dd3136aa59c0f1ff1830c14d1bd20aa
2018-06-28 07:39:02 -07:00
148088a681 Convert at::Tensor to torch::Tensor in AnyModule (#8968)
Summary:
Operations on `Variable`s (or `torch::Tensor`) typically return `at::Tensor`. This is usually fine, but the `AnyModule` used in the implementation of `torch::Sequential` is very picky about types and does not understand implicit conversions like this. This means that `sequential.forward(at_tensor_that_is_actually_a_variable)` will fail unless you wrap `at_tensor_that_is_actually_a_variable` in `torch::Tensor`.

This PR adds a special case to `AnyModule` that will convert an `at::Tensor` to `torch::Tensor` when the tensor is really a variable, and else just pass the `at::Tensor`. This is a nice little usability improvement for the often-used `Sequential` class.
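A rough usage sketch of what now works without manual wrapping (the layer sizes are placeholders, and the `Sequential`/`Linear` usage assumes the current C++ frontend API):

```cpp
#include <torch/torch.h>

int main() {
  torch::nn::Sequential model(torch::nn::Linear(10, 5),
                              torch::nn::Linear(5, 1));
  // Operations on variables yield at::Tensor...
  at::Tensor intermediate = torch::randn({2, 10}) * 2;
  // ...which forward() now accepts without re-wrapping as torch::Tensor.
  auto output = model->forward(intermediate);
  return 0;
}
```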

ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/8968

Reviewed By: ezyang

Differential Revision: D8670407

Pulled By: goldsborough

fbshipit-source-id: 3635ed6ed28238f3900ce4a876d07f1b11713831
2018-06-28 06:40:48 -07:00
77484d91db Add AT_WARN to issue warnings from ATen (#8967)
Summary:
Use AT_WARN from python_anomaly_mode instead of printing to stdout.
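Roughly the shape of the replacement (a sketch; the helper name and message plumbing are illustrative, not the exact python_anomaly_mode code):

```cpp
#include <ATen/ATen.h>
#include <string>

// Sketch: emit the anomaly-mode message through ATen's warning machinery
// instead of writing directly to stdout.
void warn_anomaly(const std::string& message) {
  AT_WARN(message);
}
```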
Closes https://github.com/pytorch/pytorch/pull/8967

Reviewed By: ezyang

Differential Revision: D8670654

Pulled By: colesbury

fbshipit-source-id: 3f7aee8ea06914d7d4381feec086e95f0b194752
2018-06-27 21:24:39 -07:00
c3b499227d Avoid iomp/gomp clash when building IDEEP ops (#8955)
Summary:
This PR does 3 things
- Reorder the search order of `intel_lp64` and `gf_lp64`, as the first one is more essential and should have higher priority.
- Avoid repetitively searching for MKL libraries in the `ideep` and `mkldnn` submodules if we already found them in `FindMKL`
- Avoid adding more MKL dependencies to IDEEP if MKL is also found.

TODO: provide an option for the user to choose iomp or gomp.
Closes https://github.com/pytorch/pytorch/pull/8955

Reviewed By: bddppq

Differential Revision: D8666960

Pulled By: yinghai

fbshipit-source-id: 669d3142204a8b47c19a900444246fc44a139012
2018-06-27 21:24:36 -07:00
ccd3e2c03d Skip operator tests in rocm CI jobs (#8720)
Summary:
disable operator tests for now until we have enough rocm workers in CI
Closes https://github.com/pytorch/pytorch/pull/8720

Reviewed By: ezyang

Differential Revision: D8654871

Pulled By: bddppq

fbshipit-source-id: ff2504d6a7182f85f7cc15618f2df8e512447fa8
2018-06-27 20:39:19 -07:00
059ccb62c1 bump up onnx version (#8975)
Summary:
To include the change in https://github.com/onnx/onnx/pull/1151
Closes https://github.com/pytorch/pytorch/pull/8975

Reviewed By: bddppq

Differential Revision: D8673552

Pulled By: yinghai

fbshipit-source-id: f55c270ef869bd2e19fdabbdf906a6ae12129791
2018-06-27 20:24:30 -07:00
346de2535d Workaround lack of 0-dim support in ideep (#8959)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8959

MKL-DNN doesn't support 0-dim tensors. As a workaround, we produce a CPUTensor instead of an Ideep tensor in the fallback ops. For those tensors, the Ideep copy op is no longer needed.

Reviewed By: viswanathgs

Differential Revision: D8665168

fbshipit-source-id: 59678de2c5aed8c691ab5caaadede6d6c000dd7b
2018-06-27 20:24:28 -07:00
03d0a70a4d Set random seed at the start of C++ tests (#8903)
Summary:
Sets the random seed at the start of C++ tests so that everything is super deterministic.

I made sure we only generate random values from torch instead of `std::`, so that this seed always applies. I.e. I do:

```
torch::randint(2, {2}, torch::kInt64)
```

instead of

```
std::rand() % 2
```

Also got rid of the tests that exercised the random seeding itself, since they would interfere here. Those tests were not very useful anyway, since we just use ATen's seeding mechanism, which should work.
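Concretely, the setup amounts to something like this (a sketch; `torch::manual_seed` is assumed to be the seeding entry point):

```cpp
#include <torch/torch.h>

int main() {
  torch::manual_seed(0);  // fix ATen's RNG once, at the start of the test binary
  // All torch-generated values below are now deterministic.
  auto coin_flips = torch::randint(2, {2}, torch::kInt64);
  // ... run the actual test cases ...
  return 0;
}
```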

Fixes  #7288 #7286 #7289

ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/8903

Differential Revision: D8667269

Pulled By: goldsborough

fbshipit-source-id: a833e86e156d5e68dae8c53a4b1c433cb0608b6c
2018-06-27 20:09:46 -07:00
a41d433d9d Check key should be string in nn.Module.add_module, parameter and buffer (#8960)
Summary:
Because I probably messed up the rebase in https://github.com/pytorch/pytorch/pull/8905
Closes https://github.com/pytorch/pytorch/pull/8960

Reviewed By: soumith

Differential Revision: D8668202

Pulled By: ezyang

fbshipit-source-id: 41e19803c7ac7aac898c8e70c6a9769314476ca9
2018-06-27 19:40:00 -07:00
07b6c28715 Fix comment in file
Summary: Closes https://github.com/pytorch/pytorch/pull/8966

Differential Revision: D8670090

Pulled By: zou3519

fbshipit-source-id: fe92f31264cec89b0e0139f44720dd72b4f31c6e
2018-06-27 19:11:14 -07:00
f52c2ca1c6 net_async tracing use enable_profile arg from NetDef (#8927)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8927

Closes https://github.com/pytorch/pytorch/pull/8855

- Add a parameter `enable_tracing` to the Arg field of NetDef. `net_async_tracing` will only enable the Tracer for Net instances that have this field set (unless the command-line argument also includes the net name).
- Append a unique id to the json profiling result file, because there could be multiple instances of the same net running.
- Dump the json profiling file regularly instead of only when the Tracer object is destroyed

Reviewed By: ilia-cher

Differential Revision: D8372378

fbshipit-source-id: 8adc9d59f48b67456beed2e3a88235c298fdfd01
2018-06-27 16:24:57 -07:00
ba8e133844 Refactor batch sampler (#8958)
Summary:
Fixes #8652, fixes #8957
Closes https://github.com/pytorch/pytorch/pull/8958

Reviewed By: ezyang

Differential Revision: D8668253

Pulled By: soumith

fbshipit-source-id: 663d461621511166f29cfcc902e6c2a71befa647
2018-06-27 16:06:47 -07:00
6aa8b67ed0 Attempt to fix operator<< in Caffe2
Summary: Closes https://github.com/pytorch/pytorch/pull/8947

Reviewed By: dzhulgakov

Differential Revision: D8664902

Pulled By: bddppq

fbshipit-source-id: 1cf7123062b8604e4477eee6142b087675344992
2018-06-27 14:54:45 -07:00
fef9a66d08 Use torch:: instead of at:: (#8911)
Summary:
This PR is the final step to making `torch::` the only namespace users of the C++ API ever see. Basically, I did:

``` cpp

namespace torch {
using namespace at;
}
```

And then changed `at::` to `torch::` almost everywhere. This worked surprisingly well out of the box. So users can now write `torch::relu`, `torch::log_softmax`, and `torch::conv2d` instead of having to know when to use `at::` and when `torch::`. This is a happy outcome!
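A sketch of the unified call sites this enables:

```cpp
#include <torch/torch.h>

// Factories and ops all resolve through torch::, so callers no longer
// need to remember which functions originally lived in at::.
torch::Tensor demo() {
  auto x = torch::randn({2, 3});
  auto y = torch::relu(x);
  return torch::log_softmax(y, /*dim=*/1);
}
```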

Another thing I did was to have `using Dtype = at::ScalarType`, which will be the eventual name anyway.

ebetica ezyang apaszke zdevito
Closes https://github.com/pytorch/pytorch/pull/8911

Reviewed By: ezyang

Differential Revision: D8668230

Pulled By: goldsborough

fbshipit-source-id: a72ccb70fca763c396c4b0997d3c4767c8cf4fd3
2018-06-27 14:42:01 -07:00
4c5192788b Cleanup of the shipit commit (#8956)
Summary:
Some files shouldn't have been added. Minor changes.
Closes https://github.com/pytorch/pytorch/pull/8956

Reviewed By: pjh5

Differential Revision: D8667962

Pulled By: orionr

fbshipit-source-id: 3331c6e93763ea4ea5b0c17dba1f0fc92172fd1b
2018-06-27 14:41:59 -07:00
e6208b3340 by default, do not throw image decoding error (#8951)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8951

Change the default value of the max decode error rate to 1.0, which means we don't throw such runtime errors by default

Reviewed By: avulanov

Differential Revision: D8665640

fbshipit-source-id: 9d373979dd8a97253ad528b167f8d73a28fee82a
2018-06-27 14:26:49 -07:00
da4cb226d8 Fix a bug introduced by the deletion of copy constructor of tensor
Summary: Closes https://github.com/pytorch/pytorch/pull/8942

Reviewed By: jerryzh168

Differential Revision: D8666530

Pulled By: orionr

fbshipit-source-id: ddb311141ec7dbf163665ebfc6b475b219a5a999
2018-06-27 13:10:58 -07:00
a898a8f1f0 Adding pyyaml to mac and windows builds
Summary: Closes https://github.com/pytorch/pytorch/pull/8851

Reviewed By: mingzhe09088

Differential Revision: D8666075

Pulled By: pjh5

fbshipit-source-id: a3fdc9f9801f814b1e4010bd20ba51afbb048a1d
2018-06-27 13:10:57 -07:00
624303340e Remove third_party from CODEOWNERS file (#8950)
Summary:
No longer required now that we've switched over to ShipIt on master.
Closes https://github.com/pytorch/pytorch/pull/8950

Reviewed By: Yangqing

Differential Revision: D8666175

Pulled By: orionr

fbshipit-source-id: 6d8b8b38f6558d87cabd0aa19b72a390057c137b
2018-06-27 11:54:42 -07:00
6446ffa536 More detailed help message for 'without ATen_cuda library' message. (#8898)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Closes https://github.com/pytorch/pytorch/pull/8898

Differential Revision: D8661562

Pulled By: ezyang

fbshipit-source-id: 9cb976f9642c6f40902b10b34eada2d6ff6fd81c
2018-06-27 11:44:01 -07:00
d9c64851e9 Fix nccl/CMakeLists.txt (#8948)
Summary:
Changes that were merged in #8834 and #8829 (cc yf225) were lost in 9ec0a2aef4 (diff-6997846ce6daf0c271e2db9ef0508551). This PR resubmits them.
Closes https://github.com/pytorch/pytorch/pull/8948

Differential Revision: D8665760

Pulled By: SsnL

fbshipit-source-id: 15514021fa79e6b908ea665dd6cb464b3ea00ab0
2018-06-27 11:44:00 -07:00
c4744cfafa bilinear upsample operator on CPU
Summary: Add support for bilinear upsample operator on CPU.

Reviewed By: BIT-silence

Differential Revision: D7853215

fbshipit-source-id: 9043c95f9eb4e1f6df324e8f7a4e8fdb0c758f66
2018-06-27 10:12:06 -07:00
c82715ced5 Add some extra punctuation to README. (#8941)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Closes https://github.com/pytorch/pytorch/pull/8941

Differential Revision: D8661797

Pulled By: ezyang

fbshipit-source-id: 876163b11a8463d7560308b2b8e68231f2a657cb
2018-06-27 08:56:13 -07:00
9ec0a2aef4 fbshipit-source-id: ba600fcd2b5cefc7621357bdeb05e24cea02e5af 2018-06-27 04:50:56 -07:00
290d20b094 Replace max_pool with max_pool_with_indices (#8892)
* Create max_poolXd_with_indices

* Match ATen names in ONNX symbolic
2018-06-26 17:09:30 -07:00
edb88b5f3a Update from Facebook (#8887)
* add opencl + fpga context

adds an opencl context inside caffe2/fb which can be used for fpga access

* [Caffe2] Force tensor inference checks to be triggered during testing

We've started to rely on TensorInference functions more for different analyses.  This diff ensures that each TensorInference function's result matches what is expected from the definition of the operator.

* Enable building //caffe2:torch with @mode/opt

In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.

* [Caffe2] Fix cost models for DotProduct and Div.  Update Tensor Inference for dot product

As title.  DotProduct states that the output is a 1-D tensor (https://caffe2.ai/docs/operators-catalogue.html#dotproduct), though the code suggests it is either 0- or 1-D depending on the inputs.  The TensorInference function is defined to match the implementation.

* [SG-MoE] Add an option to make the experts NOT as components

* [nomnigraph] Rename and fixup convertToNeuralNetOperator API

This will make things a bit cleaner

* no longer symlink THNN.h and THCUNN.h

* forced decoder network (onnx export)

Closes https://github.com/pytorch/translate/pull/95

Add networks in ensemble_export.py to create a forced decoding network from PyTorch NMT checkpoints. This network takes an arbitrary numberized (source, target) pair and returns the model score for the translation, including penalties.

Vocabulary reduction networks are also supported, but note that target indices which are not in the possible_translation_tokens generated for the source input will be trea

* Revert schema change to fix production models

Revert schema change to fix production models

* MockLogDeviceReader - rebase on FIX

# Goal

1), Build a make_mock_log_device_reader using make_mock_reader

2), Replace the real log_device_reader here: https://fburl.com/raihwf1p

# Log by D8151734

Real log_device_reader:
```
I0529 20:29:05.373108 954994 tensor.h:839] Tensor print_net/log of type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >. Dims: (): read_net/ParseOpenTrainingRow:0
I0529 20:29:05.373244 954994 tensor.h:839] Tensor read_net/ParseOpenTrainin

* [C2/D2][1/n]: Nonnegative-Constrained Optimization -- log barrier

implement log barrier as a regularization method

* Add teacher weight screening.

Add teacher weight screening according to teacher labels. If the teacher label is zero, we do not use the distill loss in the objective function.

* Add NormalizerContext

See task for more detail. This implementation is a copy of what exists for RegularizerContext except for how the parameters are defined in the model_definition thrift file.

I'll try an alternative implementation which instead overrides the default arguments of functions, like arg scopes in TensorFlow.

https://github.com/pytorch/pytorch/compare/master...MaximeBoucher:update-from-facebook-0939578c068c?expand=1

* Adding cosine similarity option in dot processor

Add pairwise cosine similarity option in dot product.
Add an option to concate dot product and cosine similarity.
Add test cases.

* [nomnigraph][redo] Concat elim for sparseNN

Same as D7962948, which was reverted because Operator Schema was not
defined

* [pytorch] Revert pytorch/pytorch#7918 'Release GIL when copying to shared memory', breaks ASAN

Revert this pytorch diff that breaks ASAN when running Filament in dev mode; in opt mode it gives "bad file descriptor" errors. Looks like a race when copying tensors to shared memory in multiple mp.Queue's (which spawn separate threads).

https://github.com/pytorch/pytorch/pull/7918/files

* [nomnigraph][mobile] Enable nomnigraph by default, use -Oz on nomnigraph related code to reduce code size

enables nomnigraph and reduces codesize

* [Warmup] Allow both offline incremental training and online training

Change plan name on saving side and reading side to support both training type

This diff depends on D8128530 and D8168651.

* Revert D7802642: [Warmup] Allow both offline incremental training and online training

This reverts commit afc213cf9b36cecf75333a788391c4d09f4afccc

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Add legacy grad logic to fix div op on old graphs.

Add legacy grad logic to fix div op on old graphs.

* Correctly propagate operator failures

Propagate errors from operators that throw exceptions and return false

* Revert D8374829: [caffe2][nomnigraph][redo] Concat elim for sparseNN

This reverts commit 6dda028c463e54bb5c32188bbbe9202107e188a5

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [Caffe2] Added extra_info to core.DeviceOption(), enforced extra_info to be inherited in scope.DeviceScope

extra_info is a newly defined field in DeviceOption proto. This diff added extra_info to the core.DeviceOption().  And, In scope.DeviceScope(), this diff enforce the new scope to inherit the extra_info from old scope.

* [opt] hgdirsync wasn't enabled, merge diverged code

Here's the damage (P59732616): basically, xplat was left behind but had
the change from assert to CAFFE_ENFORCE.

* OMP parallelism over RoIs for RoIAlign op

Simpler to parallelize over RoIs. Shouldn't affect other uses as it relies on
the number of OMP threads set during startup.

PR: https://github.com/pytorch/pytorch/pull/8562

* Use int64_t for shape in FillOps

to avoid overflow of int32

* Implement Rotated RoIAlign op

Based on Rotated RPNs as explained in https://arxiv.org/abs/1703.01086.
The idea is simple - orientation/angle is added as an RPN
anchor parameter and then the angle is further regressed similar to bbox
coords. There are some additional changes related to NMS and IoU, but besides
that it's a direct extension to Faster-RCNN. Further details in https://fb.quip.com/sZHlA1iMfWPZ.

RoIs are represented in [center_x, center_y, width, height, angle] format.
`angle` repre

* Rotated RoIAlign op CUDA forward implementation

CUDA forward impl for D8415490

* RoIAlignRotated op CUDA backward pass implementation

TSIA

* All remaining fixes to eliminate process_github.sh

Most of this diff has already been reviewed separately, except for the parts relating to _thnn/utils.py and _utils._internal.py

remove skipIf(True, 'Fbcode') line from process_github.sh

replace sed of cpp file with #ifdef to control cudnnDestroy use

undo sync-time deletion of .gitattributes, remove process_github.sh

switch to using _utils._internal rather than try-import-except

This diff also fixes the open-source bug where rebuilds have

* Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

Original commit changeset: 7707d2efe60e. The original diff was backed out because the online trainer package was backed out. This code only works with the new online trainer package.

* [easy] improve error log in adagrad op

as title

* re-allow use of thnn_h_path

This fixes cffi usage in OSS

* [4/4] [tum] parallelizing layerNorm for GPU full sync

as title

* add compile=False to pytorch tests, remove hack with pyc

* Add shape and type inference for RowWiseArgMax operator

See title

* Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

This reverts commit 78167eeef0af16b60f72c82f9dcdda9b41b4dcbd

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [fix-flaky-test] mock_hive_reader_test flaky, because GlobalCounter collects local counts intervally

# Problem

`MockHiveReader` uses `GlobalCounter` to limit `max_examples`.

The GlobalCounter on the server node collects local counts from worker nodes every 1 sec.

This 1 sec delay makes it impossible to limit the count to exactly `max_examples`; it will definitely exceed `max_examples`.

# Plan

Given,
```
Expected num_examples = max_examples + num_examples/sec (Read Speed) x 1 sec (GlobalCounter Sync Int

* [Caffe2] Fix FCGradient cost inference.  Prevent overflow in cost inference

FCGradient missed a factor of 2 in the `num_outputs == 3` case.  Overflow was occurring in the flop calculation for FC.  Changed types to `uint64_t` to prevent future problems.

* Fix binary ops with empty inputs

Fix binary ops with empty inputs

* Support the filling of input blob with provided data

as title for Biz Integrity case

* Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

Original commit changeset: 30c55dd38816 Original diff is reverted due to introducing bad integration test. Fixed the integration test.

* [c2][easy] improve pack ops error loggings

as desc.

* Add ShapeTypeInference for LpNorm operator

As desc

* Shard test_nn to reduce runtime for each test target

Closes https://github.com/pytorch/pytorch/pull/8793

The current test_nn would time out and be disabled in GreenWarden, and we need to have an option to split it up in order to pass the stress test. Right now GreenWarden roughly allows running 100 test cases in test_nn before timing out, and here we have an option to divide test_nn into 30 shards (with ~40 tests in each shard) to allow for some test suite growth in the future.

* Change default caffe2_streams_per_gpu to 1

* Remove IN_SANDCASTLE from common.py and test_nn.py

We prefer to disable the failing tests through Sandcastle UI instead.

* Add a new class for an updated prof_dag.proto

This diff contains:
- An updated prof_dag.proto that contains blob profiles.
- A class to deserialize this information (serialization is in a follow up diff)
- Update to separate profiling information from NeuralNet (and use it as part of the class above).
- Unit tests

* Lambdarank for SparseNN

This diff adds a lambda_rank_layer for SparseNN. Changes include:
1) Support for multiple sessions in the c2 op
2) Support for two different loss functions in the c2 op
3) Unit tests for the op

* Revert D8586950: Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

This reverts commit 012220ed63eccc35659a57b31d16a3625da6317b

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [easy] A few fixups to multithread predictor benchmark

(1) support perf on T6 server
(2) remove dead code

* fix a bug about the map size

as title

* Fix reduce sum on in-place case.

Fix reduce sum on in-place case.

* [Warmup] Reland reverted diff Allow both offline incremental training and online training

Closes https://github.com/pytorch/pytorch/pull/8827

fix net transform integration test. Allow offline and online trainer to coexist D7802642.

* Add StoreHandlerNotAvailableException

Add an exception for a store that is not available or has been
deleted.

* Use exception handling for fault tolerance, missing KV store

Remove status blobs to communication ops so that exceptions propagate on
failure.

* [C2/D2][2/n]: Nonnegative-Constrained Optimization -- bounded grad proj

for simple bounded constrained optimization, incl non-negative box constraints.

* [GanH]: Adaptive Weighting with More Estimations

With positivity optimization implemented, we now learn adaptive weights with different
parameterizations.

This improves parameter estimation and training stability.

* Revert some changes for landing

* Remove AutoNoGIL in StorageSharing

* Temporarily disable net_tests

* Revert "[Caffe2] Force tensor inference checks to be triggered during testing"

This reverts commit 67ef05c22b2f71b4a489695384932f968384a2a4.

* Revert "Fix reduce sum on in-place case."

This reverts commit 6cb8a8e1b3db7b6d20941b0053e3f3836068eb64.

* Revert "Revert "Fix reduce sum on in-place case.""

This reverts commit 130a257c0893dc09f4bd6e6a45d112261807fd2c.
2018-06-26 14:55:48 -07:00
055f527242 [build] Use conda cmake in two CI builds (#8864)
* use conda cmake in pytorch-linux-xenial-cuda8-cudnn6-py2 and pytorch-linux-xenial-cuda9-cudnn6-py3

* update test_expect

* add exit 1

* check cmake 3.5

* bump expect driver version

* add back space
2018-06-26 17:22:04 -04:00
55757357b2 [C++ API] Better forward methods (#8739)
* Better forward methods in C++ API

capitalize error message in test_torch.test_flatten

Support for operator()

* Add operator() to Functional

* Get rid of SigmoidLinear

* Add BoundFunction to FunctionalImpl

* Remove macro from conv because it makes errors more nasty
2018-06-26 13:23:16 -07:00
f607794dc2 [c10d] No default device for ProcessGroupGloo (#8888)
This should be set by the code that instantiates it, be it the Python
bindings or other C++ code. Defaulting to use localhost is not useful
beyond tests. Instead of keeping multiple default paths around we can
punt on it here and require it to be initialized elsewhere.
2018-06-26 11:37:20 -07:00
74d2d562f3 Fix default values for affine= in the docstrings of InstanceNormXd (#8895) 2018-06-26 14:06:31 -04:00
76e9dbad37 Stop making dynamic allocations of PinnedMemoryAllocator. (#8896)
There is no relevant state in PinnedMemoryAllocator, so we
can have a single allocator with static lifetime.
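Roughly the shape of the change (a sketch; the accessor name is made up, and it assumes `PinnedMemoryAllocator` is an `at::Allocator` subclass):

```cpp
// Sketch: with no per-instance state, one function-local static replaces
// repeated heap allocations of the allocator.
at::Allocator* getPinnedMemoryAllocator() {
  static PinnedMemoryAllocator allocator;  // constructed once, lives forever
  return &allocator;
}
```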

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-26 14:03:44 -04:00
1f36caceb2 [C++ API] Rework optimization package (#8815)
* Rework optim folder

* Removed TORCH_OPTIMIZER_CLASS macro

* Got rid of CRTP/Impl

* Removed TORCH_AUTOGRAD_KWARG

* Differentiate between Optimizer and LossClosureOptimizer

* Make Optimizers parameters based instead of model based

* Allow construction of optimizer from arbitrary vector

* Added test for zero grad

* Added test for external parameter vectors

* Now comparing against baseline values

* Documentation

* Post rebase fixes

* Different strategy for creating and accessing buffers in optimizers

* Fix member ordering
2018-06-26 10:13:14 -07:00
22ba8726da Mention MPICH_MAX_THREAD_SAFETY=multiple. (#8580)
Currently, this is a common step to enable level 3 support on MPICH based systems.
2018-06-26 12:40:48 -04:00
31327dd1e1 Unify isViewable, handle n-dimensional empty tensors. (#8883)
* Unify isViewable, handle n-dimensional empty tensors.

1) Unifies the two isViewable functions in ATen and TH.
2) Handle n-dimensional empty tensors in the implementation
3) Clarify some comments.

This requires an extra copy in the TH case, but that will go away.

* Also unify THCTensor version.

* Remove C-linkage from THTensor_compute_stride.

* Update comment.
2018-06-26 12:38:45 -04:00
6e28d4d364 Add pos_weight argument to nn.BCEWithLogitsLoss (#5660) (#6856)
* Add pos_weight argument to nn.BCEWithLogitsLoss and F.binary_cross_entropy_with_logits (#5660)
- Add an option to control precision/recall in imbalanced datasets
- Add tests (but new_criterion_tests)

* Move pos_weight to the end of args list in the documentation.

`pos_weight` was moved to the end because it is the last argument in both
`nn.BCEWithLogitsLoss` and `binary_cross_entropy_with_logits`
2018-06-26 12:31:07 -04:00
f935ba1b05 [build] Enable clang-specific warnings only when using clang (#8869)
* Wraps clang only warnings in an if

* add back -Wno-missing-field-initializers
2018-06-26 11:09:25 -04:00
8e019826c9 Fix cmake cudnn autodetection (#8891)
If CUDNN_INCLUDE_DIR, CUDNN_LIB_DIR, and/or CUDNN_ROOT_DIR were set,
but USE_CUDNN was not explicitly set, the code in
cmake/Dependencies.cmake would set USE_CUDNN=OFF even though it could
be found. This caused an issue in ATen, where it includes its CuDNN
bindings if the variable CUDNN_FOUND is set. This was the case,
because the find_package call in cmake/public/cuda.cmake searches for
CuDNN and ends up finding it. The net result is that ATen tried to
compile CuDNN bits, but the caffe2::cudnn target is never defined let
alone added as dependency, and the build fails on not being able to
find the header cudnn.h.

This change does two things:

1) Restore CuDNN autodetection by setting USE_CUDNN=ON if it is found.
2) Remove obsolete FindCuDNN.cmake module. This functionality now
lives in cmake/public/cuda.cmake.
2018-06-26 06:54:27 -07:00
af741dc2fd [c10d] Fix link order for building C++ tests (#8889)
List dependency on gloo_cuda before dependency on gloo such that
unresolved symbols in gloo_cuda are correctly resolved (since the linker
resolves from left to right).

This fixes building c10d C++ tests on GCC 4.8.
2018-06-25 23:59:32 -07:00
8ef5d37ac5 directly add_subdirectory(nanopb) from torch CMakeLists (#8870)
currently torch/CMakeLists doesn't know how to find nanopb without
some higher-level script (setup.py or build_all.sh) telling it where
to look, which is an obstacle towards fully CMake-ifying libtorch.so.
This change removes that dependency.
2018-06-25 21:23:25 -07:00
47492ed451 [C++ API] Bag of fixes (#8843)
* Bag of fixes

* Rename tensor_range.h to tensor_list_view.h

* Post rebase fixes

* Rename torch::tensor namespace to torch::tensors due to name conflict

* Avoid recursion in Module::to
2018-06-25 21:11:49 -07:00
3d580f2f7d [build] Raise in cmake when seeing NVCC{9/9.1} + GCC6 combo (#8863)
* Add error message for NVCC{9/9.1} + GCC6 combo

* requires -> require
2018-06-26 00:07:13 -04:00
8e98a1a84d Create avg_pool1d in ATen (#8880)
* Create avg_pool1d in ATen

* Put function name into check1d method
2018-06-25 20:31:32 -07:00
85f4d2b55a throw error when grid_sample is passed unsupported mode (#8884) 2018-06-25 22:37:41 -04:00
f74207c99f Allow autograd to work even when the shape of values cannot be determined (#8641)
This commit implements the solution proposed in https://github.com/pytorch/pytorch/issues/8410
to work around the need to create zero tensors with the same shape as the inputs.
It introduces the concept of a LinearBlock which marks places in the code
where we know if all the inputs to the node are zero, then the outputs
to the node are also zero. Autodiff introduces LinearBlocks around
backwards functions, which have this property. specializeUndef then
propagates Undef nodes using this information.

Notes:
* Since we do not always specialize, we have a pass LowerLinearBlocks
that replaces the block with an if statement that dynamically guards
the Undef case.
* We introduce AutogradAdd which is addition that still works when
its inputs might be undefined. In cases where we specialize this will
get removed in favor of a normal add, but there are cases where
gradient graphs do not specialize (e.g. when they are not differentiable,
but a derivative is required) so it is important for this op to be executable.
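The intended semantics of AutogradAdd can be summarized in a few lines (a sketch of the semantics only, not the interpreter's implementation):

```cpp
#include <ATen/ATen.h>

// Undefined tensors stand in for zero gradients, so addition must
// tolerate them: Undef + x == x, x + Undef == x, Undef + Undef == Undef.
at::Tensor autograd_add(const at::Tensor& a, const at::Tensor& b) {
  if (!a.defined()) return b;
  if (!b.defined()) return a;
  return a + b;
}
```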
2018-06-25 18:40:04 -07:00
7a614799f7 Make at::Tensor::to() const (#8839)
* Make at::Tensor::to() const

* Add cheaper checks to Tensor::to
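A sketch of why the const qualifier matters in practice:

```cpp
#include <ATen/ATen.h>

// This would not compile before the patch, since to() was not
// const-qualified and t is a const reference.
at::Tensor sum_as_double(const at::Tensor& t) {
  return t.to(at::kDouble).sum();
}
```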
2018-06-25 17:55:10 -07:00
5cb8586dde [auto] Update onnx to 458c521 - Fix typo (onnx/onnx#1143)
458c521844
2018-06-25 23:37:19 +00:00
288d37998a [Caffe2] Fix gradient_check on in-place ops (#8828)
* Fix gradient_check on in-place ops

* Fix hsm_test

* Fix SplitByLengthOp test

* Fix input_device_options for gradient_checker

* Fix hypothesis_test_util.py
2018-06-25 15:25:56 -07:00
838fb87874 Fix as_strided_backward (#8721)
* make as_strided safer

* patching as_strided; and stop using it in backward

* Test a simple case in as_strided_backward

* a long note

* remove boundary checks of as_strided; implement slow path

* wip

* fix as_strided backward when input is overlapping

check for input overlapping too
[doc] clarify gradcheck behabior when input is overlapping
longer note

* fix a deprecation warning in test_autograd

* nits
2018-06-25 18:17:35 -04:00
b5a123c06c [jit] Add python bindings for Gradient and differentiate (#8830)
* improve assertion error message in jit::differentiate

* add python binding for Graph::copy

* add pybind for jit::differentiate and jit::Gradient
2018-06-25 18:09:29 -04:00
49a3e49627 Fixes #8508. Upcasted loc to 1-d if a scalar loc is provided to MultivariateNormal (#8543)
* Fixes #8508 Broadcasted loc to 1-d if a scalar loc is provided to MultivariateNormal.

* move to non-inplace
2018-06-25 18:06:51 -04:00
41181169ae [auto] Update onnx to 6bedd27 - add broadcasting support for min/max/sum/mean (onnx/onnx#1124)
6bedd27b03
2018-06-25 22:03:11 +00:00
89afb93e1d Delete dead TH size inference code. (#8866) 2018-06-25 17:45:43 -04:00
cca247635c First version of dispatcher (#8713) 2018-06-25 13:11:53 -07:00
2b926aafb0 [build] disable test_expect for pinning cmake to 3.5* in dockerfiles repo (#8850)
* pin pytorch-linux-xenial* to use cmake 3.5*

* disable test_expect
2018-06-25 14:21:42 -04:00
04440d2c57 Fix nonzero and tensor printing of n-dimensional empty tensors. (#8849) 2018-06-25 12:09:47 -04:00
1e7fcb5d1b fix NCCL NVCC_GENCODE w/ multiple archs (#8834) 2018-06-25 08:07:53 -07:00
Ben
e251fb5036 Add file and line to CUDA_CHECK and CUDNN_CHECK (#8836)
* Add file and line to CUDA_CHECK and CUDNN_CHECK

* use stringstream

* clang-format

* switch to AT_ERROR
2018-06-25 10:46:52 -04:00
e31ab99932 [Ready for Review] Better fix for NCCL + sccache (#8829)
* Better fix for NCCL + sccache

* Try to set NUM_JOBS to 1

* Try to fix third_party/nccl/CMakeLists.txt as well

* Pass NUM_JOBS to nccl/CMakeLists.txt
2018-06-25 02:17:07 -04:00
50410c9572 fixes #8840 (#8841) 2018-06-25 02:01:05 -04:00
a5df8ec841 Created DefaultTensorOptions in ATen (#8647)
* Created DefaultTensorOptions

* Fix TensorOptions() call which was interpreted as function decl

* Fix empty OptionsGuard

* Make options_ and mutex_ in DefaultTensorOptions class static because of dynamic linker issues

* Make DefaultOptions thread local
2018-06-24 21:15:09 -07:00
521f5111ad [C++ API] Use torch::Tensor instead of at::Tensor/Variable mix (#8680)
* Use torch::Tensor instead of at::Tensor/Variable mix

* TensorRange -> TensorListView
2018-06-24 19:03:39 -07:00
22a70fbe2e Minor fixes for finding CUDNN (#8743)
* Minor fixes for finding CUDNN

* Minor fixes for comment

* Fix lints

* Fix naming conflicts

* Fix import name
2018-06-24 21:42:19 -04:00
fc22bf3e82 Spectral norm improvements (#8590)
* Spectral norm improvements
- Don't do iterations on weight in eval mode
  To facilitate this, register weight as buffer in order to be able
  to use module with spectral norm in eval mode after immediately
  after loading state dict (#8208)
- Use weight instead of weight_orig as weight when removing
  spectral norm
- Add dim parameter in case the normalization should occur w.r.t.
  a dimension other than 0 (#7865)

* add and update spectral norm tests

* More spectral norm tests

Thank you, Simon, for the suggestions.
2018-06-24 17:15:13 -04:00
3598356420 Port THCS to ATen. (#8689)
* Port THCS to ATen.

General structure of the sparse implementation:
- SparseCUDATensor.{cpp, cu} and SparseCUDATensorMath.cu contain
  the same functions as their CPU analogues
- SparseCUDAApplyUtils.cuh contains what used to be in
  THCSTensor.cu
- SparseCUDABlas.cu contains what used to be THCSparse.cu

Unrelated improvements:
- Forward declared CUDA types in Context.h are now moved
  exclusively to CUDAHooks
- New getCurrentCUDASparseHandle in Context
- Support for printing CUSPARSE_STATUS_ZERO_PIVOT error message
  directly

Some unusual pieces:
- get_device got the LegacyBridge makeover, as it needs special
  logic on sparse tensors (defer to the inner tensors).
- I noticed that I needed to turn off device_guard codegen
  for many functions in sparse; this surfaced because get_device
  became a native function and resulted in an infinite recursion.  This was
  done by adding device_guard: False to the native definitions.  An alternative
  strategy might be to make the heuristic for deciding when to put in a device
  guard more clever.

Scaffolding removal:
- LegacyBridge now special-cases only on sparse versus dense;
  no more CUDA test (hooray!)
- Native bindings get CUDA/SparseCUDA dispatch entries.

CPU sparse refactoring:
- New SparseUtils.h header, with all of the utility functions that
  used to live in SparseTensor.cpp
- new_with_tensor_sparse now correctly handles both CPU and CUDA
- transpose functions in sparse/ turned out to be dead, so I killed them

Bugs I noticed while working on this:
- I used accessor<...>() on a CUDA tensor, because I thought it does
  the CUDA-CPU sync.  It does not.


Last mile changes:
- I killed all of the THS/THCS directories, build scripts, bindings everything.
  It is now no more!
- A bunch of trampolines in LegacyBridge are no more; anything
  that was "sparse only" is now done natively.
- `sparse_coo_tensor` is implemented a little funny, but we think
  it's a good idea.
- HIP is handled by explicitly ifdef'ing out all kernels; we'll add support
  for this at some later point in time.
- TH_INDEX_BASE is now unconditionally set to 0.
- Some uses of x.type() now replaced with x.options(), the new way of doing it.
- More notes about checked_cast_tensor, and eliminate Storage/Tensor fields in
  the code gen env when they are dead.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-24 15:14:09 -04:00
731273b8d6 Improve convT output_padding docs (#8825)
* improve output_padding doc for convT modules

* Update functional.py

* Update conv.py

* lint
2018-06-23 14:33:18 -04:00
e4ff0b8aa1 remove unnecessary headers from SpectralOps, add cuda.h include to deviceutils (#8819) 2018-06-23 14:31:13 -04:00
ebae3f502c Fix CUDA_NVCC_EXECUTABLE from being set to empty (#8822) 2018-06-23 11:11:32 -04:00
7fbd57091d Doc: specify batch_first is True by default in RNN (#8807) 2018-06-22 19:33:25 -04:00
74fa304b31 [Caffe2] Export clang compilation datatbase in setuptools build (#8811) 2018-06-22 16:19:43 -07:00
12904edae9 Test that broadcast doesn't copy when dst and src devices are the same (#8803)
* test that broadcast doesn't copy when dst and src devices are the same

* only test if input is cuda
2018-06-22 17:36:19 -04:00
46bff5d9ff Set MKL VML error mode to ignore (#8800) 2018-06-22 16:54:47 -04:00
73b92472d2 [README.md] Use GitLab URL for CMake (#8799)
* update to GitLab url

* use GitLab url for upstream CMake
2018-06-22 16:51:35 -04:00
1d4cf095b8 Add CUDA to logspace and linspace declarations in Declarations.cwrap (#8798)
* Add CUDA to logspace and linspace

These functions are already implemented, but were not exposed. Fixes https://github.com/pytorch/pytorch/issues/8786 .

* Add small tests
2018-06-22 16:14:27 -04:00
675b579bf9 cmake wrapper (#8797) 2018-06-22 15:29:25 -04:00
d3ec956d91 Revert "ROCm 1.8.2 does not define CUBLAS_STATUS_ARCH_MISMATCH (#8732)" (#8791)
Upstream fixed 1.8.2, and it will be fine in the final release.

This reverts commit 9dffaf593e8c58a6d02583079162f4a88cb1bc66.
2018-06-22 15:05:56 -04:00
f138111d52 remove unused flag (#8779) 2018-06-22 10:54:48 -07:00
ddda7cfea5 allow output_size to contain None in adaptive pooling methods (#8596)
* allow output_size to contain None in adaptive pooling methods

* fix lint

* address comments
2018-06-22 13:29:15 -04:00
b1b77c9eb5 Use virtual dtor for Annotation (#8780) 2018-06-22 10:20:37 -07:00
e6c7b38f94 Cache cufft plans (#8344)
* cache cufft plans

* use an LRU cache

* suffix CuFFTParams members with _

* import print_function for py2

* lint

* fix potential race; add dummy impl for CPU only builds

* cpp formatting; remove nccl makefile change

* Use CUDA hooks instead

* comments and doc

* update the error message

* move LRU cachae to a separate file and native::detail namespace

* update comment

* specify NOTE location in CuFFTPlanCache.h

* update disabled_features.yaml to make amd ci work

* another fix for AMD CI in disabled_features.yaml

* Wrap cufft_plan_cache_* methods in __HIP_PLATFORM_HCC__

* improve the notes

* lint

* revert onnx change

* put back inlining for CUFFT_CHECK
2018-06-22 13:02:34 -04:00
fed44cb1b3 Remove aten project for main build (#8532) 2018-06-22 08:40:44 -07:00
ce13ca235e added default lambd=0.5 for hardshrink (#8770)
* added default lambd=0.5 and tests

* lint
2018-06-22 09:52:55 -04:00
5a7b4840d9 Move nanopb-generated ONNX to unique file name (#8773)
* Move nanopb-generated ONNX to unique file name

* fix other places
2018-06-22 09:51:56 -04:00
9c426797a8 Expose is_compatible function (#8783) 2018-06-21 23:37:54 -07:00
83f846ff7a [auto] Update onnx to 410530e - Make test suite backward compatible (onnx/onnx#1137)
410530e8c6
2018-06-22 06:35:03 +00:00
bd95f8f948 Resolve name conflict of ContextManager (#7244)
* Resolve conflicting name, ContextManager

Concept name `Context Manager` is taken by Python. See https://docs.python.org/3.6/reference/datamodel.html#with-statement-context-managers

It says,
A context manager is an object that defines the runtime context to be established when executing a with statement. The context manager handles the entry into, and the exit from, the desired runtime context for the execution of the block of code.

The `ContextManager` here is more like a registry. 
And there is a C++ registry in caffe2 codebase `caffe2/caffe2/core/registry.h`.
There is also a Caffe2DBRegistry, declared by calling `CAFFE_DECLARE_REGISTRY(Caffe2DBRegistry, DB, const string&, Mode);` in `caffe2/caffe2/core/db.h`.

I think we can follow the concept name `Registry`, calling it `ContextRegistry`.

* Make Classes and Functions internal to this module start with "_"

Make Classes and Functions internal to this module start with "_"

* Update context.py

* Update context.py
2018-06-22 00:41:51 -04:00
53c0de57d9 Document ideal vs actual SparseTensorImpl invariants. (#8776) 2018-06-21 23:08:18 -04:00
fd32cc6118 Disable sccache when building NCCL (#8708)
* Disable sccache when building NCCL

* Fix nccl CMakeLists.txt
2018-06-21 17:30:07 -07:00
0750967496 Adjust nested parallelization to deal with OMP (#8723)
* Adjust parallelization to deal with OMP
2018-06-21 20:24:53 -04:00
54a2e817a6 [auto] Update onnx to bc986de - Add is_compatible method in python backend (onnx/onnx#1132)
bc986dee4c
2018-06-22 00:16:24 +00:00
dc5837a1f4 [JIT] Adds fp16 support to the jit (#8679)
* adds fp16 support to the jit

* improves formatting

* improves formatting

* added an explanatory comment

* fixes Python2 flake8

* updates c code

* all except halfs
2018-06-21 18:14:51 -04:00
709c300437 [c10d] Configurable number of algorithm entries per key (#8765) 2018-06-21 14:30:55 -07:00
2bb7e480c1 Define conversions and operations on at::Half (#8660)
The goal is to be able to use at::Half throughout ATen, including in
CUDA kernels, and have it operate like a built-in type. This avoids the
need for cuda::from_type and cuda::to_type before every
AT_DISPATCH_ALL_TYPES_AND_HALF call.
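A sketch of the kind of code this enables (assuming at::Half's float conversions work as described):

```cpp
#include <ATen/ATen.h>

int main() {
  // at::Half now converts to/from float and supports arithmetic directly,
  // so generic code can treat it like a built-in scalar type.
  at::Half h = 0.5f;
  float result = h * 2.0f + 1.0f;  // no cuda::to_type/from_type shims needed
  return result > 0 ? 0 : 1;
}
```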
2018-06-21 17:16:32 -04:00
41c08fe4a1 Add tools/shared/_utils_internal.py to gitignore (#8756) 2018-06-21 13:28:46 -07:00
8489c4cc6e Better support for literals in jit script (#8687)
Addresses #8177

A design doc can be found here: [gist](https://gist.github.com/zou3519/4b7f13f03cc9f3612bd9363e6405fa0a) version or [quip](https://fb.quip.com/azL1AqUckBdo) version

General approach:
- Add NumberType, FloatType, IntType to represent Python numbers, floats and ints.
- Emit these types for python literals
- Change aten_schema such that Scalars are NumberType, int64_t and bool are IntType.
- Emit aten::type_as, prim::NumToTensor, and prim::TensorToNum nodes for tensor-number math. (see examples below)
- Erase NumberType,  prim::NumToTensor, and prim::TensorToNum for ONNX export

### Tensor/number math
```
import torch
@torch.jit.script
def fn(x):
    return x + 1
```
```
graph(%x : Dynamic) {
  %1 : int = prim::Constant[value={1}]()
  %2 : Dynamic = prim::NumToTensor(%1)
  %3 : Dynamic = aten::type_as(%2, %x)
  %4 : Dynamic = aten::add[alpha={1}](%x, %3)
  return (%4);
}
```

### Number/Number Math
```
import torch
@torch.jit.script
def fn(zero):
    c = 1 + 1
    return zero + c
```
```
graph(%zero : Dynamic) {
  %1 : int = prim::Constant[value={1}]()
  %2 : int = prim::Constant[value={1}]()
  %3 : Dynamic = prim::NumToTensor(%1)
  %4 : Dynamic = prim::NumToTensor(%2)
  %5 : Dynamic = aten::add[alpha={1}](%3, %4)
  %c : int = prim::TensorToNum(%5)  # this is the result of the addition
  ...
  return (%13);
}
```

List of squashed commits:

* Introduce Python Number types

Added: IntType, FloatType, NumberType with
IntType <: NumberType
FloatType <: NumberType

Changed aten_schema so arguments have corresponding types

* Emit a NumberType for python literals.

Also emit a NumberType for Scalar default values.

* Add prim::NumToTensor and prim::TensorToNum

* Add DynamicType -> NumberType implicit cast for bc

* Better ensureTensor error message

* Add ensureTensorOrNumber. Allow passing Number to some functions

Like the range() construct and slices

* Patch IntList to work.

IntList is still a DynamicType in the frontend: a tensor gets built from
a List[int].

Also, IntList[1] is a "union between int and IntList" the way it is
implemented. If the frontend sees an int being passed for an IntList[1]
arg, it converts it to a tensor as well.

* Enforce some order on schemas to avoid overload ambiguity

add(Tensor, Tensor) should appear earlier than add(Tensor, Scalar). This
matches the order in which python_arg_parser parses its arguments.

* Disable std_dim and var_dim tests.

With the new schema information, std(input, keepdim) and std(input, dim)
are ambiguous. This will need to be fixed at a later date.

* Add NumberType erasure pass.

This is used for ONNX export and to ensure that NumberType information
doesn't reach the interpreter

* Add support for mixed tensor/number math ops.

* Tests for new functionality.

Includes:
- Tensor/number math
- number/number math
- EraseNumberTypes pass test

* Patch tests

Update expect tests for:
- decompose_addmm
- loop unrolling tests

Because python numbers are now NumberType, they cannot be returned by
functions anymore. Work around this by using "torch.full", or by adding
a tensor([0]) (taken from FIXME_zerol()). Both approaches are used
because torch.full is more readable, but it is broken in some cases.

* Add erase_number_types to torch/CMakeLists.txt

* Move math back to emitSimpleExpr from emitSugaredExpr

* Remove some dead lines

* Renable some excluded script/trace tests that are fixed.

* Move some tests to expected failure

* Address some comments (more addressing to come)

* Erase relevant aten::type_as nodes in EraseNumberTypes

I also changed it so that EraseNumberTypes is only called for ONNX
export. It is no longer used to prevent
prim::NumToTensor/prim::TensorToNum from reaching shape_analysis or
interpreter.cpp.

shape_analysis infers the type of the output of these nodes to be the
same as their input.

intepreter.cpp treats both of these nodes as no-ops.

* Add reminder to fix std/var

* Call EraseNumberTypes only when exporting a script module

* Update expects after rebase
2018-06-21 15:43:38 -04:00
3de45f3430 Add ssnl and zou3519 as pytorch doc owner (#8754) 2018-06-21 15:10:02 -04:00
be3d65a7e2 i2h<->h2h in gif (#8750)
* i2h<->h2h

* should have 11 frames
2018-06-21 14:46:47 -04:00
c8cc246226 [JIT] Tests for calling between different frontend modes (#8704) 2018-06-21 10:38:03 -07:00
40262ca9d1 Disable flaky test_lstm_fusion_cpu test (#8747)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-21 10:32:27 -07:00
e07a49e15a Set DEBUG=1 in trusty-py3.6-gcc5.4 CI build (#8593) 2018-06-21 12:58:43 -04:00
b300934db6 Add CUDA 9.2 + GCC 7 build and test to CI (#8592) 2018-06-21 12:58:28 -04:00
117b77e574 Install vim by default on all Caffe2 docker images. (#8731)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-21 11:10:32 -04:00
98a7d84a5a Link to C++ extensions in README.md (#8737) 2018-06-21 09:48:04 -04:00
c0dfe23703 Support n-dimensional empty tensors in (most of) THCUNN. (#8722)
* Support n-dimensional empty tensors in (most of) THCUNN.

* Fix incorrect parens.
2018-06-21 09:12:16 -04:00
9b465313cf Support n-dimensional empty tensors in more of TH/THC. (#8726)
* Support n-dimensional empty tensors in more of TH/THC.

* Fix warning.
2018-06-21 09:11:28 -04:00
9dffaf593e ROCm 1.8.2 does not define CUBLAS_STATUS_ARCH_MISMATCH (#8732)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-21 08:31:08 -04:00
ac068fdabe Use env var to pass sharding options to test_nn.py (#8727)
Buck doesn't support passing arguments to Python unit tests, and we have to use environment variables to pass the sharding options instead. Also, buck test doesn't go through the __name__ == '__main__' code path and we need to move the env var checking logic to top-level.

* Use env var to pass sharding options to test_nn.py

* Move env var checking to top-level

* fix lint
2018-06-21 08:30:28 -04:00
bbd71a7c81 [auto] Update onnx to 9b9f595 - Make axis optional (onnx/onnx#1128)
9b9f595107
2018-06-21 05:49:53 +00:00
Ben
4f604a436b Export tensor descriptor (#8313)
* Export TensorDescriptor

* Export descriptors

* install cudnn_h

* Add tests and with_cuda

* tab to space

* forgot cpp

* fix flake

* ld flags

* flake

* address comments

* clang-format

* fixtest

* fix test

* extra headers

* extra headers

* camelcasing
2018-06-20 22:32:50 -07:00
35e66efbfc Don't set HIP flags on non-HIP build. (#8728)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-20 23:53:31 -04:00
6181979a7c [auto] Update onnx to 7558954 - Use cmath instead of math.h (onnx/onnx#1129)
7558954ffd
2018-06-21 02:56:50 +00:00
d79711d689 [auto] Update onnx to 068f1a4 - Optimization pass to fuse batch normalization operator with convolution operator (onnx/onnx#1106)
068f1a4079
2018-06-20 23:48:51 +00:00
f037d392c1 Support n-dimensional empty tensors in (most of) THNN. (#8702)
* Support n-dimensional empty tensors in (most of) THNN.

Most of the argument checking in THNN is directly around dimensionality, which doesn't work in general for n-dimensional empty tensors, because
you will end up dividing by 0 or similar.  Instead, we change these to check for emptiness explicitly and give error messages for those cases.
In some cases, the existing error messages are improved too.
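The revised check pattern, roughly (a sketch using ATen-style assertions for readability; THNN itself uses its own argument-check macros):

```cpp
#include <ATen/ATen.h>

// Sketch: test emptiness directly rather than inferring it from the
// dimension count, which breaks for n-dimensional empty tensors.
void check_nonempty(const at::Tensor& input) {
  AT_CHECK(input.numel() > 0,
           "non-empty tensor expected, but got sizes ", input.sizes());
}
```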

* Fix bug.
2018-06-20 18:30:19 -04:00
1e570fa5a8 Add c10d/Def.hpp placeholder (#8711)
This is a placeholder for the header that is generated by CMake.
It is needed if you include the c10d headers directly from this directory.
2018-06-20 15:03:58 -07:00
802929608c [JIT] Improve test coverage for ErrorReport instances (#8668)
* [JIT] Coverage for ErrorReport

* Fixes

* lint

* More coverage
2018-06-20 14:51:53 -07:00
d00c79f2b5 Improve cudnn RNN backward error message in eval mode (#8706)
* Improve cudnn RNN backward in eval error msg

* fix accidental change
2018-06-20 17:47:17 -04:00
17784d2029 Make at::tensor faster (#8709) 2018-06-20 14:46:58 -07:00
544690bf4e Update rnn.py (#8705) 2018-06-20 17:46:09 -04:00
48e90e3339 Build system changes (#8627)
* All changes needed to get rid of process_github.sh

* allow thnn_h_path
2018-06-20 17:45:26 -04:00
0acddd6cee Add torch.cuda.cudnn_is_available (#8703) 2018-06-20 14:18:03 -07:00
85468155ce Implement OpSchema and a default DispatchKey (#8662) 2018-06-20 14:14:24 -07:00
f9da3aa1aa [auto] Update onnx to b1571d8 - ONNXIFI loader library (onnx/onnx#556)
b1571d829f
2018-06-20 20:59:15 +00:00
5642937ac1 more formatting (#8701)
* fix formatting in :math: in fold docstring

* escape more underscores
2018-06-20 15:32:33 -04:00
3e25b4af6d Fix #8692 (#8699) 2018-06-20 15:17:54 -04:00
73ce21a313 Create captured inputs recursively for loop to resolve loop-carried dependencies across nested blocks (#8345)
* enable captured inputs for if Stmt to fix the carried deps bug in nested
blocks

* postpone captured inputs deletion and add new test case

* recursively generate captured values for nested loops

* check asSimple when recursively create captured input
2018-06-20 12:09:24 -07:00
d6c873a393 Shard test_nn to reduce runtime for each test target (#8678)
* Shard test_nn to reduce runtime for each test target

* Use load_tests for selecting tests to enable

* fix lint

* Use arg parser from common.py
2018-06-20 15:01:28 -04:00
9335885b1b Create at::tensor (#8475) 2018-06-20 11:44:21 -07:00
b4cd9f2fc9 Clarify mp note about sharing a tensor's grad field. (#8688)
* Clarify mp note about sharing a tensor's grad field.

* Address comments

* Address comments
2018-06-20 14:22:38 -04:00
08c1770d79 Add owner rule for cpp_extension.py (#8700) 2018-06-20 14:11:28 -04:00
b492d103ee fix formatting in :math: in fold docstring (#8696) 2018-06-20 13:36:57 -04:00
b6af5d40bf Some 0-sized dimension support, port catArray away from resizeLegacy. (#8666)
* Some 0-sized dimension support, port catArray away from resizeLegacy.

The goal of this PR is to port catArray away from resizeLegacy (so we can delete the legacy resize calls), but since catArray has some weird behavior because
we don't have arbitrary 0-sized dimension support, I made some effort to fix these both in one pass.

The major changes here are:
1) catArray uses the new resize API, no longer the old resizeLegacy API.
2) As 1) is the last usage of resizeLegacy, it is deleted.
3) If compiled with USE_TH_SIZE_ZERO_DIM, catArray will work and properly check shapes for n-dimensional empty tensors.
4) However, we retain the old behavior of "ignoring" size [0] tensors in catArray.  We previously allowed this because we didn't have n-dimensional empty tensors.
5) To get the above to work, we also add support for n-dimensional empty tensors for narrow and slice (ifdef USE_TH_SIZE_ZERO_DIM).
6) We change the stride formula for empty tensors to match NumPy: we never multiply by a size of 0 when accumulating strides (each size is clamped to at least 1), so the
   strides stay monotonic in the empty-tensor case (see the sketch after this list).
7) We print the size of empty tensors if size != [0]; this matches NumPy behavior (even in cases where the size could be inferred from the brackets).
8) For test purposes, we add torch._C._use_zero_size_dim() to add tests for the above.
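A standalone sketch of the clamped stride accumulation referenced in 6) (not the TH implementation itself):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Contiguous strides with each size clamped to >= 1 while accumulating;
// e.g. sizes {2, 0, 3} -> strides {3, 3, 1}, matching NumPy.
std::vector<int64_t> contiguous_strides(const std::vector<int64_t>& sizes) {
  std::vector<int64_t> strides(sizes.size());
  int64_t acc = 1;
  for (int64_t i = static_cast<int64_t>(sizes.size()) - 1; i >= 0; --i) {
    strides[i] = acc;
    acc *= std::max<int64_t>(sizes[i], 1);
  }
  return strides;
}
```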

* Fix flake8.

* Address review comments.
2018-06-20 13:26:08 -04:00
cc6b046f48 Implement flatten function (#8578)
* Implement flatten function

* address comments

* allow start_dim=end_dim

* undo submodule change
2018-06-20 12:53:06 -04:00
065fdbd500 Created Tensor::to functions (#8643)
* Created Tensor::to functions

* Only have to(dtype) and to(device)

* Ignore requires_grad in TensorOptions(Tensor) constructor
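A usage sketch of the two overloads (the device call assumes a CUDA-capable build and that `at::Device` can be constructed from `at::kCUDA`):

```cpp
#include <ATen/ATen.h>

void demo(at::Tensor t) {
  auto as_double = t.to(at::kDouble);           // to(dtype) overload
  auto on_gpu    = t.to(at::Device(at::kCUDA)); // to(device) overload
}
```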
2018-06-20 09:28:08 -07:00
d97c9dd019 Add a warning in gradcheck if inputs precision < float64 (#8663)
* Solves #8659

This PR adds a warning to alert users about the possibility of a failure in the gradcheck

* Fix lint

* Update gradcheck.py

* Update gradcheck.py

* update error message

* Update warning message to be more descriptive
2018-06-20 12:23:22 -04:00
61b863cbdc Fix parsing of floating point defaults in python_arg_parser (#8681) 2018-06-20 12:17:44 -04:00
3da27312bb Export ProcessGroupGloo options to Python (#8664)
This surfaces the options struct that can be passed to the
ProcessGroupGloo constructor to Python. By default, if no options struct
is passed at construction time, the Python bindings default to using a
struct with a TCP backed Gloo device that uses the machine's hostname to
resolve the IP address to bind to.
2018-06-20 09:08:06 -07:00
0e0031e204 Fix build error in pybind_state_ideep (#8684)
Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-06-20 08:29:48 -07:00
695fd98192 Compatibility: write nDimension/_nDimension corresponding to dim()/_dim(). (#8676)
Currently, THTensor_(nDimension) goes to _dim(), which makes it difficult to move individual usages over to the new API.
Instead, let's create a THTensor_(_nDimension) going to _dim() and have THTensor_(nDimension) go to dim().  To do this, we will redirect all current
calls and move them over as we did for _dim() and dim().
2018-06-20 11:00:25 -04:00
6402a4278b Improve win-build.sh for local build (#8674) 2018-06-20 09:41:50 -04:00
be3e3f2ec8 don't do unnecessary copies for bernoulli_ (#8682) 2018-06-20 10:53:35 +02:00
7fa81d6dbc Use parallel if get_num_threads 0 (#8677) 2018-06-19 22:12:15 -04:00
8e4fe5dcf4 Fix serialization for Parameters (#8633)
* Fix serialization for Parameters

* address comments

* addres comments
2018-06-19 22:11:13 -04:00
637dcdc279 Remove dangling inclusion path (#8671) 2018-06-19 17:02:20 -07:00
d46312fd15 Create at::from_blob (#8640) 2018-06-19 17:00:28 -07:00
66e8ecf2ea 16bit typeid (#8534)
* 16bit typeid

* CaffeTypeId::createTypeId() instead of TypeMeta::_createTypeId()
2018-06-19 19:23:58 -04:00
4608aa3058 Setup wrappers to get vectorized version of mean (#8618)
* Setup wrappers to get vectorized version of mean

* Responding to review 1

* Responding to review 2

* Use variadic AT_CHECK

* Fix AT_CHECKS in ReduceOps
2018-06-19 18:14:35 -04:00
d3b690ecd5 TensorTypeId (#8389) 2018-06-19 15:05:24 -07:00
7a048cdcd7 Vectorize non-contiguous unary operations (#8488)
* Vectorize non-contiguous unary operations

All builds pass. Manual Windows rerun is here:
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/9714/
2018-06-19 16:56:49 -04:00
03f7289fcf Add CAFFE2_USE_CUDNN guard on context_gpu.cu (#8657) 2018-06-19 13:49:06 -07:00
2bf8b702a3 Fix broadcast copying device[0] tensor when not using NCCL (#8222)
* Fix broadcast copying device[0] tensor when not using NCCL; Avoids potential extra copy in flatten_dense_tensors

* use toType

* revert dense_flat changes

* address comments
2018-06-19 16:34:29 -04:00
a60540ed2b Make NCCL build select NVCC_GENCODE smarter (#8615)
* Make NCCL build select NVCC_GENCODE smarter

* add info print

* replace ; with \s

* gencode\s -> gencode=

* Don't let nccl use sccache
2018-06-19 16:31:17 -04:00
61c96811be [c10d] NCCL python binding and CI test, with bug fixes (#8357)
* [c10d] NCCL python binding and CI test, with bug fixes

* Addressed comments and further bug fix

* Made NCCL build optional, made C10D libc10d.a only

* Fixed tests so that NCCL pg won't run when not neeeded

* Addressed comments
2018-06-19 13:02:39 -07:00
5ca4f5b43b [JIT] Remove dead functions (#8658) 2018-06-19 12:46:23 -07:00
a2dd707031 [C++ API] Create fixed width dtypes in torch:: namespace (#8639)
* Create fixed width dtypes in torch:: namespace

* Make kByte -> kUInt8
2018-06-19 12:40:58 -07:00
7ccecbbb4e Create Tensor::options (#8630) 2018-06-19 11:09:01 -07:00
6cc7670bed Port all indirect calls of resizeNdLegacy to resizeNd. (#8603)
* Port all indirect calls of resizeNdLegacy to resizeNd.

* Handle 1-d to 1-d resize.

* Maintain behavior of tensor.set_().

* Fix lack of initializer_list in C :).

* Return full dimensionality from newSizeOf.
2018-06-19 13:28:48 -04:00
65f7797d4d typo corrected (#8632) 2018-06-19 10:23:08 -07:00
c80a703829 Add CODEOWNERS entry for third_party to track changes (#8654) 2018-06-19 08:59:11 -07:00
b8b051cc19 change avg_pool2/3d count_include_pad default to what it is in the docs and in 0.2 (#8645) 2018-06-19 11:55:57 -04:00
9a9eadacc6 explicitly check device for grid_sampler (fixes: #8599) (#8646) 2018-06-19 11:53:46 -04:00
5f64484800 update to avoid potential duplicate error msg (#8638) 2018-06-19 08:50:00 -07:00
32bc28dd18 caffe2 export (#8642) 2018-06-19 00:50:33 -07:00
1ac1a9dbc6 update doc for comparison operators (#8636) 2018-06-18 21:15:22 -07:00
f14887a63f check for exact shape match before loading (#8619)
* check for exact shape match before loading

* Use RuntimeError instead of ValueError to keep it consistent with other errors

* fix lint
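
A minimal sketch of the behavior this enforces (the module and shapes here are illustrative, not taken from the patch):

```
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
bad_state = {"weight": torch.zeros(3, 4), "bias": torch.zeros(2)}  # wrong weight shape
try:
    model.load_state_dict(bad_state)
except RuntimeError as e:
    print(e)  # size mismatch reported up front, instead of a later copy error
```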
2018-06-18 20:16:34 -07:00
271406f276 [C++ API] Make pImpl easy to use in modules to enable happy reference semantics (#8347)
* Created TORCH_MODULE macro

Rewrote Linear

Rewrote Dropout and added default constructor to TORCH_MODULE macro

Turned TORCH_MODULE contents into a proper base class

Added some documentation

Got rid of the old Dropout module

Got rid of the old Embedding module

Got rid of the old BatchNorm module

Got rid of the old Conv module

Fixing optimizers

Rebase

Removed old RNN modules and the TORCH_ATTR macro

Removed temporary P:: namespace

Added cloning behavior to all modules

Got rid of some get() calls

self review nits

Remove noexcept from ModuleHolder methods that can throw

Remove spaces

Add missing override to reset() methods

Added examples to documentation in pimpl.h

* Post rebase fixes
2018-06-18 19:45:53 -07:00
d3651585b8 Simplify pthreadpool implementation on top of Caffe2 thread pool (#7666)
Remove one layer of pointer dereference when calling the thread pool.
2018-06-18 19:06:50 -07:00
2289815fc3 Make CI green again (#8631) 2018-06-18 17:11:04 -07:00
6307c117b3 Fix const type qualifier warning (#8613) 2018-06-18 16:34:02 -07:00
c44c95fd0b New operator 'expand' (#8263)
* operator 'expand'

* updated operator with a simple testcase

* Revert "updated operator with a simple testcase"

This reverts commit 1ce9f8ac567b525677254b0dce5735d7fea133d7.

* updated operator with a simple testcase

* expand operator with a passed testcase

* typo

* GPU full support added

* GPU support testing...

* GPU full supported

* formatted

* nits repaired

* gpu parameters fixed

* Expander removed

* nits fixed, document added

* formatted

* new testcases added & nits repaired
2018-06-18 16:33:47 -07:00
05c473b85c Temporarily remove TBB (#8255) 2018-06-18 19:31:57 -04:00
4f37a6481d Fix DeviceGuard usage in THD (#8622) 2018-06-18 18:33:54 -04:00
10961a5b6d Add OpenMPI for MPI tests. (#8625)
* Add mpich for MPI tests.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Changed to OpenMPI

* Comments change
2018-06-18 15:30:01 -07:00
a7bf539002 [JIT] add missing check for excluding tensor method tests (#8617)
* Improve check for addmm in autodiff

* Fix missing check for excluding tensor method tests
2018-06-18 15:13:57 -07:00
525aa74165 Improve check for addmm in autodiff (#8575) 2018-06-18 15:12:56 -07:00
e4f254224e apt update before installing nccl2 (#8624)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-18 15:10:02 -07:00
11ea8175d4 Remove all resizeLegacy calls, except for catArray. (#8616)
catArray is more complicated because it requires real 0-size dimension support.  The other changes are safe in that the functions are never called (and are now deleted), or
they are used on a result of THTensor_(newSizeOf), which has a valid size.
2018-06-18 18:08:04 -04:00
0a5fe55c9f [auto] Update onnx to 53edd9e - Exclude Random Generator from Test Coverage Stat (onnx/onnx#1119)
53edd9e80e
2018-06-18 20:08:51 +00:00
90532d5f57 Don't use MKL VML for log2 if below MKL build 20180406 (#8614) 2018-06-18 16:07:01 -04:00
ae25737455 Add kwarg support to test_autograd and stop using deprecated schema for accumulation ops (#8574) 2018-06-18 12:41:22 -07:00
2039c7a38f Fix test_rnn_args_check (#8606)
test_rnn_args_check generates mismatched input_shape and hidden_shape
args. To do this, it changes a dimension of input_shape or hidden_shape
to have an incorrect size.

Before, the test was changing the size of a dimension to -1. However,
this is flawed because an input of size e.g. (6, -1, 2) is wrong outright.
This PR fixes it so that the test changes sizes of dimensions to
`bad_size = 7`. As long as none of the other sizes (input_size,
hidden_size, num_layers, batch_size) divide this, we don't have to worry
about that dimension accidentally being broadcast into working.
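
A quick illustration of the mismatch the test now exercises (shapes chosen to mirror `bad_size = 7`):

```
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2)
x = torch.randn(5, 3, 10)   # (seq_len, batch, input_size)
h0 = torch.randn(2, 3, 7)   # hidden_size deliberately wrong: 7 instead of 20
try:
    rnn(x, h0)
except RuntimeError as e:
    print(e)  # expected hidden size mismatch
```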
2018-06-18 14:08:57 -04:00
e62c3a470c [Caffe2] Make cmake find current Python first (#8569)
* Make cmake find current Python first

* Switch from string syntax to list syntax in cmake/Dependencies
2018-06-18 09:39:37 -07:00
88db4c816e Disable flaky Chaining tests (#8601)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-18 11:24:37 -04:00
c1d04c73d2 Implement non-legacy TH/THC resize, with pseudo 0-sized dimension support. (#8559)
Unlike resizeLegacy / resizeNdLegacy, these don't call deprecated methods (e.g. _dim) and don't map between logical sizes (i.e. nDimension == 0 -> size [0]).
What you ask for is what you get.

The full 0-sized dimension support is hidden behind an ifdef, because it's not fully supported yet.
2018-06-18 10:37:31 -04:00
d813ffc613 Dont show Python frames in backtrace (#8579) 2018-06-18 10:13:08 -04:00
0ae8b6c027 add fold example and add nn.Fold/nn.Unfold and F.fold/F.unfold to doc (#8600)
* add fold example and add nn.Fold/nn.Unfold and F.fold/F.unfold to doc

and a few drive-by doc fixes

* typo
2018-06-18 09:36:42 -04:00
372d1d6735 Create ATen tensors via TensorOptions (#7869)
* Created TensorOptions

Storing the type in TensorOptions to solve the Variable problem

Created convenience creation functions for TensorOptions and added tests

Converted zeros to TensorOptions

Converted rand to TensorOptions

Fix codegen for TensorOptions and multiple arguments

Put TensorOptions convenience functions into torch namespace too

All factory functions except *_like support TensorOptions

Integrated with recent JIT changes

Support *_like functions

Fix in place modification

Some cleanups and fixes

Support sparse_coo_tensor

Fix bug in Type.cpp

Fix .empty calls in C++ API

Fix bug in Type.cpp

Trying to fix device placement

Make AutoGPU CPU compatible

Remove some auto_gpu.h uses

Fixing some headers

Fix some remaining CUDA/AutoGPU issues

Fix some AutoGPU uses

Fixes to dispatch_tensor_conversion

Reset version of new variables to zero

Implemented parsing device strings

Random fixes to tests

Self review cleanups

flake8

Undo changes to variable.{h,cpp} because they fail on gcc7.2

Add [cuda] tag to tensor_options_cuda.cpp

Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks

Fix linker error in AutoGPU.cpp

Fix bad merge conflict in native_functions.yaml

Fixed caffe2/contrib/aten

Fix new window functions added to TensorFactories.cpp

* Removed torch::TensorOptions

Added code to generate wrapper functions for factory methods

Add implicit constructor from Backend to TensorOptions

Remove Var() from C++ API and use torch:: functions

Use torch:: functions more subtly in C++ API

Make AutoGPU::set_device more exception safe

Check status directly in DynamicCUDAHooksInterface

Rename AutoGPU to DeviceGuard

Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad

remove python_default_init: self.type()

Add back original factory functions, but with deprecation warnings

Disable DeviceGuard for a couple functions in ATen

Remove print statement

Fix DeviceGuard construction from undefined tensor

Fixing CUDA device compiler issues

Moved as many methods as possible into header files

Dont generate python functions for deprecated factories

Remove merge conflict artefact

Fix tensor_options_cuda.cpp

Fix set_requires_grad not being checked

Fix tensor_new.h

TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac

Fix bug in DeviceGuard.h

Missing includes

TEMPORARILY moving a few more methods into .cpp to see if it fixes windows

Fixing linker errors

* Fix up SummaryOps to use new factories

Undo device agnostic behavior of DeviceGuard

Use -1 instead of optional for default device index

Also move DeviceGuard methods into header

Fixes around device index after optional -> int32_t switch

Fix use of DeviceGuard in new_with_tensor_copy

Fix tensor_options.cpp

* Fix Type::copy(

* Remove test_non_float_params from ONNX tests

* Set requires_grad=False in ONNX tests that use ints

* Put layout/dtype/device on Tensor

* Post merge fixes

* Change behavior of DeviceGuard to match AutoGPU

* Fix C++ API integration tests

* Fix flip functions
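
On the Python side, the options plumbed through here surface as the familiar factory keyword arguments and tensor properties; a minimal sketch:

```
import torch

# dtype / layout / device are chosen at construction time
t = torch.zeros(2, 3, dtype=torch.float64, device="cpu")
print(t.dtype, t.layout, t.device)  # torch.float64 torch.strided cpu
```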
2018-06-16 00:40:35 -07:00
c9b8d8566d Added flip() fn in ATen (CPU + CUDA) (#7873)
* Spelling fix in MultivariateNormal docstring (#7915)

* [c10d] MPI Process Group Implementation (#7783)

This provides a bare-minimum MPI Process Group implementation, the commit is on top of @pietern's Gloo Process Group PR.

* [c10d] MPI Process Group Implementation

ref: https://github.com/pytorch/pytorch/issues/7434

* Better exception, atexit func, and addressed comments

* Clang formatting changes

* Static initialization and addressed comments

* Added constness back

* Test will now launch mpi processes if found

* CMakeList Changed

* Fix Windows doc for import error (#7704)

* Fix Windows doc for import error

* Fix doc again

* Fix wrong format

* Moved condition for dilated grouped convolutions to CUDNN convolution implementation (#7465)

* Updates to caffe2 operator documentation (#7917)

* Significant updates to the operator docs in prep for merge

* [auto] Update onnx to 307995b - Update from upstream (onnx/onnx#1038)
307995b143

* Test if ASAN is actually working as part of ASAN tests. (#6050)

* Test if ASAN is actually working as part of ASAN tests.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Drop explicit use of libstdc++, we should not care.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Build with DEBUG=1

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Increase main thread stack size when using ASAN.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Split up detail.h (#7836)

* Fix THCUNN SpatialDepthwiseConvolution assuming contiguity (#7952)

* Fix fbcode compatibility (#7939)

* add test for correctness of transpose fusion (#7950)

* [JIT][script] Fix emitted gather and slice for dynamic indices (#7861)

* [JIT][script] Fix emitted gather for dynamic indices

* Also fix slice

* Address comments

* cache and use BLAS_SET_BY_USER so that it doesn't set itself to TRUE when run second time (#7942)

* Add unsafe flag to skip checking in prepare (#7832)

* Add unsafe flag to skip checking in prepare

* pop

* Rename cuda::type to cuda::into_type and provide cuda::from_type. (#7937)

These are used to convert Half -> half and half -> Half respectively.
from_type will be used for runtime type checking in THC.

* Try to fix TORCH_CUDA_ARCH_LIST for PyTorch again (#7936)

* try again

* use DEFINED

* use a loop

* Minor fixes

*  remove sort requirement from pad-sequence (#7928)

* pad-sequence no longer requires sorting entries

pad-sequence can get the max_len from the list of sequences. Entries only need to be sorted if the output will be used for pack_padded_sequence, which can throw the error itself.

* remove sort requirement from pad-sequence

Picks up from #5974.

Removes the requirement that input sequences to pad_sequence have to be
sorted. Addressed the comments in the PR:
- Updated docstring for pad_sequence
- Remove sort requirement in pad_sequence test
- Test unsorted and sorted sequences in pad_sequence test
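
A minimal sketch of the relaxed behavior (lengths deliberately unsorted):

```
import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [torch.ones(3), torch.ones(5), torch.ones(2)]  # unsorted lengths
padded = pad_sequence(seqs)  # max_len is inferred; no sorting required
print(padded.shape)          # torch.Size([5, 3])
```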

* Fix checkBackend error message (#7926)

* Fix checkBackend error message

Fixes #7849

* Switch order of printing args

* Split CI tests in half and run them in parallel (#7867)

* Split and run tests in parallel

* Refactor tests

* Handling of scalars in torch.Size (#5676)

* Handling of scalars in torch.Size

torch.Size() constructor uses python_arg_parser

IntList in python_arg_parser can take iter/range

Have IntList take python iterables and ranges.

Address comments: don't use python_arg_parser and instead call __index__ in THPSize_pynew

Address comments

Address comments

* Rebased

* Address nit

* [JIT] Fission and fusion passes for addmm (#7938)

* Addmm decomposition pass

* Addmm peephole pass

* Fix handling of output shape in fusion pass

* Add DCE to the peephole passes

* add comments

* maybe bugfix?

* Fix GPU tests

* fix py2/3 test issue

* Set smaller grain size for some cases (#7941)

* Fix returning scalar input in Python autograd function (#7934)

* fix _wrap_outputs not working with scalar inputs

* add a test

* Prevent git autocrlf for bash scripts (#7949)

* Delete unused file (#7919)

* Fix typo in autodiff formula for addmm (#7932)

* 1) use meshgrid for flip() CPU implementation, so only one copy of the input tensor is needed; 2) changed the CUDA kernel so that no materialized indices tensor is needed; 3) reused error checking code

* [caffe2] YellowFin parameter update GPU code fix. (#6993)

* [Caffe2] Keep name of caffe2_pybind11_state and caffe2_pybind11_state_gpu in debug build (#7155)

* Allowing MatMul to create a gradient even with 3 inputs. Useful if you are differentiating a graph twice (#6536)

* added const for local variables

* Fix the cpp libtorch CUDA build (#7975)

* Use mingfeima's mkldnn (#7977)

* Fix the import part of the windows doc (#7979)

* Change perf test folder after git checkout (#7980)

* Move the broadcast check in MKL Add/Sum to runtime (#7978)

* Use Glog's implementation of STL logging when possible. (#7206)

Inject custom workaround into namespace std so that it can be found by ADL.

* [Hotfix] Bring back warnings and -Werror to ATen (#7866)

* Bring back warnings and -Werror to ATen

* Unbreak...

* Fix tbb errors

* Enable ONNX backend Mean tests (#7985)

* Add third wayt to determine IS_CONDA (#7971)

* Fix EmbeddingBag max_norm option (#7959)

* fix EmbeddingBag max_norm option

* flake8

* add warning to the embedding bag arg change

* Raise error when torch.load a storage on a non-existing device (#7921)

* Raise error when torch.load a storage on a non-existing device

Before, doing torch.load(...) on a CUDA tensor on a CPU-only machine
would raise an unreadable error:

```
~/pytorch/pytorch/torch/cuda/__init__.py in __enter__(self)
    223         if self.idx is -1:
    224             return
--> 225         self.prev_idx = torch._C._cuda_getDevice()
    226         if self.prev_idx != self.idx:
    227             torch._C._cuda_setDevice(self.idx)

AttributeError: module 'torch._C' has no attribute '_cuda_getDevice'
```

This PR makes it so that torch.load raises a hard error if one tries to
load a storage onto a non-existing device and suggests the user to use
torch.load's map_location feature.
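
A minimal sketch of the suggested remedy, using an in-memory buffer so it runs anywhere:

```
import io
import torch

buf = io.BytesIO()
torch.save(torch.ones(3), buf)
buf.seek(0)
state = torch.load(buf, map_location="cpu")  # remap storages instead of erroring
```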

* Address comments

* missing dep

* Make THStorage / THCStorage have void* data ptr. (#7964)

* Make THStorage / THCStorage have void* data ptr.

This is the initial step in unifying the ATen and TH tensor representations, next is to only generate a single THStorage / THCStorage type.

The major changes here are:
1) data has been renamed to data_ptr and made void* in THStorage/THCStorage.
2) THStorage / THCStorage stores an at::ScalarType representing its data type (This will be useful when we generate a single THStorage/THCStorage).
3) APIs for Accessing the data as a real*:
a) storage->data<real>() -- this does runtime-type checking (checks that the at::ScalarType is correct).
b) storage->unsafeData<real>() -- as above, but no runtime-type checking (used in inner loops / fast code paths).
c) THStorage_(data)(storage) -- this already existed, just calls storage->data<real>().

* Add include.

* Attempt to fix clang build issues.

* Clarify comment and remove extra character.

* Rename unsafeData -> unsafe_data.

* Remove unnecessary 'to' function to get compile time rather than link time errors.

* Import/export observer symbols for DLL, which fixes the linking error in Visual Studio. (#6834)

* Import/export observer symbols for DLL, which fixes the linking error in Visual Studio.

* Add support of all default cmake build types for release to cuda.

* Remove python bindings for `torch.slice` (#7924)

* skip python bindings for slice

* remove tests

* convert slice test to indexing

* Build ONNX for PyTorch version of libcaffe2 (#7967)

* support loading gzip (#6490)

* support loading gzip

* address comments

* address comments

* fix lint

* fix test for python2

* Add memory leak check in CUDA tests (#7270)

* Add memory leak check in CUDA tests

* Tracking multi-GPU too

* fix run_test.py not running __name__ == '__main__' content; add test for make_cuda_memory_checked_test

* add a comment

* skip if cuda

* 1. Change the wrapper to a method in common.py:TestCase
2. Refactor common constants/method that initialize CUDA context into common_cuda.py
3. Update some test files to use TEST_CUDA and TEST_MULTIGPU

* Fix MaxUnpool3d forward memory leak

* Fix MultiLabelMarginCriterion forward memory leak

* Fix MultiMarginLoss backward memory leak

* default doCUDAMemoryCheck to False

* make the wrapper skip-able

* use TEST_MULTIGPU

* add align_corners=True/False tests for Upsample; fix TEST_CUDNN

* finalize interface

* VolumetricMaxUnpooling_updateOutput

* fix test_nccl

* rename THC caching allocator methods to be clearer

* make the wrapped function a method

* address comments; revert changes to aten/src/THC/THCCachingAllocator.cpp

* fix renamed var

* Revert "Set smaller grain size for some cases" (#7988)

* Entry for c10d in CODEOWNERS (#8001)

* Fix a couple of typos (#7998)

* Fix typo

* Fix typo

* Fix typo

* Fix typo

*  Add on-stack observer cache for Observable (#7931)

observers_list_ stores all the observers for an observable. The list is allocated on the heap, which
 can cause LLC misses. Add an on-stack observer cache for fast access. In production, we have seen 20%
 speedup for start and stop observer calls.

* Reduce grain size for Unary operations (#8003)

* [auto] Update onnx to 8ec0e5f - Add index check for Transpose's type inference function (onnx/onnx#1053)
8ec0e5fe9b

* Make AT_FORALL_SCALAR_TYPES usable outside of at::namespace. (#7935)

* Make AT_FORALL_SCALAR_TYPES usable outside of at::namespace.

This requires renaming the _cast functions which used the unqualified names.

* Separate onnx mapping of scalar type from cast name.

* Fix flake8.

* Properly cast onnx.

* Remove WITH_ROCM cmake flag/variable (use USE_ROCM solely) (#8013)

* Mention the pytorch-ci-hud on the README. (#8004)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Re-enable build env check (#7969)

* Re-enable build env check

* Fix linux test error

* Try to fix macOS test error

* Update nn.rst (#8029)

* Example for Transformed Distribution (#8011)

* [auto] Update onnx to 33e9cd4 - Remove the usage of default value to fix invalid proto3 files. (onnx/onnx#1052)
33e9cd4182

* [auto] Update onnx to 1504a33 - Convert schema assert for duplicate type names to exception (onnx/onnx#1057)
1504a33abb

* Support CUDA tensors in ProcessGroupGloo  (#7694)

This adds an unconditional dependency on CUDA, which is not desirable
for the long term. Ideally we have split like ATen where we have
different artifacts for different backends so you can decide at runtime
what to use.

* [auto] Update onnx to 3fb9656 - Fix for fbcode CI (onnx/onnx#1062)
3fb965666e

* propagate nan in some activations (#8033)

* propagate nan in some activations

* fix py2 not having math.nan

* flake8
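
A quick check of the propagation behavior, assuming relu is among the covered activations:

```
import torch
import torch.nn.functional as F

x = torch.tensor([float("nan"), -1.0, 2.0])
print(F.relu(x))  # tensor([nan, 0., 2.]) -- NaN passes through instead of being zeroed
```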

* Fix profiler crash when no events register (#8034)

* Fix profiler crash when no events register

When trying to profile, attempting to print the event table throws a vague error because the event list is empty:

....
max_name_length = max(len(evt.key) for evt in events)
ValueError: max() arg is an empty sequence

This change fixes the error by returning an empty string.
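
An illustrative guard for the failing aggregation (a sketch, not the literal patch):

```
events = []  # no events were recorded
# max() raises ValueError on an empty sequence unless a default is supplied
max_name_length = max((len(evt.key) for evt in events), default=0)
```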

* Update profiler.py

* Allow CI testing with different AVX configs (#8020)

* Allow CI testing with different AVX configs

* Unset ATEN_DISABLE_AVX and ATEN_DISABLE_AVX2 in default config

* Support for generating ATen during the fbcode build, rather than committing the generated files (#8002)

Paint the internal bikeshed a slightly different color to appease Buck tooling.

* Factor python dependency out of interpreter (#7970)

* Factor python dependency out of interpreter

* Remove NO_PYTHON for the autograd engine

If there is no python bindings, then a default Engine is constructed
the first time it is requested.

If the python libraries are loaded, then they override the default
accessor and the default engine becomes a python Engine.

Note: it is possible for two engines to be generated if a non-python
one gets created before the python bindings are loaded. This case
is rare, and just results in additional threads being spawned.

* Fixing AlexNet test which is skipped in CI

* [auto] Update onnx to 760c928 - add missing hasNInputShapes check for bidirectionalBroadcastShapeInference (onnx/onnx#1060)
760c9283d0

* Support modules that output scalar in Gather (and data parallel) (#7973)

* Support modules that output scalar in Gather (and data parallel)

* Improve warning msg

* [auto] Update onnx to 9e7855d - Remove PyTorch generated Upsample tests cases (onnx/onnx#1064)
9e7855dcd4

* [script] Add support for torch.zeros, torch.ones, etc. (#7799)

* [script] Add support for torch.zeros, torch.ones, etc.

* modifies gen_jit_dispatch to create bindings for functions that do
  not take tensor arguments, but do have an initial type argument
* adds tensor attributes to these functions for device, layout, and
  dtype specification
* extends the list of valid compiler constants to include device, layout,
  and dtype.
* allows functions with Generators, but only using the default generator

Known limitations:
* when using `torch.float`, we convert it to a scalar tensor and make
  no checks that it is actually used only in a dtype specification.
  This is similar to how we handle Python numbers, creating some situations
  where the script is more permissive. Fixing this requires much more
  significant changes to the IR, so is lower priority for now.
* devices specified using string literals e.g. 'cuda:1' do not work,
  since we do not support string literals in general.
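
A minimal sketch of the kind of script this enables (exact spellings may differ in this revision):

```
import torch

@torch.jit.script
def filled(x):
    return torch.zeros([2, 2], dtype=torch.float) + x

print(filled(torch.ones(2, 2)))
```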

* Add profiling annotations to NeuralNet[Operator|Data] (#8005)

* Update from facebook 1ee4edd286a3 (#8040)

* Adding instance weight to batch distill loss

as title

* add bfloat 16-31

added bfloat 16-31 and their respective unit tests

* [CUDA9] Upgrade - fbcode

CUDA9 upgrade diff D5654023 has been out for a while thanks to Pieter. But as time goes on it's becoming quite hard to rebase, because of the symlinks and auto-generated build/config files in tp2. Break D5654023 into two diffs, one touching tp2 config files, and another one touching fbcode TARGETS file (adding nvcc flag). These two should be a bit easier to rebase (for detailed procedure see "Test Plan").

This diff can only be committed if:
1. CUDA 9 rpm is rolled out fleet-wide (TBD)
2. NVidia driver 390.40 is rolled out fleet-wide (done)
3. Upgrade CUDA 9.1, cudnn 7.1, nccl 2.1 (done)
4. Make sure all dependents are built (done)
5. Test all C2 operators, PyTorch (see test plan)

* Share intermediate int32 buffer across Conv ops

Adding a known type

* [C2 fix] infer function for ensure_cpu_output_op

this is adding the missing device function for ensure_cpu_output_op

* [int8] Add blob serializer/deserializer for Int8TensorCPU

To export to logfiledb

* [nomnigraph] Add try catch block to optimization passes in predictor

This will catch failures that happen in the optimization pass.

* Caffe2: avoid static initialization order fiasco for CAFFE_ENFORCE

CAFFE_ENFORCE uses a stack trace fetcher, which is currently a
global static variable. If CAFFE_ENFORCE is used at static initialization
time, this is a SIOF. Recently CAFFE_ENFORCE was added into init
function registration, so we started to see this.

Meyers singleton is going to provide safety here. If stacktrace
fetcher was not registered yet, it will just use a dummy one.

* NUMA support in SparseNN CPU benchmark

Adding support for NUMA in SparseNN CPU benchmark

* [mobile-roofline] Add logging needed for roofline model

This should be all that's needed

* Let the operators using the same input if the operators are not chained

or else, we have to change the input data dims

* fix null-pointer-use UBSAN errors in in reshape_op.h

* revert previous fix on input blob name

as title

* Adding flag to let MineHardNegative automatically extract single value from dict

Model exporter requires the output of the model to be a struct. This makes it convenient to use those models directly in MineHardNegative by allowing automatic extraction of the single element of the dict, which is a common use case.

* Reverting change that broke internal tests back to OSS compatible state

* Skip CUDA memory leak test on BN tests on windows (#8043)

* workaround for Sequential when one cannot retrieve python source (#8048)

* [auto] Update onnx to 0dbec2a - - Generate protoc type hints on Windows (onnx/onnx#1047)
0dbec2a047

* [auto] Update onnx to 4f8ef17 - Remove erroneous documentation around maps and sequences. (onnx/onnx#1069)
4f8ef17ad3

* [auto] Update onnx to e6a500e - Extract constant to initializer (onnx/onnx#1050)
e6a500e54c

* [auto] Update onnx to 033f956 - make gcc happy (onnx/onnx#1061)
033f956f41

* Remove NO_PYTHON macros from Exceptions.h/cpp (#8007)

Removes cases where NO_PYTHON was unnecessary in Exception.h/cpp

* [ready] Clean up torch.distributions (#8046)

* Have a single THStorage and THCStorage type. (#8030)

No longer generate data-type specific Storage types, since all Storage types are now identical anyway.
For (some) backwards compatibility and documentation purposes, the Real names, e.g. THLongStorage are now #defined as aliases to the single THStorage type

* Reduce usages of TensorUtils<T>::DataType in THC. (#8056)

TensorUtils<T> is basically ATen-dispatch-lite in that it allows one to do multi-type THC function dispatch with a single call.
However, it is templatized on the Tensor type, and since we are moving to a single Tensor type, this doesn't work.

Most of the functions in TensorUtils (e.g. getDims) can be pulled up a level, to just call THCTensor_nDimension (or directly accessing the member),
but the DataType specific functions are more problematic.

So, this PR does two things:
1) Replaces calls of 'TensorUtils<THCTensor>::DataType' with 'real' since these are identical
2) Templatizes the THC_pointwiseApplyX functions to take scalar types.  To ensure this is done correctly, we static_assert that the scalar type template parameter matches the scalar type of
   the corresponding template parameter.  We will need to get rid of these static_asserts in the future, but this is useful for now.

* Support to run ONNX Upsample operator (mode=nearest) in Caffe2 (#8037)

* Added support to run ONNX Upsample operator (mode=nearest) in Caffe2

* adding error checks to upsample

* adding error checks to upsample

* adding error checks to upsample

* changing to np.isclose

* Revert onnx submodule update

* still fixing

* [auto] Update onnx to eb12f72 - Add conv transpose test cases (onnx/onnx#886)
eb12f72a86

* [auto] Update onnx to bd98abb - Add a hook for doing post-processing on protobuf generated header files (onnx/onnx#1068)
bd98abbba0

* Skip ConvTraspose ONNX backend tests (#8074)

* Post process onnx proto (#8064)

* Post processing onnx generated protobuf files to hide global symbols

* .

* .

* Add code for TensorBoard visualization of JIT GraphExecutors (#8050)

* [auto] Update onnx to cc26486 - bump version to 7 for prelu. (onnx/onnx#1063)
cc26486541

* [auto] Update onnx to 356208d - add input tensor dimension checks to shape inference (onnx/onnx#1070)
356208d756

* Move backtrace to its own header (#8096)

* Move backtrace to its own header

* Move cxxabi.h into Backtrace.cpp

* Fix and ignore some warnings (#8081)

* Do an additional sanity check that nvcc and CUDA include dir agree. (#8094)

If you set CUDA_HOME and CUDA_NVCC_EXECUTABLE together, you may
end up in a situation where the CUDA_VERSION of your includes
mismatches the CUDA version of your nvcc.  See #8092 for a concrete
case where this can occur.  Explicitly detect this situation and
give a good error message in this case!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* use regex in kwarg parser (#8061)

* Removing remaining NO_PYTHON ifdefs (#8067)

* Remove NO_PYTHON in tracing

* Remove NO_PYTHON in ir.h

* Remove NO_PYTHON in test_jit.cpp

* Replace std::size_t with size_t (#8093)

* Remove out-of-date comment (#8114)

* [Caffe2] Enabling AMD GPU Backend for Caffe2 (#7955)

* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the lastest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Resolve merge conflicts

* .

* Update GetAsyncNetHIPThreadPool

* Enable BUILD_CAFFE2 in pytorch build

* Unify USE_HIP and USE_ROCM

* always check USE_ROCM

* .

* remove unrelated change

* move all core hip files to separate subdirectory

* .

* .

* recurse glob core directory

* .

* correct include

* .

* Detect CUDNN related environment variables in cmake (#8082)

* Implement adaptive softmax (#5287)

* Implement adaptive softmax

* fix test for python 2

* add return_logprob flag

* add a test for cross-entropy path

* address review comments

* Fix docs

* pytorch 0.4 fixes

* address review comments

* don't use no_grad when computing log-probs

* add predict method

* add test for predict

* change methods order

* get rid of hardcoded int values

* Add an optional bias term to the head of AdaptiveSoftmax
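
A minimal usage sketch, assuming the feature landed as nn.AdaptiveLogSoftmaxWithLoss (the name it carries in later releases):

```
import torch
import torch.nn as nn

asm = nn.AdaptiveLogSoftmaxWithLoss(in_features=64, n_classes=1000, cutoffs=[100, 500])
x = torch.randn(8, 64)
target = torch.randint(0, 1000, (8,))
output, loss = asm(x, target)  # per-sample target log-probs and the mean loss
```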

* Make libshm also test if rt requires pthread. (#8112)

In some configurations (e.g., our internal build of GCC 5 + GLIBC 2.23),
-lrt is not sufficient to use shm_open; you also need to declare
a dependency on pthread.  This patch adds a surgical extra fix to
detect this situation, in the case that I noticed it failing in the
wild.

Fixes #8110

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* [auto] Update onnx to 2d5ce4a - Remove empty model (onnx/onnx#1058)
2d5ce4aeb6

* Add missing pragma once. (#8118)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* [auto] Update onnx to 2a87616 - Tests for LRN operator (onnx/onnx#903)
2a876162ac

* Split SparseTensorImpl off from TensorImpl. (#7990)

* Split SparseTensorImpl off from TensorImpl.

At the moment they have the same data layout, but with the upcoming refactor
they will not, and we need a place to put all of the sparse tensor specific
fields.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Update SparseTensorImpl.h

* [Caffe2] Support non peer access in muji and fix bug when reduced_affix is empty (#6896)

* [Caffe2] Support non peer access in muji

* [Caffe2] Add test for 4 gpus and 2 groups

* [Caffe2] Add comments

* Fix bug when reduced_affix is empty

* Fix typo and add comments about cpu and amd gpu

* Skip OnnxBackendNodeModelTest::test_lrn_default_cuda that causes segfault (#8127)

* Replace most remaining usages of TensorUtils<T>::DataType. (#8124)

As in https://github.com/pytorch/pytorch/pull/8056, this doesn't work with a single TensorImpl type.
This replaces the usages of with a templatized parameter and static_asserts that the new and old are equal.

After this we can get rid of the old template parameter, but I want to ensure they are equivalent across all builds first.

* Add utf-8 header to Python file with Unicode. (#8131)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add back lrn test (#8134)

* Revert "Skip OnnxBackendNodeModelTest::test_lrn_default_cuda that causes segfault (#8127)"

This reverts commit 410191c4175eaae141306cdb3c3c1c1e8a495225.

* Fix mismatched default values

* Add non_blocking to Tensor/Module.to (#7312)

* Add non_blocking to Tensor/Module.to

* flake8

* Add argparse tests

* cpp parse

* Use C++ parser

* use a commong parse function with Tensor.to

* fix test_jit

* use THPObjectPtr

* increase refcount for None, True, and False

* address comments

* address comments
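
A minimal sketch of the new keyword:

```
import torch

x = torch.randn(4)
y = x.to(torch.float64, non_blocking=True)  # non_blocking matters for pinned-memory copies
```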

* Fix job name checking for AVX tests (#8135)

* Fix a corner case for ReShapeOp (#8142)

In my use case, in the backward propagation pass, the reshape needs to
change a [0] tensor into a [0,0]-shaped tensor. The original implementation would
cause an out-of-index issue. This diff fixes the problem.

* cpu/ideep context converter (#8139)

* fix type mismatch while calling torch._C._cuda_setDevice (#8065)

* fix type mismatch while calling torch._C._cuda_setDevice

* fix type mismatch in scatter

* fix type mismatch in scatter

* fix type mismatch while calling torch._C._cuda_setDevice

* fix type mismatch while calling torch._C._cuda_setDevice

* fix type mismatch while calling torch._C._cuda_setDevice

* docs: Add warning to torch.repeat() (#8116)

* docs: Add warning to torch.repeat()

closes #7993

* docs: Add links for numpy functions

* docs: Break the too long line

* Accelerate bernoulli number generation on CPU  (#7171)

* opt bernoulli rng with vsl and openmp

* detect cpu vendor for bernoulli

* retrigger test platform

*  check the vendor more severely

* use cpuinfo to check vendor

* docs: add canonical_url and fix redirect link (#8155)

* docs: enable redirect link to work for each specific page

* docs: add canonical_url for search engines

closes #7222

* docs: update redirect link to canonical_url

* docstring support for @script and @script_method (#7898)

* docstring support for @script and @script_method

* make it python2 compatible

* improve according to review

* improve build_stmts

* use filter instead of list comprehension

* improve the way wrap is handled for script_method

* stash the original method instead

* allow dynamic attr for ScriptMethod and GraphExecutor

* a bit comment on build_Expr

* remove _build_wrap

* a bit improve on comments

* rename to __original_methods

* should be _original_methods

* [auto] Update onnx to 968d28d - fix Node::isBefore (onnx/onnx#1075)
968d28d901

* remove some unnecessary cudaGetDevices (#8089)

* remove unnecessary cudaGetDevices

* make curDevice argument non-optional, add explicit checks to current_device

* Fix cuda.framework error on OSX. (#8136)

When compiling OSX with CUDA, Caffe2's build system uses
find_package(cuda) to get its grubby hands on the CUDA driver
library (for some strange reason, FindCUDA doesn't save this
information as a variable).  Unfortunately, on OSX, sometimes
this picks up the cuda.framework folder, and then our build
system chokes to death because it doesn't try to link against
this as a framework.  (Is the folder even a framework?  I have
no idea).

This commit attempts to fix this in a two pronged fashion:

1. For some users, reducing the precedence of frameworks
using CMAKE_FIND_FRAMEWORK seems to help.  So we set these
variables.  However, this fix is not perfect; on my laptop
it doesn't actually solve the problem.

2. PyTorch doesn't actually need the CUDA driver API.  So we
only add the dep when building Caffe2.

Fixes #8022

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* [C++ API] Improve and use OrderedDict for parameters / modules (#7823)

* Improve OrderedDict for C++ API

* Give OrderedDict a subject and fix review comments

* Fix OrderedDict use in torch/csrc/jit/script/init.cpp

* Fix __rshift__ bug (#8161)

* Fix __rshift__ bug

* Add small tests for __lshift__ and __rshift__ in test_cuda

* Add a more elaborate check for __lshift__ and __rshift__

* refactor the test to address @zou3519 's comments

* Move non-generic Storage code needed by TensorUtils to non-generic C++. (#8164)

For non-generic function call implementations in Storage used by TensorUtils, we do the following:
1) Move the declaration from generic/C to non-generic/C++; we don't need backwards compatibility on these functions and want to use e.g. at::ScalarType.
2) Move the implementation from generic/C++ to non-generic/C++.
3) Change the generic implementation to call the non-generic implementation.

This will allow us to get rid of the corresponding TensorUtils calls (once we move over the Tensor functions in the same manner).

* Pinning opencv to < 3.4 in conda builds (#7923)

* Pinning opencv to 3.1.0 in conda builds

* Also pinning numpy to 1.11

* Trying only specifying <3.4

* Adding -setup- path, and better code structure (#8122)

* Abstract parallelization to facilitate using threadpools (#8163)

* [Caffe2] Update elementwise ops to support numpy style broadcast (#8070)

* Update elementwise ops to support numpy style broadcast

Update elementwise ops to support numpy style broadcast

* Fix sqrt_op

* Fix compare ops

* Fix gradient test

* Fix optimizer legacy broadcast

* Fix legacy broadcast for elementwise ops

* Skip flaky test

* Fix eigen simple binary op

* Fix attention test

* Fix rnn test

* Fix LSTM test

* Fix tan grad

* Fix schema check

* Export getCudnnHandle (#7726)

* [JIT] Support a single TensorList argument anywhere in the argument list + index_put (#8173)

* [JIT] Support a single TensorList argument anywhere in the argument list

* [JIT] index_put

* use the correct datatype format (#8144)

* Add back onnx console scripts dropped during migration from onnx-caffe2 (#8143)

* Get rid of SOVERSION (again). (#8132)

We don't want SOVERSION because pip will lose the symlink and
double your distribution size, and also because our setup.py
accidentally links against both libcaffe2.dylib and libcaffe2.1.dylib
on OS X.  This leads to a very puzzling error where you get
the error "cannot initialize CUDA without ATen_cuda", because
there are actually two copies of your registry in memory (because
there are two copies of the dynamic library).  Dropping SOVERSION
makes it impossible to make this mistake.

In principle, if the shared library load is done with DYLD_GLOBAL,
that should also prevent two copies of the registry from popping up.
Worth checking at some later point, if you need to bring back
SOVERSION (because, e.g., pip finally fixed their software.)

Partially fixes #8022.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Fix a corner case for ReShapeOp (#8178)

In my use case, in the backward propagation pass, the reshape needs to
change a [0] tensor into a [0,0]-shaped tensor. The original implementation would
cause an out-of-index issue. This diff fixes the problem.

* Better conv error message based on weight shape (#8051)

* Add retry logic to sccache download for Windows build (#7697)

* Add retry logic to sccache download for Windows build

* fix script bug

* clean up

* fix caffe2 docker build (#7411)

* [ONNX] Fix type_as symbolic (#8183)

* [ONNX] Nuke type_as symbolic

* make it better

* Fix lookup + test

* Yangqing as an ONNX codeowner (#8185)

* Fix protobuf options (#8184)

* protobuf

* fix protobuf_MSVC_STATIC_RUNTIME

* Add a loop unrolling pass to PyTorch JIT (#7672)

* [auto] Update onnx to 4e65fd8 - fuse consecutive squeezes (onnx/onnx#1078)
4e65fd83ba

* [Caffe2] Merging setup.py with setup_caffe2.py (#8129)

* Merging setup.pys, torch works, caffe2 works up to other KP

* Fix to super call for python 2

* Works on python2 on mac

* Consolidating Caffe2 flags

* Fix scalar check for sparse tensors. (#8197)

* Fix scalar check for sparse tensors.

As discovered in #8152

If `t` is a scalar sparse tensor, `t._indices` used to return a sparse
empty tensor because the scalar check was incorrect. This PR modifies
the scalar check to return a dense tensor instead of a sparse tensor.

i.e.
```
tensor = torch.sparse_coo_tensor([], [], torch.Size([]), device=device)
out = tensor._indices()  # was a sparse tensor, now is dense.
```

* Fix typos

* fix lint

* Add more annotations for arguments in ATen schema (#8192)

* use THCThrustAllocator in BCECriterion (#8188)

* Allow parallel_apply to take in list[Tensor] (#8047)

* Docs for gradcheck and gradgradcheck; expose gradgradcheck (#8166)

* Docs for gradcheck and gradgradcheck; expose gradgradcheck

* address comments
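
A minimal usage sketch of both checks (double precision is required for the numeric Jacobian):

```
import torch
from torch.autograd import gradcheck, gradgradcheck

x = torch.randn(3, dtype=torch.double, requires_grad=True)
assert gradcheck(torch.sin, (x,))      # compares numeric vs. analytic Jacobian
assert gradgradcheck(torch.sin, (x,))  # same check for second derivatives
```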

* Implement randperm for CUDA (#7606)

* Implement randperm for CUDA

* Use Thrust to implement randperm

* clean up

* Fix test

* Offload small input scenario to CPU

* Fixed test

* Try to fix Windows error

* Fix Windows error and clean up

* Use fork_rng context manager

* Move test_randperm_cuda to test_cuda

* Add half tensor support

* Fix cuda::type error

* Fix CPU offloading

* Fix issues

* No need to check range for n == 0 case
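
A minimal sketch that exercises the CUDA path when a GPU is present:

```
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
perm = torch.randperm(8, device=device)
print(perm)
```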

* Update c10d build to link against Caffe2 (#8201)

This follows #7399.

* add wipe_cache option (#8204)

as title

* Replace (non-data) TensorUtils calls with non-generic THCTensor calls. (#8176)

* Replace (non-data) TensorUtils calls with non-generic THCTensor calls.

TensorUtils is templatized on the THTensor type, so to support a single tensor type (like ATen), we need to remove these.

This PR does the following:
1) Allows THCTensorTypeUtils.cuh to include THCTensor.hpp.
   This involves moving includes of it outside of generic/, so we can use the new implementations.
2) Defines a single _THTensor struct and changes THCRealTensor to be a derived type of _THCTensor.
   This allows us to implement a single non-generic function and avoid static_cast or void * tricks to call it from the generic functions.
3) For functions inside of TensorUtils that don't use data pointers:
   a) Implement the functions in (non-generic) THTensor.cpp and declare them in (non-generic) THTensor.hpp.
   b) Have the generic versions call the non-generic versions.
   c) Replace the corresponding TensorUtils<THCTensor>::fn call with (non-generic) THTensor_fn.

* Add comment about THCTensor struct.

* Error if storage is null in setStorageNd or resizeNd.

* Fix c10d compiler warnings (#8206)

Copy compiler flags from the ones used in setup.py and fix warnings.
This makes the root build that includes c10d headers warning free.

* Bump gloo submodule (#8202)

This includes facebookincubator/gloo#125.

* rm -rf aten/contrib (#8165)

* Remove aten/contrib

* Remove from CMake

* Fix tanh_op on ios build (#8207)

* Fix tanh_op on ios build

* Fix tanh

* [auto] Update onnx to f28e2f1 - fix lrn spec (onnx/onnx#1090)
f28e2f1a60

* [cmake] deprecate caffe2_* specific cuda function in cmake. (#8200)

* deprecate caffe2_* specific cuda function in cmake.

* ENV{} -> $ENV{}

* CUDA_ARCH_NAME -> TORCH_CUDA_ARCH_LIST

* .

* .

* .

* skip CUDA memory leak check on Windows altogether (#8213)

* Record shape and type in autograd to validate gradients (#8168)

The check that the gradient is defined is currently disabled because
TestJit.test_ge_optimized will trigger the error.

* [auto] Update onnx to 18d70ff - Graph should only have one (input) kParam node (onnx/onnx#1088)
18d70ff529

* Set up a c10 source folder (#7822)

* Set up a c10 source folder

* Change the benchmark log format and also log flops (#8215)

as title

* Move helper functions to unnamed namespace. (#8224)

Currently, the helper functions in this file are in the global
namespace. I am guessing the purpose was to
keep them local.

* [auto] Update onnx to e96d823 - Update Google benchmark to 1.4.1 (onnx/onnx#1083)
e96d823e5c

* Change new bernoulli implementation to be fully generic. (#8218)

The current implementation depends on THTensor types being unique, which is not guaranteed going forward.

* Structure THTensor like THCTensor is structured. (#8217)

In particular, define a base type, _THTensor, that can be used for all THRealTensor structs.
This is just to have less cognitive load when dealing with generic THTensor/THCTensor types (as in templates).

* move THCP-related utils to cuda/utils.cpp. (#8221)

These files don't follow the usual pattern: In general the files torch/csrc/X torch/csrc/cuda/X
both include the generic file torch/csrc/generic/X, where torch/csrc/X includes the cpu implementations and torch/csrc/cuda/X includes the cuda implementations.
(Aside: this is probably not the best structure, the torch/csrc/X files should probably be moved to torch/csrc/cpu/X).

utils.cpp combines these so that torch/csrc/utils.cpp has cuda specific code.  This makes it impossible to declare a single THTensor and THCTensor template type (i.e. THPPointer<_THTensor>, THPointer<_THCTensor>).

* [READY TO MERGE] Use ccache in macOS build (#8009)

* Use ccache in macOS build

* Moving to sccache

* Don't use sccache in test job

* [NEEDS REVIEW] Add nan and inf probability check to multinomial (#7647)

* Add nan and inf probs check to multinomial

* fix bug

* Spawn CUDA test in subprocess

* Make sure invalid input won't pass the test case

* Try to fix error

* Test failure cases in Python 3 only

* Try to fix Windows error

* Move CUDA test to test_cuda.py

* fix issues

* fix module name error

* no need to check for CUDA existence in test_cuda

* Use PY3
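
A quick illustration of the new validation:

```
import torch

probs = torch.tensor([0.5, float("nan"), 0.5])
try:
    torch.multinomial(probs, 1)
except RuntimeError as e:
    print(e)  # invalid probability distributions are rejected up front
```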

* [READY TO MERGE] Enable tests that use DataLoader with multiple workers on Windows (#6745)

* Don't import TEST_CUDA for test_dataloader on Windows

* test_partial_workers is stuck on Windows

* Don't copy unneeded grads when using a function for several derivatives (Fixes #7722) (#7759)

Trying to copy all results fails when one of them is a tensor list which
has not been populated. This blew up for CuDNN RNNs when the weights
did not require grad.

Thanks to Sylvain Gugger for reporting!

* Fix win mkldnn (#7718)

* Sync build_pytorch_libs.bat with build_pytorch_libs.sh

* fix quoting

* add warnings

* fix warnings

* Add /EHa

* [Caffe2] Add ADD operator for IDEEP (#8220)

* Add ADD operator for IDEEP

* Add boradcast check

* Comments

* Allow optional build and installation of native test binaries (#8225)

* test finetuning

* install off by default

* Turn BUILD_TEST=ON for jenkins.

* Turn on install_test in jenkins as well

* Update MKL exporter to IDEEP ops (#8228)

IDEEP exporter support

* [ideep] Add IDEEP Squeeze op (#8227)

Similar to MKLSqueezeOp at caffe2/mkl/operators/squeeze_op.cc

* [auto] Update onnx to 62e63e9 - Fix build errors inside protobuf-bench (onnx/onnx#1084)
62e63e9de8

* Use .cc since some downstream libraries are configured for C++ only. (#8234)

* Rename SparseTensor to SparseTensorRef. (#8237)

I want to introduce using SparseTensor = Tensor (as a documentary
type alias for Tensor), but the name is already taken.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* [caffe2] Build Android tests and binaries in CI (#7593)

Update benchmark submodule to version with fixed Android/GNUSTL build

* Remove core and util warnings (#8239)

* Fix some signed/unsigned mismatches

* Skip unused result warning

* Explict fallthrough for murmur hash

* Enable aligned new support to eliminate warning

* Switch to int instead of unsigned in some cases

* Remove .gitmodules.aten since it is in .gitmodules now (#8232)

* Fix: gradcheck forced float32 (#8230)

* Print requires_grad and grad_fn in string repr of tensor (#8211)

For example:

  >>> torch.ones(3).requires_grad_()
  tensor([ 1.,  1.,  1.], requires_grad=True)

  >>> torch.ones(3).requires_grad_() * 5
  tensor([ 5.,  5.,  5.], grad_fn=<MulBackward0>)

The suffix (dtype, requires_grad, grad_fn) wraps to a new line if
it would cause the line to exceed the linewidth.

  >>> torch.ones(10).double().requires_grad_()
  tensor([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
         dtype=torch.float64, requires_grad=True)

* Fix TEST_CUDA import in test_cuda (#8246)

* Fix lifting cat into its constant version (#8174)

This fixes a bug where schema including varargs lists did not lift
properly blocking correct ONNX export.

* Don't override Tensor, Storage macros defined outside torch/csrc in t… (#8243)

* Don't override Tensor, Storage macros defined outside torch/csrc in torch/csrc.

This PR does the following:
1) Removes THSTensor macros in torch/csrc, which aren't used.
2) For macros defined outside of torch/csrc (THTensor, THTensor_, THStorage, THStorage_):
a) No longer override them, i.e. previously THTensor could actually be THCTensor if a generic file was included from a file including THCP.h.
b) Instead, introduce new macros THW* (e.g. THWTensor) to represent a (potentially empty) wildcard character.

In addition to making this code easier to read and codemod, this allows us to more freely change TH/THC; for example:
currently in the THC random code, the state is cast to THByteTensor*; this happens to work because the macros don't happen to override THByteTensor.
But if THByteTensor just becomes an alias of THTensor (which is the plan for a single tensor type), then this no longer works.
The whole thing was previously a bit of a mess because you really had to understand which macros are redefined and which aren't.

We could also rename the macros that live in torch/csrc (e.g. the THPTensor macros), but since that is more self contained, I punted for now.

* Don't change the plugin.

* [auto] Update onnx to 3a035f4 - Add retry logic to model downloading (onnx/onnx#1077)
3a035f4397

* Fully genericize THC/THCUNN (except for TensorUtils and DeviceTensorUtils). (#8251)

* [cmake] Use CAFFE2_USE_* for public/cuda.cmake (#8248)

* Fix app size check (#8256)

Fix app size check

* wip on CPU impl

* Stop BCELoss from returning negative results (#8147)

* Stop BCELoss from returning negative results

* check explicitly for 0 before taking log

* add tests

* fix lint

* address comments
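
A quick check of the boundary behavior:

```
import torch
import torch.nn as nn

loss = nn.BCELoss()
p = torch.tensor([0.0, 1.0])  # probabilities exactly at the boundary
t = torch.tensor([0.0, 1.0])
print(loss(p, t))             # finite and non-negative despite the log(0) terms
```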

* Relax CUDA_HOME detection logic, to build when libraries are found. (#8244)

Log when no cuda runtime is found, but CUDA is found

* Added backward function for kl_div target (#7839)

* added backward fn for target

* added module test for kl_div target, and assuming targets are probabilities

* Change the output format of caffe2 observers (#8261)

as title

* Remove TensorUtils<T>::getData, provide data<T>() in TH(C)Tensor. (#8247)

* Remove TensorUtils<T>::getData, provide data<T>() in TH(C)Tensor.

* Fix template parameter.

* [caffe2] Move submodule onnx-tensorrt forward (#7659)

Commit 82106f833dcb0070446a150e658e60ca9428f89b is essential.

* [ideep] Add IDEEP fallbacks for Faster-RCNN ops (#8260)

TSIA

* un-genericize THCDeviceTensorUtils. (#8258)

* provide data<T>() in TH(C)Tensor.

* un-genericize THCDeviceTensorUtils.

This is used outside of generic context, so we need to un-genericize it to have a single THCTensor type.

* [caffe2] Fix ATen dispatch for ops with TensorList arg (#8226)

* [cmake] Add and export Modules_CUDA_fix (#8271)

* Add and export Modules_CUDA_fix

* actually, need to include before finding cuda

* [auto] Update onnx to 2508156 - Make error message more verbose (onnx/onnx#1097)
2508156135

* [auto] Update onnx to 39e4668 - fix optimizer does not set ir_version bug (onnx/onnx#1098)
39e46687ea

* [cmake] Make cudnn optional (#8265)

* Make cudnn optional

* Remove cudnn file from cpu file

* Move signal window functions to ATen; add Blackman window (#8130)

* Move signal window functions to ATen; add Blackman window

* fix cuda test not checking scipy
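
A minimal sketch of the new window function:

```
import torch

w = torch.blackman_window(8)  # alongside hann/hamming/bartlett windows
print(w.shape, w.dtype)       # torch.Size([8]) torch.float32
```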

* [ideep] Fuse Conv-Relu after IDEEP graph rewrite, skip group conv (#8233)

IDEEP supports fusion for non-group conv

* [c10d] NCCL Process Group implementation (#8182)

* [c10d] Process Group NCCL implementation

* Addressed comments

* Added one missing return and clang format again

* Use cmake/Modules for everything and fix gloo build

* Fixed compiler warnings

* Deleted duplicated FindNCCL

* Set up CI build for CUDA 9.2 + macOS (#8274)

* Add macOS CUDA build to CI

* Fix undefined symbols issue

* Use sccache for CUDA build

* Fix sccache issues

* clean up

* c10 build setup (#8264)

* Move c10/ to caffe2/dispatch/

* Set up caffe2/utils directory

* Remove remaining TensorTypeUtils functions. (#8286)

Mostly what's remaining is copy utilities -- these are now provided in THCTensorCopy.hpp and templatized on the ScalarType rather than the TensorType.

* Create initial Python bindings for c10d (#8119)

* Build and install c10d from tools/build_pytorch_libs.sh

* Create initial Python bindings for c10d

* clang-format

* Switch link order to include more symbols

* Add bindings and tests for ProcessGroupGloo

* Add broadcast test

* Separate build flag for c10d

* Explicit PIC property

* Skip c10d tests if not available

* Remove c10d from Windows blacklist

Let it skip by itself because it won't be available anyway.

* Make lint happy

* Comments

* Move c10d module into torch.distributed

* Close tempfile such that it is deleted

* Add option USE_NVRTC which defaults to off (#8289)

* [build] Remove /torch/lib/THD/cmake in favor of /cmake (#7159)

* Remove /torch/lib/THD/cmake in favor of /cmake

* path fix

* Explicitly marking gloo to use cuda

* Fix gloo path in THD

* Have a single THTensor / THCTensor type. (#8288)

* Remove remaining TensorTypeUtils functions.

Mostly what's remaining is copy utilities -- these are now provided in THCTensorCopy.hpp and templatized on the ScalarType rather than the TensorType.

* Have a single THTensor / THCTensor type.

As was previously done with Storages, have only a single (dtype-independent) THTensor / THCTensor.

For documentation and backwards compatibility purposes, the old names, e.g. TH(Cuda)LongTensor alias the new TH(C)Tensor type.

* undef GENERATE_SPARSE.

* [auto] Update onnx to 58efe0a - add float16 support back for math and reduction ops (onnx/onnx#1102)
58efe0a9ca

* Some utils for compile-time programming (#7778)

* Add some C++17 features, implemented with C++14

* Add some type traits

* Compile-time type list abstraction

* Some utils for compile-time programming

* Fix compatibility with a larger range of compilers

* Use guts::array instead of std::array because of std::array shortcomings

* code review comments

* Use quotes for includes

* Remove THC's FindMAGMA (#8299)

* Entries for torch.distributed in CODEOWNERS (#8293)

* Add depthwise convolution test for IDEEP (#8301)

* Fix dividing by zero segfault in Reshape (#8302)

when inferring a dimension from a new shape that contains a zero-sized dimension

* Removes unused THCTensorConv (#8229)

* Replace Variables to Tensors (#8309)

* Clean up old sccache log before build (#8305)

* Remove unused grad ops on mobile to reduce app size (#8297)

Remove unused grad ops on mobile to reduce app size

* Small fixes (#8296)

* [auto] Update onnx to 5ed684e - Remove/replace /MX with /WX for MSVC build. Was typo in a previous ch… (onnx/onnx#1104)
5ed684ebe5

* Fix sample code for cuda stream (#8319)

* [auto] Update onnx to 4b4085c - Add missing warning ignoring flags to onnx_proto CMake target (onnx/onnx#1105)
4b4085c2e9

* [THD] fix broken THD build with NCCL (#8323)

* Add docstring for `torch.sparse_coo_tensor` (#8152)

* add sparse_coo_tensor docstring

* update empty tensor example

* whitespace

* whitespace again

* add error when backend is not supported by DDP (#8325)

* Fix collect_env.py for Windows (#8326)

* Fix collect_env.py for Windows

* Fix expect file for Win machine

* Fix the script not stopping earlier on error for MSVC and Ninja (#8277)

* Simplify the solution

* Remove the usage of set errorlevel

* Skip test_multinomial_invalid_probs_cuda on Windows (#8324)

* Support printing sparse tensors in ATen, fixes #8333. (#8334)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* [C++ API] Cursors (#8190)

* Add cursors to C++ API

* Small self nits

* s/struct/class

* Use more STL like names for cursors

* Implement dim_arange operator (#8266)

* Implement arange_like operator

* add ONNX symbolic

* lint

* change name

* Comment the hack

* 1. fixed flip CPU impl for non-contiguous flip dims; 2. added more tests; 3. using TensorInfo and collapseDims to speed up CUDA impl for cases where flip dim is the 1st or last dim

* nits

* 1. removed for loop in pointwise CUDA kernel; 2. using templated (int64_t) IndexType for indices in pointwise CUDA kernel

* added torch.flip.__doc__

* nits
2018-06-15 21:20:55 -04:00
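
The flip work merged above landed as the public torch.flip operator. A minimal sketch of its behavior (assuming the post-merge Python API, which is not spelled out in the commit text):

    import torch

    x = torch.arange(8).view(2, 2, 2)
    # Flip along the first dimension (one of the cases the CUDA fast path targets).
    print(torch.flip(x, dims=[0]))
    # Multiple flip dims are handled by the general pointwise kernel.
    print(torch.flip(x, dims=[0, 2]))
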
92f67d9404 fix lint 2018-06-15 18:18:20 -07:00
26bed6d83e assert limit on cudnn grid_sampler (#8576) 2018-06-15 17:33:33 -07:00
7b2ad8893d Eliminates noisy assert spew when running test_cuda.py (#8531)
* Fixes test_multinomial_invalid_probs_cuda debug spew

* Fixes test_multinomial_invalid_probs_cuda debug spew

* Fixes Python linting
2018-06-15 19:52:53 -04:00
682dec2cea add relu to jit and exp to autodiff (#8573) 2018-06-15 19:49:20 -04:00
b10c94b507 Update operator documentation with markdown descriptions and interfaces (#8085)
* Update operator documentation with markdown descriptions and interfaces

* Added rest of updated operator documentation to source files

* Committing local changes for rebase

* fixed bracket typo in sqrt_op.cc file

* Added updated markdown documentation to remaining completed ops
2018-06-15 19:02:24 -04:00
d968614502 Enable open registration of VariableType objects (#8540)
We have 2 use cases where we want to experiment with new base ATen
tensor types:

* BatchTensor for matchbox
* Tensors that live on accelerators

It is possible to subclass TensorImpl to implement these but VariableType
does not work with them because it cannot find the equivalent variable type
in the registry.

This commit changes the way we implement type -> variable(type) lookup so that
torch::register_variable_type_for can be called on any at::Type.

Lookups are still done using arrays so there should be no perf impact from the change.
2018-06-15 14:56:19 -07:00
711e5a6ceb Port THS to ATen. (#8409)
* Port THS to ATen.

The basic structure of the patch:

- All kernels in aten/src/THS got rewritten as native
  functions in aten/src/ATen/native/sparse

  I took the liberty to rename some of the kernels,
  opting for longer, more transparent names than
  things like 'spaddcmul'.

- Instead of holding fields for sparse tensor in the TH
  C struct THSTensor, they are now held in a C++ class
  SparseTensorImpl (this explains why I had to do this
  all in one go; I can't have *two* reps for sparse
  tensors!)

  Along the way, we change a key internal representation
  invariant: an "empty" sparse tensor has dimI == 1 and
  dimV == 0 (this is different from dimI == 0 and dimV == 0
  we had before); this ensures that we maintain the invariant
  that dim == dimI + dimV.  "Scalar" sparse tensors are
  made illegal, because there really is no way to properly
  express them in COO format.

- Because we haven't ported THCS or any of the traditional
  dense TH implementations, there is a new set of adapter
  functions in native/LegacyBridge.cpp exclusively devoted
  to deciding whether or not to go to the new native implementation
  or back to the legacy TH binding (prefixed with th_).
  The intent is that when everything gets ported, we can
  delete this file.

- I've kept the stubs for all the THS functions, but they now all
  error if you try to actually call them.  Eventually, we should
  replace these with calls to ATen so that everything keeps
  working.

- I gobbled up SparseMM (SparseMM.cpp is no more). It was tasty.

There are some miscellaneous improvements which were needed for other
changes in this patch:

- There is now AT_FORALL_SCALAR_TYPES_EXCEPT_HALF, which does what
  it says on the tin.

- The templated axpy function moved to TH/BlasUtils.h; there's a new macro
  which lets you easily forward to all of the TH functions. We also expose
  THBlas_copy.  I'm not terribly pleased with these functions, but
  they do serve a needed purpose.

- New method on Tensor to get TensorImpl*, unsafeGetTensorImpl

- accessor() is now this-const, since const-correctness on Tensor is a lie

- New toSparse()/toDense() methods on Type; now you can call these
  directly without having to manually apply at::toSparse/toDense
  on the Backend and then running toBackend yourself.

Changes to the kernels:

- Previously, the whole body of all kernels was compiled for
  every supported scalar type.  In our new implementation,
  the scalar dispatch has been pushed into the smallest extent
  which (1) is not in a type loop and (2) requires statically
  knowing the scalar type.  These sites all use
  AT_DISPATCH_ALL_TYPES.  I tried to use lambdas as much as
  possible, but sometimes it was not possible when a OpenMP
  pragma was used.

- Anywhere we tested if the nDimension of a tensor was zero,
  we replaced with a test that numel is zero.  Because, as we
  know, nDimension of zero-size tensors in TH is zero, and
  that's wrong wrong wrong (and not done this way in ATen).

Some subtleties:

- Places where previously fastget1d was used, I now use a
  TensorAccessor.  However, you have to be careful about grabbing
  the accessor, because sometimes you will be accessor'ing
  indices/values and they are empty, which means they will
  be *1D* ("oh, aren't indices always 2D?" Nope. Nyet.)
  So, essentially, it is only safe to grab an accessor *after*
  you have checked that nnz != 0.  All of these shenanigans
  will go away when we properly support zero-size dimensions.

  A few places, we test for this case just by wrapping the loop
  in a conditional on nnz.  Some other places this is not so easy,
  so we instead short-circuit the function with a special case for
  when nnz == 0 (usually, these implementations are degenerate).

- There is a very subtle but important difference between
  _sparse_get_impl(self)->indices() and self._indices();
  the latter may return a view!  This is because nnz is
  not guaranteed to match the dimensions of indices/values;
  you can "truncate" a sparse tensor by setting the nnz.
  Actually, I think this is not a good idea and we should
  enforce a stronger invariant, but for this patch I slavishly
  adhere to the old ways, and as such I have to be very
  careful if I want to resize something, I had better use
  the former and not the latter.

- I had to reimplement broadcasting by hand (thus the s_
  and non-s_ functions in the sparse native files).  There
  is a very important distinction between foo_out and foo_,
  so it is important that the LegacyBridge function always
  call to the lower layer, and not try to avoid boilerplate
  by calling to another LegacyBridge function first.
  I did NOT put broadcasting in LegacyBridge (even though,
  ultimately, that's where it must live), because the th_
  functions which are invoked from LegacyBridge handle
  broadcasting themselves, and I don't want to broadcast
  twice.

- Sparse function MUST explicitly specify the Type they
  dispatch from, otherwise Variable wrapping/unwrapping will
  not work correctly.  If you use _get_sparse_impl, that is
  sufficient to levy this requirement.

- The "has native" tests in LegacyBridge.cpp are not 100%,
  because some of the functions are mixed dense-sparse functions,
  and so you can't just say, "Oh, if it's sparse and CPU, call
  the native sparse implementation."  This is handled on a
  case by case basis.  There is some especially complex
  logic for add(), which has dense-dense, sparse-sparse
  and dense-sparse implementations.

- I added some uses of SparseTensorRef in native_functions.yaml,
  but you will notice that these are all on native_* functions,
  and not the actual, top-level functions.  So the SparseTensorRef
  is purely documentary (helping you not call the wrong overload)
  but there is no magic; we do the wrapping ourselves the hard
  way. (This is in constrast to the TH binding code which is magical.)
  Except for _sparse_mask; _sparse_mask is magical.

- There is a raw_copy_sparse_ method, which is really my way of
  getting around the fact that copy_ has never been implemented
  for sparse tensors (even before this patch), but there IS a
  super secret, internal way of doing these copies that the THS
  code used, and which I needed to get my hands on when I did this
  port.  We should refactor so that either (a) copy_ does support
  sparse-sparse copy natively, or (b) we do this other ways.

- Irritatingly, I must explicitly resize_as_ before copy_ into
  a tensor.  This was not the case with THTensor_(copy) but I don't
  have any direct binding that doesn't have this requirement.

- For some reason, the sparse tensor constructor accepts a scalar
  tensor for the values tensor.  This is kind of weird because
  you always need an nnz-dimension.  However, the old code supported
  this and just expanded it into a 1D size 0 tensor; so we need some
  explicit code to do this.

There are maybe a bit more AT_ASSERTs in some of the kernels
than is wise.  I added them all when I was debugging and was
loathe to remove them.

Some last mile fixes after this commit went into PR

- Move expand outside of dispatch so autograd works (it used to be inside and then we lost all of the recorded broadcasts).
- Hack to duplicate the derivatives for our now two definitions TH and native. Mercifully the derivatives are short.
- Apparently, TH has a special case to make foo_ functions method-only, and if you don't do this the Python arg parsing is wrong. We carefully work around this in the native bindings
- Apply DCE to a test_jit case, fixes wobbling due to DCE trick in tracing
- Update test_function's output
- Some last mile fixes for dispatch confusion in sparse_coo_tensor functions.
- New simplified regression test based on failures I saw in ONNX
- Increase tolerance on super resolution test
- More robust dynamic_type normalization, fixes ONNX bug.
  The dynamic_type situation is very delicate; probably need
  to stop having both Scalar and real.
- Make new_with_tensor_sparse more CUDA safe
- Note about CUDA-safety in SparseTensorImpl
- Rename dimI/dimV to sparseDims/denseDims.
- Make localScalar on SparseTensorImpl work.
- Make numel uniformly supported on all types, not just dense
  types
- Add tests for is_nonzero() method (which exercises localScalar)
- Disable constant JIT autogenerated tests, which are fragile and broken
  by this change, but being fixed in a parallel track.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-15 17:52:21 -04:00
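
To make the dim == sparseDims + denseDims invariant above concrete, here is a small Python sketch of a hybrid sparse tensor (one sparse dimension plus one dense dimension), using the public constructor rather than the internal THS names:

    import torch

    # One sparse dim (the indices row) plus one dense dim (each value is a
    # length-2 slice), so dim == sparseDims + denseDims == 2.
    i = torch.tensor([[0, 2]])
    v = torch.tensor([[1.0, 2.0],
                      [3.0, 4.0]])
    s = torch.sparse_coo_tensor(i, v, (3, 2))
    print(s.dim())        # 2
    print(s.to_dense())   # rows 0 and 2 are filled, row 1 is zero
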
c537fd7432 fix lint (#8567) 2018-06-15 17:34:39 -04:00
c457fc994d Adding pyyaml to Ubuntu and Centos docker images (#8490) 2018-06-15 13:55:48 -07:00
ec23ee67cf add order switch op to nomnigraph (#8436) 2018-06-15 10:07:41 -07:00
dc186cc9fe Remove NO_* and WITH_* across codebase, except in setup.py (#8555)
* remove legacy options from CMakeLists

* codemod WITH_ to USE_ for WITH_CUDA, WITH_CUDNN, WITH_DISTRIBUTED, WITH_DISTRIBUTED_MW, WITH_GLOO_IBVERBS, WITH_NCCL, WITH_ROCM, WITH_NUMPY

* cover SYSTEM_NCCL, MKLDNN, NNPACK, C10D, NINJA

* removed NO_* variables and hotpatch them only in setup.py

* fix lint
2018-06-15 12:29:48 -04:00
d7690742d5 Fix the formula of some norms (#8545) 2018-06-15 10:41:26 -04:00
b002aee0ff Disable verbose logging for PyTorch ROCm nightly builds. (#8517) 2018-06-15 09:14:03 -04:00
7251d70c5b fixed THD NO_CUDA (#8539) 2018-06-15 09:09:23 -04:00
0965e8e9e7 [auto] Update onnx to 0125af3 - Add node test for Dropout (onnx/onnx#1115)
0125af3204
2018-06-15 11:31:13 +00:00
4e3ada19cf [auto] Update onnx to d9fc1b1 - Add Node test for BatchNormalization (onnx/onnx#1117)
d9fc1b14aa
2018-06-15 08:36:29 +00:00
5a31f73611 [auto] Update onnx to b70ee6a - Make RNN/LSTM/GRU treatment of recurrent weights consistent (onnx/onnx#1103)
b70ee6a99b
2018-06-15 05:22:43 +00:00
677739cd1e Fix createZerosLike for scalars (#8537) 2018-06-14 20:51:14 -07:00
55de546146 [auto] Update onnx to c647994 - fix upper-bound for local-region in lrn test case (onnx/onnx#1095)
c6479945bb
2018-06-15 03:40:07 +00:00
a8bf30d7a5 caffe2 hip python binding (#8491)
* caffe2 hip python binding

* Change back onnx submodule
2018-06-14 19:56:56 -07:00
3a1265c739 [auto] Update onnx to 578a439 - Add Node Test for InstanceNormalization (onnx/onnx#1118)
578a439b63
2018-06-15 01:40:42 +00:00
829bcf3e9b Don't apply PR 12 to Thrust anymore. (#8542)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-14 21:39:21 -04:00
848873e1f6 Must run apt-get install as sudo. (#8454)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-14 21:32:42 -04:00
302408e6c2 Support BatchNormalization opset 7 (#8482) 2018-06-15 08:44:35 +08:00
54c456da68 Improve win-build.sh for Windows local build (#8493) 2018-06-14 17:11:59 -07:00
544605d3a9 [JIT] Remove TK_WHERE (#8536) 2018-06-14 16:46:08 -07:00
34c9d16ca1 [JIT] End-to-end example-based robustness testing for hybrid frontend (#8451)
* End-to-end example-based robustness testing for hybrid frontend

* delet this
2018-06-14 14:58:30 -07:00
6869a5f0fb Throw error on 0-length tensor slicing (#7775)
* throw error on 0-length tensor slicing

* return empty tensor instead of throwing error

* make 0 slice work for tuples also

* add tests

* move check to aten

* Address comments
2018-06-14 17:40:51 -04:00
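
A quick illustration of the behavior this PR settled on (returning an empty tensor rather than throwing):

    import torch

    x = torch.arange(4)
    y = x[2:2]            # zero-length slice
    print(y.shape)        # torch.Size([0]) -- empty tensor, no error
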
edc3000963 Move empty size logic from ATen into TH/THC. (#8468)
* Move empty size logic from ATen into TH/THC.

The goal here is to unify the tensor representations; since the "majority" of the representation is in TH, we push the empty size ({0}) and empty stride ({1}) logic into TH.

This PR does the following:
1) Previously THTensor/THCTensor with dim_ == 0, size == nullptr, stride == nullptr are now dim_ == 1, size == {0}, stride == {1}.
2) The logic that previously implemented this at the ATen level (e.g. THLongStorageView STRIDE_EMPTY_TENSOR) is removed.
3) The above is pretty clean except for resize/resizeNd logic -- that is still called with nDimension == 0.  So, we rename these to resizeLegacy, resizeNdLegacy, map nDimension == 1
into the new regime, and will later write a empty-aware resize/resizeNd and move over the calls to resizeLegacy, resizeNdLegacy.
4) Also introduces some ifdefs that are just used for testing:
a) USE_TH_SCALAR: move scalar logic in TH
b) USE_TH_ZERO_SIZE_DIM: support arbitrary 0-sized dimensions, i.e {...,0,...}.
These are just used to write forward-looking correct code while call sites to _dim() (old TH nDimension) and resizeLegacy are updated.

* Get rid of noelem_to_empty.

* Use static_cast rather than C-style cast.

* Allocator size for empty tensors in THS/THCS.

* Add back THLongStorageView type Stride (TH and arg parsing has some magic that needs these to be nullptrs).
2018-06-14 16:56:52 -04:00
6287b80d67 [auto] Update onnx to 3ca20e6 - Remove obsolete installation doc. (onnx/onnx#1108)
3ca20e6993
2018-06-14 20:50:50 +00:00
ae55865a3b Migrated hardshrink() to ATen and deprecated nn.Hardshrink() (#8117)
* 1. added hardshrink() to ATen (CPU + GPU); 2. removed nn.Hardshrink(); 3. reusing previous tests for nn.Hardshrink() and included CUDA tests at test_nn; 4. default parameter lambda=0.5 is not working yet

* optimized memory read/write

* 1. pass in lambd as scalar for CPU/CUDA_apply*; 2. removed tests for hardshrink at test_legacy_nn

* fixes test_utils

* 1. replace zeros_like with empty_like; 2. use scalar_cast in cuda

* 1. printing lambd value; 2. default lambd=0.5 is still failing

* getting around Scalar bug by removing default value of lambd from native_functions.yaml, and declaring it at nn/functional.py

* cleaned up debug printf
2018-06-14 16:42:20 -04:00
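
For reference, a minimal sketch of the migrated op through the functional interface (the commit notes the default lambd=0.5 had to be declared in nn/functional.py):

    import torch
    import torch.nn.functional as F

    x = torch.tensor([-1.0, -0.3, 0.2, 0.8])
    # hardshrink zeroes entries with |x| <= lambd and keeps the rest.
    print(F.hardshrink(x, lambd=0.5))   # tensor([-1.0,  0.0,  0.0,  0.8])
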
2ab4c9dbec DEPRECATED -> AT_DEPRECATED (#8496) 2018-06-14 16:25:49 -04:00
c4194169a8 Temporary solution for having access to Python installation path. (#8487)
* Temporary solution for having access to the root path for python installations until Caffe2/PyTorch figure out the best way to build.

* Update build.sh

Increasing the verbosity of HIP errors.
2018-06-14 16:05:03 -04:00
2f25d1fbc1 Enable tracing and script autograd tests (#8145)
This commit turns autograd function/method tests into tests run inside of a trace, or directly written using
script. These tests have uncovered many bugs and limited functionality
in the trace/script pathway, and these failing parts of the tests
are disabled using new exclusion sets. The size of these sets will shrink
as the bugs are fixed.
2018-06-14 11:48:15 -07:00
aa2c79a125 Add ONLY_FOR_TEST device type into executor (#8461)
Add ONLY_FOR_TEST device type into executor to support some of the tests
2018-06-14 14:06:35 -04:00
467fc3c436 [READY TO MERGE] Improve docs for Multinomial and Categorical distributions (#8472)
* Improve docs for Multinomial and Categorical distributions

* more improvement

* more improvement
2018-06-14 12:47:35 -04:00
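
The distinction the improved docs draw, sketched in Python (assuming the torch.distributions API of this era):

    import torch
    from torch.distributions import Categorical, Multinomial

    probs = torch.tensor([0.2, 0.3, 0.5])
    # Categorical samples a single class index per draw.
    print(Categorical(probs).sample())                        # e.g. tensor(2)
    # Multinomial samples counts over all classes for total_count draws.
    print(Multinomial(total_count=10, probs=probs).sample())  # e.g. tensor([2., 3., 5.])
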
aed98067bf Pin correct clang version in macOS CI test (#8457) 2018-06-14 12:47:24 -04:00
fa277e6785 [IDEEP] [fix bug] Fix bug in ideep SkipOutputCopy strategy (#8372)
* fix a bug for SkipIndices

* IDEEP bug, revise the output to CPUTensor in SkipOutputCopy strategy

* [IDEEP] Add IDEEP fallbacks for Style-Transfer ops
2018-06-14 09:42:00 -07:00
a4bd4f6c6f Fix -g not passed to nvcc when DEBUG=1 (#8407)
* Fix -g not passed to nvcc when DEBUG=1

* blacklist -Werror

* filter CMAKE_CXX_FLAGS too

* restore to space-delimited string before ending macro
2018-06-14 12:36:50 -04:00
384936f73e TypeId improvements (#8350)
* Improve TypeId:
- move it to c10 namespace to allow for easy extraction from caffe2 into c10 (i.e. reuseability from aten)
- Use unordered_map/unordered_set instead of map/set for performance
- Make TypeId a type safe class (i.e. no implicit casts from/to int)
- Make TypeId constexpr
- Some readability improvements (e.g. using instead of typedef)
- Don't explicitly implement TypeMeta copy assignment and construction - let the compiler do that for us.
- Add TypeMeta move constructor
- Make TypeMeta members noexcept
- Implement TypeMeta::operator== and operator!= as free functions instead of in-class

* CR comments

* fix

* fix windows

* Rename back to CaffeTypeId

* Remove c10::TypeId/TypeMeta

* remove C10_KNOWN_TYPE

* code review
2018-06-14 09:16:26 -07:00
752bb954b4 Update RunAsyncFailure test (#8486)
Fix RunAsyncFailure test
2018-06-14 12:05:57 -04:00
21609e0fd0 `bincount` feature implementation (#6688)
* Implement CPU bincount feature support

* Incorporate feedback on renaming to SummaryOps file and other nits

* bincount gpu implementation

* refactor cuda code and incorporate nits

* doc fix

* cuda bincount - cast weights to double if integral type

* fix: signed unsigned comparison error

* fix: ssize_t error

* refactor

* make template typenames readable and other nits

* make compatible with v0.5

* incorporate comments

* update test cases to ensure CUDA code coverage
2018-06-14 11:38:04 -04:00
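
A short usage sketch of the new operator (CPU and CUDA), including the optional weights path the updated tests exercise:

    import torch

    x = torch.tensor([0, 1, 1, 3])
    w = torch.tensor([0.5, 1.0, 1.0, 2.0])
    print(torch.bincount(x))       # tensor([1, 2, 0, 1])
    print(torch.bincount(x, w))    # tensor([0.5000, 2.0000, 0.0000, 2.0000])
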
2a0e98a334 Move libtorch CMakeLists.txt to torch/ (#8444) 2018-06-14 11:36:49 -04:00
e323f02277 Fixing missing PyCObject_Type bug (#8467) 2018-06-14 08:08:25 -07:00
2184e3f933 Use MKL VML if available (#8458) 2018-06-14 10:40:21 -04:00
8d674c0d51 add comparison operators to jit (#8058)
* add comparison operators to jit

* try to fix CI

* address review comments

* fix type of comparison ops result

* address review comments

* fix indentation

* add comments

* require type_as to have non-dynamic tensor arg

* Typo (should check if template argument of type_as, inputs()[1], is tensor)

* Use .at() instead of []

* Use .at() again
2018-06-14 09:30:25 -04:00
9d88ff7d0d Add half cauchy, half normal distributions (#8411) 2018-06-14 10:28:42 +02:00
6a85b133d3 Improve number formatting in tensor print (#7632)
* Improve number formatting in tensor print

* fix bad rebase

* address comments

* fix test

* fix test

* use assertExpected for tests

* address comments

* address comments
2018-06-13 23:57:16 -07:00
bb9ef8fc2e Support new version of Dropout (#8470) 2018-06-14 14:47:47 +08:00
2de4ab88f5 remove _assert_no_grad from loss modules (#8460) 2018-06-13 21:30:51 -04:00
db14f3f33c More efficient kernels that avoid deprecated shuffles in Embedding and LookupTable (#8400)
* More efficient kernel that avoids deprecated shuffles in Embedding.cu and THCUNN/LookupTable.cu

* Using WARP_BALLOT from THCDeviceUtils.cuh, also changing WARP_BALLOT to return unsigned
2018-06-13 21:29:51 -04:00
f7585178cd [auto] Update onnx to b7d5a60 - Add stats on ONNX node tests (onnx/onnx#1110)
b7d5a60f90
2018-06-14 01:18:58 +00:00
64d5b1454e Add is_variable tag to Tensor (#8414)
* Add is_variable tag to Tensor

* Add is_variable tag to Type
2018-06-13 18:14:29 -07:00
6e314f9f68 update tensor clone docs (#8462) 2018-06-13 21:06:21 -04:00
681964cc47 output each operator separately due to logcat truncation (#8456)
as title
2018-06-13 21:05:05 -04:00
ad378dfbaf Adding necessary LOCAL variables in order for the perl script that HIP utils uses to run successfully without error. (#8464) 2018-06-13 20:28:54 -04:00
df3559ca58 Move hip utils files to a separate directory (#8446) 2018-06-13 16:49:59 -07:00
dc209ed963 [c10d] Rendezvous skeleton (#8294)
* [c10d] Rendezvous skeleton

The rendezvous function takes an URL and produces a triplet of a store,
a process rank, and the process group size.

For the file and TCP handlers, the rank and size must be specified, but
other handlers may discover these parameters dynamically.

It returns a generator function, such that if a rendezvous handler
supports rerendezvous, you can write:

for store, rank, size in c10d.rendezvous(...):
  pg = c10d.ProcessGroup(store, rank, size)
  while the process group is valid:
    # Do stuff with process group

* Add Python 2 fallback for urlparse library

* Import X as Y

* Relative import seems to fix it

* Spelling

* Gate import on c10d availability
2018-06-13 15:27:32 -07:00
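
A hedged sketch of the generator-style usage the commit describes; the module path and ProcessGroupGloo constructor follow the c10d bindings elsewhere in this log, but the URL query-parameter names are assumptions:

    import torch.distributed.c10d as c10d   # module path assumed

    # File handler: rank and size must be specified explicitly.
    url = "file:///tmp/c10d_rendezvous?rank=0&size=2"
    for store, rank, size in c10d.rendezvous(url):
        pg = c10d.ProcessGroupGloo(store, rank, size)
        # ... run collectives with pg while the process group is valid ...
        break   # handlers without rerendezvous support yield only once
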
8a837f0fe3 Repairing the integrated build path to handle the Caffe2 PR. (#8441)
* Modifying the build path to handle Caffe2's merge

* Update LoadHIP.cmake

Fixing typo.

* Update Dependencies.cmake

Keeping hip_include_directories since other Caffe2 libs depend on it.

* Update CMakeLists.txt

Only including for the second time if we're building with ATen.

* Update CMakeLists.txt

Adding comments to make sure future users understand why necessary commands have been added.
2018-06-13 17:16:59 -04:00
4d287f9074 Use int64_t instead of int for in loop that may overflow. (#8435) 2018-06-13 17:02:32 -04:00
2c9c48a323 Add CODEOWNERS entry for c10d test file (#8445) 2018-06-13 16:22:57 -04:00
71a3633e3f change tensor.set_() argument names to match descriptions in doc (#8403)
Renamed the args `storage` and `sourceStorage` to `source` in tensor.set_() to match the descriptions in the docs.
2018-06-13 13:22:50 -07:00
5b86c3af4a Update from facebook (#8384)
* [fix] fixup the bias multiplier data access issue

Hotfix for failures in conv_transpose

* [D2][Easy]: lint regularizer

lint with black

* [GanH]: Split mu in adaptive weight for diagnose

* [Dper] Add the ability to split FC weights into multiple smaller ones

* fix SumReduceLikeOp for empty blob

as desc.

* add ctc_greedy_decoder for caffe2

ctc_greedy_decoder same as tf's

* Update event callback handling

Allow multiple callbacks per event

* Add WeightedSum layer

The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm

* Replicate DAG's behavior

Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type

* [dper] layernorm layer

as title

* Override dag, async_dag, async_polling

Overriding dag, async_dag and async_polling with async_scheduling

* Name the thread pools

Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.

* [Caffe2] FillerOp should support int64_t dimensions

Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)

* Remove caffe2/caffe2/contrib/torch/

It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)

#accept2ship

* Fix linearWarmup multiplier check

The multiplier needs to be non-negative, not strictly positive.

* Revert D3314316

This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.

* Speedup generate proposals by partial_sort.

Speedup generate proposals by partial_sort.

FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and saw a consistent 100% improvement in speed (6ms -> 3ms) at 420 input resolution. See next diff for details.

* More parallel processing friendly for CPP version of GenerateProposals.

More parallel processing friendly for CPP version of GenerateProposals.

* [DT] [43/n] Lift stop conditions inside reader code back to flow control

1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
  - single machine (1 reader, 1 trainer on trainer0 node, no PS)
  - (1 reader + 1 trainer) on trainer0 node, has PS
  - multiple readers, readers do not share nodes with trainers, might have PS or not

* Resolve conflicts for torch/_thnn/utils.py

* [Caffe2] Handle image decoding errors

Image decoding errors can make the whole training fail. This diff handles them:
1. Catch imdecode exceptions and check whether the decoded image has zero columns or rows; this is counted as a decoding error.
2. Replace the image with an empty one in case of error.
3. Count the number of errors and throw a runtime exception if the rate reaches a given threshold.

The empty image data is kept. It might introduce noise in the training data.

* Update MKL exporter to IDEEP ops

TSIA

* [Caffe2] GlobalInit is thread safe, fixing the comment

With the mutex and lock, GlobalInit is thread safe.
Update the comments.

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* [DT]: fix predictor save

similar to D6610058, here we add the fix for distributed online training

* Remove net_singlethread_async_gpu.cc

Closes https://github.com/caffe2/caffe2/pull/2528

This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.

* Inline DFS task execution

Add a DFS inline task execution mode in executor

* Add c10 folder to fbcode

This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevant post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* [Fix] sparse regularization in distributed training

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* Improve shard logging in net tracing code

Make it handle arbitrary shard ids instead of just one-digit ids.

* [Caffe2] Call GlobalInit in predictor only in mobile

FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:

User does not call GlobalInit and initFacebook after program starts
User sets a flag manually: https://fburl.com/mcsumw7d
User calls OSS predictor.
OSS predictor calls GlobalInit
GlobalInit calls initFacebook
initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the user manually set flags are overwritten

This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.

This issue doesn't exist in mobile, since initFacebook is not called on mobile.

For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Add empty fix for SumLikeReduceOp

Add empty fix for SumLikeReduceOp

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* Add thread_name.cc to the CMake file

* No need to subtract 1. Fix test segfaults

* Fix NetTest, ObserverTest

Fix tests

(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)

* CTCGreedyDecoderOp only has a CPU implementation, so the test should only run on CPU

* Add a variable to avoid conversion resizing issue

* Remove the code per soumith's comments

* Remove blank lines in the end of file

* [caffe2] upgrade IDEEP and hotfix for conv op accuracy issue (#8364)

* [IDEEP] Upgrade IDEEP version

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* [IDEEP] Fix accuracy issue in conv op

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix build error due to lack of src in CMakeLists

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove the code per soumith's comments

* [ONNX] Add an ATen fallback pathway for ONNX export (#8273)

* ATen fallback for ONNX export

* Move to enum

* Fix model test

* Add comment

* Address comments

BC interface

* Remove imaginary file (#8415)

* [Caffe2] Enable AMD/MIOPEN ops for Caffe2  (#8306)

* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Clean up the mess; it has been fixed in the latest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Add MIOPEN pooling operator

* Add MIOPEN activation operator

* Add MIOPEN softmax operator

* Add MIOPEN spatial batch norm operator

* Add MIOPEN local response normalization operator

* Add MIOPEN conv operator

* Clean-up LRN ops

* enable fp16 in MIOPEN pool ops

* Enable fp16 for MIOPEN relu op

* Enable fp16 for MIOPEN spatial batch norm op

* code clean-up

* revert float16 support

* Create Caffe2 python binding for AMD/ROCM/HIP

* Add op fallback for HIP operator

* add hip src/test files in cmake

* exclude hip src/test files

* fix python binding for hip backend

* fix MIOPEN pooling op workspace

* hack to compile miopen operators

* fix include path for MIOPEN ops

* Fix include path

* Add HIP math utilities

* Fix path for HIP math utils

* cmake fix

* Cmake fix / hipcc for hip files

* suppress hipcc warning

* cmake fix / replace USE_HIP with USE_ROCM

* revert LoadHIP.cmake change

* fix include for thrust/cub-hip

* include path fix for conversion.h

* Updated with latest upstream changes

* clang format fixes

* Context_hip updates

* Fixed typo in rocblas handle get function

* Updated hipified math utils

* Updated math hip test util

* Updated context hip test

* Updated common_hip

* Updated net async dag for HIP

* Added MIOPEN in operator hip test

* fix

* C2 dependencies clean-up

* fix include path for building custom protobuf

* Decouple miopen pool op and conv_pool_op base

* cmake refactor

* fix operator_hip_test

* move all hip/miopen ops files into caffe2/operators/hip

* sanitize cmake

* permission issue

* remove extra parenthesis

* remove artifact from resolving merge conflict

* cont. sanitize cmake files

* fix syntax error

* sanitize conversion.h

* .

* Revert "."

This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9.

* clang-format

* Enable some reduce operators' ONNX backend tests (#8418)

* fix old comment to point to the right file (#8416)

* Stop pinning nccl version. (#8421)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428)

* Enable some of the ONNX backend test on broadcasting (#8423)

* Enable some of the ONNX backend test on broadcasting

* enable gemm broadcast

* Expose proto utils and ONNX (#8073)

* Expose proto utils and ONNX from PyTorch libcaffe2.so

* Try to use protobuf from _C.so

* Fix ONNX proto header include

* Adjust order of imports for ONNX until nanopb goes away

* Set and use ONNX_NAMESPACE for PyTorch builds

* Show protobuf summary for all builds

* Add ONNX_NAMESPACE for cpp_build

* Statically link libprotobuf.a into libtorch.so

* Set ONNX_NAMESPACE on Windows build

* Move core/dispatch up as well

* Add /MD flag for Windows build of _C

* Potential Windows fix for ONNX and protobuf

* Add direct linkage from _C to ONNX on Windows

* Only include protobuf wrapper for PyTorch

* Pass extra_compile_args to _nvrtc ext build

* Remove installation of .a files

* Rebase creates some weird situations, revert them manually

* Remove more weird changes due to rebase

* Need to add thread_name.cc after merge
2018-06-13 13:10:45 -07:00
f1b5124306 Fix #8420, defaulting the initial hidden state to 0 (#8427) 2018-06-13 14:26:28 -04:00
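
Per the commit title, an omitted initial hidden state now defaults to zeros; a sketch of that call pattern, assuming the standard nn.RNN API:

    import torch
    import torch.nn as nn

    rnn = nn.RNN(input_size=3, hidden_size=4)
    x = torch.randn(5, 1, 3)       # (seq_len, batch, input_size)
    out, h = rnn(x)                # h0 omitted -> treated as zeros
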
09896d1e77 Allow nccl downgrades (#8429)
* Revert "Stop pinning nccl version. (#8421)"

This reverts commit 3cb45bafc8b9b023049e5f979a2bcb75e3f7009d.

* Allow downgrades from libnccl2 install.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-13 13:56:34 -04:00
edd4e2c5d1 Expose proto utils and ONNX (#8073)
* Expose proto utils and ONNX from PyTorch libcaffe2.so

* Try to use protobuf from _C.so

* Fix ONNX proto header include

* Adjust order of imports for ONNX until nanopb goes away

* Set and use ONNX_NAMESPACE for PyTorch builds

* Show protobuf summary for all builds

* Add ONNX_NAMESPACE for cpp_build

* Statically link libprotobuf.a into libtorch.so

* Set ONNX_NAMESPACE on Windows build

* Move core/dispatch up as well

* Add /MD flag for Windows build of _C

* Potential Windows fix for ONNX and protobuf

* Add direct linkage from _C to ONNX on Windows

* Only include protobuf wrapper for PyTorch

* Pass extra_compile_args to _nvrtc ext build

* Remove installation of .a files
2018-06-13 10:25:32 -07:00
7543d0f794 Enable some of the ONNX backend test on broadcasting (#8423)
* Enable some of the ONNX backend test on broadcasting

* enable gemm broadcast
2018-06-13 10:15:56 -07:00
61f61de270 Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428) 2018-06-13 12:27:58 -04:00
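
The newly documented reduction, sketched for reference:

    import torch

    x = torch.randn(3, 4)
    # Numerically stable log(sum(exp(x_i))) along dim 1.
    print(torch.logsumexp(x, dim=1))
    # Naive equivalent that can overflow for large inputs:
    print(x.exp().sum(dim=1).log())
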
3cb45bafc8 Stop pinning nccl version. (#8421)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-13 10:53:56 -04:00
7ca8e2f131 fix old comment to point to the right file (#8416) 2018-06-13 21:33:05 +08:00
a42c12bb11 Enable some reduce operators' ONNX backend tests (#8418) 2018-06-13 21:32:50 +08:00
c37e5b7137 [Caffe2] Enable AMD/MIOPEN ops for Caffe2 (#8306)
* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Clean up the mess; it has been fixed in the latest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Add MIOPEN pooling operator

* Add MIOPEN activation operator

* Add MIOPEN softmax operator

* Add MIOPEN spatial batch norm operator

* Add MIOPEN local response normalization operator

* Add MIOPEN conv operator

* Clean-up LRN ops

* enable fp16 in MIOPEN pool ops

* Enable fp16 for MIOPEN relu op

* Enable fp16 for MIOPEN spatial batch norm op

* code clean-up

* revert float16 support

* Create Caffe2 python binding for AMD/ROCM/HIP

* Add op fallback for HIP operator

* add hip src/test files in cmake

* exclude hip src/test files

* fix python binding for hip backend

* fix MIOPEN pooling op workspace

* hack to compile miopen operators

* fix include path for MIOPEN ops

* Fix include path

* Add HIP math utilities

* Fix path for HIP math utils

* cmake fix

* Cmake fix / hipcc for hip files

* suppress hipcc warning

* cmake fix / replace USE_HIP with USE_ROCM

* revert LoadHIP.cmake change

* fix include for thrust/cub-hip

* include path fix for conversion.h

* Updated with latest upstream changes

* clang format fixes

* Context_hip updates

* Fixed typo in rocblas handle get function

* Updated hipified math utils

* Updated math hip test util

* Updated context hip test

* Updated common_hip

* Updated net async dag for HIP

* Added MIOPEN in operator hip test

* fix

* C2 dependencies clean-up

* fix include path for building custom protobuf

* Decouple miopen pool op and conv_pool_op base

* cmake refactor

* fix operator_hip_test

* move all hip/miopen ops files into caffe2/operators/hip

* sanitize cmake

* permission issue

* remove extra parenthesis

* remove artifact from resolving merge conflict

* cont. sanitize cmake files

* fix syntax error

* sanitize conversion.h

* .

* Revert "."

This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9.

* clang-format
2018-06-13 04:00:39 -07:00
36bf89bf09 Remove imaginary file (#8415) 2018-06-12 23:17:19 -07:00
04503962ff [ONNX] Add an ATen fallback pathway for ONNX export (#8273)
* ATen fallback for ONNX export

* Move to enum

* Fix model test

* Add comment

* Address comments

BC interface
2018-06-12 22:59:45 -07:00
76f22b7aef [caffe2] upgrade IDEEP and hotfix for conv op accuracy issue (#8364)
* [IDEEP] Upgrade IDEEP version

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* [IDEEP] Fix accuracy issue in conv op

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix build error due to lack of src in CMakeLists

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-06-12 22:06:16 -07:00
81b92f7515 Get ROCm building again on master (#8343)
Billing of changes:

- New Jenkins script for building on rocm. For now it is a bit hacked together, but we can improve it once CI is running
- New ROCM docker image for nightly HIP, and also some legacy packages that we need temporarily
- New enabled config py2-clang3.8-rocmnightly-ubuntu16.04-build based off of the existing Caffe2 image (not built yet)
- A big pile of cmake fixes, mostly to turn bits on/off when ROCM build is involved
- Switch from hiprng to hcrng
- Apply some patches directly in code, eliminating the patches
- Use __hdiv instead of hdiv, it's more portable
- THCNumerics<T>::gt doesn't work in HIP, so simulate it with sub
- Add a few more overloads HIP needs
- Turn off use of hcc to link (we plan to turn this back on to get tests running)
- Search for hiprand, hiprng, hipblas, hipsparse
- Better Python 2 portability
2018-06-12 23:05:21 -04:00
49d6c5f99f Branch parallel if number of threads is 1 (#8401) 2018-06-12 22:28:51 -04:00
7c9e936986 Add way of deprecating ATen functions (#8404) 2018-06-12 19:26:43 -07:00
557511102e Always include Modules_CUDA_fix for Caffe2 builds (#8396) 2018-06-12 22:19:23 -04:00
4485ce66c2 Fix flaky RoiAlignTest, fixes #8084. (#8312)
* Fix flaky RoiAlignTest, fixes #8084.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>

* Increase tolerance

* more...
2018-06-12 20:06:24 -04:00
b947ac227d Check if you forgot to specify 'variants: function' on _out (#8402)
The Python binding generation code doesn't understand
method '_out' bindings correctly, and will compute the
indices wrong if you have an '_out' function that's also a
method.  This is a quick check to prevent you from making
this mistake.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-12 20:05:45 -04:00
fcd9af8a25 changes to support ATen code generation inside fbcode (#8397)
* Back out "Back out "Add support for generating ATen files during fbcode build""

Original commit changeset: 7b8de22d1613

I'm re-sending this diff exactly as it was approved and
committed. Fixes to support @mode/opt will be sent separately for ease
of review.

* Enable building //caffe2:torch with @mode/opt

In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.
2018-06-12 14:57:29 -07:00
ffffee6aa9 Skip test_multinomial_invalid_probs on Windows (#8360) 2018-06-12 17:00:49 -04:00
712a3fad27 Adding CMAKE_PREFIX_PATH and CMAKE_INSTALL_PREFIX to cmake summary (#8398) 2018-06-12 14:21:11 -06:00
c3e4b3c88b raise more informative error msg for torch.load not supporting seek (#7754)
Raising a more informative error msg for torch.load() when the input file does not support seek() or tell()
2018-06-12 12:57:28 -07:00
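
A sketch of the supported path: objects passed to torch.load() must implement seek() and tell(), so non-seekable streams should be buffered first:

    import io
    import torch

    buf = io.BytesIO()
    torch.save(torch.arange(3), buf)
    buf.seek(0)
    t = torch.load(buf)   # fine: BytesIO supports seek() and tell()
    # For a non-seekable stream (e.g. a socket), read it fully into a
    # BytesIO first; torch.load() now raises a clear error otherwise.
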
c6db1bc952 Add gt lt ge le to the supported operators list (#8375)
Add gt lt ge le to the supported operators list
2018-06-12 15:28:34 -04:00
bef12551ee Check CAFFE2_USE_MSVC_STATIC_RUNTIME to set -MD vs -MT in cuda.cmake (#8381) 2018-06-12 11:59:39 -07:00
5f5ea75283 Use SYSTEM For all includes in Dependencies.cmake (#8380) 2018-06-12 11:59:02 -07:00
49eec35e5b More warning skips (#8382)
* Remove check for unused private fields

* Suppress inconsistent-missing-override

* Hopefully last warning skip for Mac

* Add one more warning ignore
2018-06-12 14:44:36 -04:00
a77b391de7 [SpectralNorm] don't register original weight as buffer (#8170)
* don't register original weight as buffer; fixes for buffers that require grad

* add test
2018-06-12 14:42:05 -04:00
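
The user-visible effect of the fix, as a hedged sketch: the unnormalized weight stays a Parameter (so it can require grad and be optimized) rather than being registered as a buffer:

    import torch.nn as nn
    from torch.nn.utils import spectral_norm

    m = spectral_norm(nn.Linear(4, 4))
    # After the fix, the original weight appears under parameters
    # (as 'weight_orig'), not buffers.
    print([name for name, _ in m.named_parameters()])
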
922adf8d09 Skip calling ncclCommDestroy in destructor (#8352)
There is a bug in NCCL that causes seg faults when calling ncclCommDestroy() in the destructor during program exit. According to Nvidia, "Whether the NCCL destructor will be called before or after the CUDA runtime destructor is undefined, which can lead to crashes."

For the immediate workaround, skip calling ncclCommDestroy in the NCCL destructor. This is UGLY and we'll follow up with Nvidia to solve this ASAP.
2018-06-12 13:11:09 -04:00
991bdd7f13 [build] remove the use of NO_CUDA (#8300)
* Only remove NO_CUDA from CMakeLists.txt

* @ezyang's catch
2018-06-12 12:14:36 -04:00
5484a197d9 [c10d] Convenience wrappers for collective functions (#8292)
* [c10d] Add convenience wrappers

* Release GIL
2018-06-12 09:05:16 -07:00
cc8fbc9d08 Revert "Name the thread pools (#8137)" (#8379)
This reverts commit 96876d9e7ef6baf9d11541454b5f4d22b092de77.
2018-06-12 11:51:32 -04:00
96876d9e7e Name the thread pools (#8137)
Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.
2018-06-11 23:13:46 -07:00
a161639fcd Move copyright lines back to NOTICE file, fixes #6911 (#8310)
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-06-11 23:12:41 -07:00
44973a06ba Add affine_channel_op (#8356)
Add affine_channel_op
2018-06-11 20:51:11 -07:00
87dcdf5fe5 [auto] Update onnx to 86999f9 - Fix the LRN's doc (onnx/onnx#1107)
86999f90f0
2018-06-12 02:52:51 +00:00
1f02ebd323 Use clang 8 to build CUDA in macOS CI (#8355)
* Don't use -faligned-new flag for clang < 9.0

* Select Xcode 8.2 toolchain when building CUDA

* Better comment
2018-06-11 22:45:40 -04:00
78e3259bbe Add autograd automatic anomaly detection (#7677)
* add autograd automatic anomaly detection

* python 3 string support

* Fix non python build

* fix typo in doc

* better test and naming fix

* fix no python build and python object handling

* fix missing checks

* clean NO_PYTHON build

* Remove unwanted changes
2018-06-11 21:26:17 -04:00
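
A sketch of the feature from the Python side, assuming the context-manager API this PR introduces:

    import torch

    with torch.autograd.detect_anomaly():
        x = torch.randn(2, requires_grad=True)
        y = (x * x).sum()
        y.backward()   # a NaN produced in backward would raise here,
                       # with a traceback pointing at the forward op
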
38362fa9f3 Prepare for moving 0-sized dimensions in TH/THC. (#8337)
This does the following:
1) makes nDimension an int64_t (to match ATen)
2) changes the dimension value to dim_ (so we catch direct usages)
3) provide an _dim() that provides access to the "old" view (so we can migrate functions one at a time)
4) have code call ->-_dim() instead of ->nDimension.
2018-06-11 21:18:02 -04:00
0cced57cb8 Build DEBUG mode with -O0, fixes #8335. (#8336)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-11 21:05:12 -04:00
ae1ceef36a Allow TypeMeta to hold non-default-constructible types (#8349)
Necessary for Tensor detemplatization (D8121878) - now Tensor won't have a default constructor (as we don't know the device).

Thus this diff makes TypeMeta constructible with non-default-constructible types, in which case ctor() is non-null but always throws.

It's dangerous however as we won't catch potential type errors at compile time. Luckily - the only place where ctor() is used is in Blob and Tensor which have templated wrappers there (GetMutable and mutable_data respectively). We can just enforce the necessary type requirements there explicitly as a static_assert.

It also changes the failure behavior to be throw() instead of abort(). Aborting the process is not cool for the library :)
2018-06-11 15:53:07 -07:00
ddab886105 [caffe2] Move elementwise grad ops to separate files (#8315)
* Move elementwise grad ops to separate files

Move elementwise grad ops to separate files

* Fix proto build

* Fix build

* Fix sync error
2018-06-11 15:38:36 -07:00
46c0b01234 Revert D3314316 (#8346)
This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should remove it. This would
allow us to pass in arguments to optionally construct an object,
although it is indeed a little unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.
2018-06-11 14:23:10 -07:00
9b1480a28e Fix disabling of USE_CUDNN when not found (#8340) 2018-06-11 11:40:51 -07:00
607b86f603 Implement dim_arange operator (#8266)
* Implement arange_like operator

* add ONNX symbolic

* lint

* change name

* Comment the hack
2018-06-11 10:49:29 -07:00
de4e97e89a [C++ API] Cursors (#8190)
* Add cursors to C++ API

* Small self nits

* s/struct/class

* Use more STL like names for cursors
2018-06-11 09:48:43 -07:00
77660a9cbb Support printing sparse tensors in ATen, fixes #8333. (#8334)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-11 12:15:50 -04:00
77dea37dac Skip test_multinomial_invalid_probs_cuda on Windows (#8324) 2018-06-11 11:14:10 -04:00
f4b79f99d1 Fix the script not stopping early on error for MSVC and Ninja (#8277)
* Simplify the solution

* Remove the usage of set errorlevel
2018-06-11 11:03:30 -04:00
bed172cf54 Fix collect_env.py for Windows (#8326)
* Fix collect_env.py for Windows

* Fix expect file for Win machine
2018-06-11 10:52:21 -04:00
52e4d3c4a2 add error when backend is not supported by DDP (#8325) 2018-06-11 02:18:30 -04:00
94888106a9 Add docstring for torch.sparse_coo_tensor (#8152)
* add sparse_coo_tensor docstring

* update empty tensor example

* whitespace

* whitespace again
2018-06-11 00:03:51 -04:00
80b6f9edd6 [THD] fix broken THD build with NCCL (#8323) 2018-06-10 23:48:10 -04:00
01f5ba4f3e [auto] Update onnx to 4b4085c - Add missing warning ignoring flags to onnx_proto CMake target (onnx/onnx#1105)
4b4085c2e9
2018-06-10 20:49:46 +00:00
0169ac5936 Fix sample code for cuda stream (#8319) 2018-06-10 11:41:50 -04:00
bf8689d0e5 [auto] Update onnx to 5ed684e - Remove/replace /MX with /WX for MSVC build. Was typo in a previous ch… (onnx/onnx#1104)
5ed684ebe5
2018-06-10 04:59:13 +00:00
d33cc08a97 Small fixes (#8296) 2018-06-09 23:11:35 -04:00
5fe24968ed Remove unused grad ops on mobile to reduce app size (#8297)
Remove unused grad ops on mobile to reduce app size
2018-06-09 23:10:05 -04:00
07d3f14eed Clean up old sccache log before build (#8305) 2018-06-09 23:07:47 -04:00
b78466a37d Replace Variables to Tensors (#8309) 2018-06-09 23:07:15 -04:00
29849e428c Removes unused THCTensorConv (#8229) 2018-06-09 17:15:26 -04:00
3521cd54af Fix dividing by zero segfault in Reshape (#8302)
when inferring a dimension in a new shape that has a zero-size dimension
2018-06-09 09:48:22 -07:00
2ed03898cd Add depthwise convolution test for IDEEP (#8301) 2018-06-09 08:44:13 -07:00
e6ef18d531 Entries for torch.distributed in CODEOWNERS (#8293) 2018-06-09 00:28:40 -04:00
788f05d215 Remove THC's FindMAGMA (#8299) 2018-06-08 21:03:39 -07:00
a34211bd79 Some utils for compile-time programming (#7778)
* Add some C++17 features, implemented with C++14

* Add some type traits

* Compile-time type list abstraction

* Some utils for compile-time programming

* Fix compatibility with a larger range of compilers

* Use guts::array instead of std::array because of std::array shortcomings

* code review comments

* Use quotes for includes
2018-06-08 17:10:53 -07:00
f35d7cce91 [auto] Update onnx to 58efe0a - add float16 support back for math and reduction ops (onnx/onnx#1102)
58efe0a9ca
2018-06-08 23:10:17 +00:00
045e7435c3 Have a single THTensor / THCTensor type. (#8288)
* Remove remaining TensorTypeUtils functions.

Mostly what's remaining is copy utilities -- these are now provided in THCTensorCopy.hpp and templatized on the ScalarType rather than the TensorType.

* Have a single THTensor / THCTensor type.

As was previously done with Storages, have only a single (dtype-independent) THTensor / THCTensor.

For documentation and backwards compatibility purposes, the old names, e.g. TH(Cuda)LongTensor alias the new TH(C)Tensor type.

* undef GENERATE_SPARSE.
2018-06-08 17:57:44 -04:00
37073f8be0 [build] Remove /torch/lib/THD/cmake in favor of /cmake (#7159)
* Remove /torch/lib/THD/cmake in favor of /cmake

* path fix

* Explicitly marking gloo to use cuda

* Fix gloo path in THD
2018-06-08 17:55:12 -04:00
c486b8749d Add option USE_NVRTC which defaults to off (#8289) 2018-06-08 14:27:23 -07:00
695d40efc2 Create initial Python bindings for c10d (#8119)
* Build and install c10d from tools/build_pytorch_libs.sh

* Create initial Python bindings for c10d

* clang-format

* Switch link order to include more symbols

* Add bindings and tests for ProcessGroupGloo

* Add broadcast test

* Separate build flag for c10d

* Explicit PIC property

* Skip c10d tests if not available

* Remove c10d from Windows blacklist

Let it skip by itself because it won't be available anyway.

* Make lint happy

* Comments

* Move c10d module into torch.distributed

* Close tempfile such that it is deleted
2018-06-08 12:59:51 -07:00
75563674c4 Remove remaining TensorTypeUtils functions. (#8286)
Mostly what's remaining is copy utilities -- these are now provided in THCTensorCopy.hpp and templatized on the ScalarType rather than the TensorType.
2018-06-08 15:51:25 -04:00
efba555a38 c10 build setup (#8264)
* Move c10/ to caffe2/dispatch/

* Set up caffe2/utils directory
2018-06-08 12:11:17 -07:00
d56b4f2568 Set up CI build for CUDA 9.2 + macOS (#8274)
* Add macOS CUDA build to CI

* Fix undefined symbols issue

* Use sccache for CUDA build

* Fix sccache issues

* clean up
2018-06-08 14:12:52 -04:00
a994b432ee [c10d] NCCL Process Group implementation (#8182)
* [c10d] Process Group NCCL implementation

* Addressed comments

* Added one missing return and clang format again

* Use cmake/Modules for everything and fix gloo build

* Fixed compiler warnings

* Deleted duplicated FindNCCL
2018-06-08 10:33:27 -07:00
d301d9df7a [ideep] Fuse Conv-Relu after IDEEP graph rewrite, skip group conv (#8233)
IDEEP supports fusion for non-group conv
2018-06-08 10:29:15 -07:00
742912512c Move signal window functions to ATen; add Blackman window (#8130)
* Move signal window functions to ATen; add Blackman window

* fix cuda test not checking scipy
2018-06-08 11:37:46 -04:00
20c516ac18 [cmake] Make cudnn optional (#8265)
* Make cudnn optional

* Remove cudnn file from cpu file
2018-06-08 02:04:27 -07:00
147fc6b9cc [auto] Update onnx to 39e4668 - fix optimizer does not set ir_version bug (onnx/onnx#1098)
39e46687ea
2018-06-08 06:12:08 +00:00
2928a33f50 [auto] Update onnx to 2508156 - Make error message more verbose (onnx/onnx#1097)
2508156135
2018-06-08 05:11:15 +00:00
1a03ba51dc [cmake] Add and export Modules_CUDA_fix (#8271)
* Add and export Modules_CUDA_fix

* actually, need to include before finding cuda
2018-06-07 21:50:30 -07:00
49593a609a [caffe2] Fix ATen dispatch for ops with TensorList arg (#8226) 2018-06-07 20:35:22 -07:00
80fade8af4 un-genericize THCDeviceTensorUtils. (#8258)
* provide data<T>() in TH(C)Tensor.

* un-genericize THCDeviceTensorUtils.

This is used outside of generic context, so we need to un-genericize it to have a single THCTensor type.
2018-06-07 23:29:41 -04:00
4f1440e828 [ideep] Add IDEEP fallbacks for Faster-RCNN ops (#8260)
TSIA
2018-06-07 20:21:56 -07:00
048b2f3a91 [caffe2] Move submodule onnx-tensorrt forward (#7659)
Commit 82106f833dcb0070446a150e658e60ca9428f89b is essential.
2018-06-07 20:07:04 -07:00
8d0c3c721a Remove TensorUtils<T>::getData, provide data<T>() in TH(C)Tensor. (#8247)
* Remove TensorUtils<T>::getData, provide data<T>() in TH(C)Tensor.

* Fix template parameter.
2018-06-07 20:54:49 -04:00
0c9b5f0825 Change the output format of caffe2 observers (#8261)
as title
2018-06-07 17:30:43 -07:00
4c2a1a1a64 Added backward function for kl_div target (#7839)
* added backward fn for target

* added module test for kl_div target, and assuming targets are probabilities
2018-06-07 17:17:18 -07:00
ce122cc2d3 Relax CUDA_HOME detection logic, to build when libraries are found. (#8244)
Log when no cuda runtime is found, but CUDA is found
2018-06-07 20:08:13 -04:00
73966f65ae Stop BCELoss from returning negative results (#8147)
* Stop BCELoss from returning negative results

* check explicitly for 0 before taking log

* add tests

* fix lint

* address comments
2018-06-07 20:06:04 -04:00
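For context, a sketch of the underlying math; the `eps` guard is illustrative, not the actual kernel change:

```
import torch

# BCE = -(y*log(p) + (1-y)*log(1-p)); at p == 0 or p == 1 the log is -inf,
# so the kernel must handle those endpoints to keep the loss finite and >= 0.
def bce(p, y, eps=1e-12):
    p = p.clamp(eps, 1 - eps)
    return -(y * p.log() + (1 - y) * (1 - p).log()).mean()

p = torch.tensor([0.0, 0.5, 1.0])
y = torch.tensor([0.0, 1.0, 1.0])
print(bce(p, y))  # finite and non-negative
```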
e2be77eae8 Fix app size check (#8256)
Fix app size check
2018-06-07 15:34:22 -07:00
78b88219fa [cmake] Use CAFFE2_USE_* for public/cuda.cmake (#8248) 2018-06-07 15:00:38 -07:00
b4c6310247 Fully genericize THC/THCUNN (except for TensorUtils and DeviceTensorUtils). (#8251) 2018-06-07 17:47:45 -04:00
95ae09c866 [auto] Update onnx to 3a035f4 - Add retry logic to model downloading (onnx/onnx#1077)
3a035f4397
2018-06-07 20:33:02 +00:00
93a9bb9f35 Don't override Tensor, Storage macros defined outside torch/csrc in t… (#8243)
* Don't override Tensor, Storage macros defined outside torch/csrc in torch/csrc.

This PR does the following:
1) Removes THSTensor macros in torch/csrc, which aren't used.
2) For macros defined outside of torch/csrc (THTensor, THTensor_, THStorage, THStorage_):
a) No longer override them, i.e. previously THTensor could actually be THCTensor if a generic file was included from a file including THCP.h.
b) Instead, introduce new macros THW* (e.g. THWTensor) to represent a (potentially empty) wildcard character.

In addition to making this code easier to read and codemod, this allows us to more freely change TH/THC; for example:
currently in the THC random code, the state is casted to THByteTensor*; this happens to work because the macros don't happen to override THByteTensor.
But if THByteTensor just becomes an alias of THTensor (which is the plan for a single tensor type), then this no longer works.
The whole thing was a bit of a mess previously because you really have to understand which macros are redefined and which aren't.

We could also rename the macros that live in torch/csrc (e.g. the THPTensor macros), but since that is more self contained, I punted for now.

* Don't change the plugin.
2018-06-07 16:10:10 -04:00
a466c12bd4 Fix lifting cat into its constant version (#8174)
This fixes a bug where schema including varargs lists did not lift
properly blocking correct ONNX export.
2018-06-07 12:38:58 -07:00
f2c86532f3 Fix TEST_CUDA import in test_cuda (#8246) 2018-06-07 15:12:05 -04:00
14f5484e0d Print requires_grad and grad_fn in string repr of tensor (#8211)
For example:

  >>> torch.ones(3).requires_grad_()
  tensor([ 1.,  1.,  1.], requires_grad=True)

  >>> torch.ones(3).requires_grad_() * 5
  tensor([ 5.,  5.,  5.], grad_fn=<MulBackward0>)

The suffix (dtype, requires_grad, grad_fn) wraps to a new line if
it would cause the line to exceed the linewidth.

  >>> torch.ones(10).double().requires_grad_()
  tensor([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
         dtype=torch.float64, requires_grad=True)
2018-06-07 14:31:23 -04:00
d2271dcee3 Fix: gradcheck forced float32 (#8230) 2018-06-07 12:31:18 -04:00
3eb9ba4d60 Remove .gitmodules.aten since it is in .gitmodules now (#8232) 2018-06-07 09:12:37 -07:00
d1bdb3b10a Remove core and util warnings (#8239)
* Fix some signed/unsigned mismatches

* Skip unused result warning

* Explict fallthrough for murmur hash

* Enable aligned new support to eliminate warning

* Switch to int instead of unsigned in some cases
2018-06-07 09:10:33 -07:00
ea5d871e49 [caffe2] Build Android tests and binaries in CI (#7593)
Update benchmark submodule to version with fixed Android/GNUSTL build
2018-06-07 09:07:38 -07:00
7ed361a466 Rename SparseTensor to SparseTensorRef. (#8237)
I want to introduce using SparseTensor = Tensor (as a documentary
type alias for Tensor), but the name is already taken.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-07 11:03:49 -04:00
346568d40f Use .cc since some downstream libraries are configured for C++ only. (#8234) 2018-06-07 01:37:52 -07:00
c22c55ebed [auto] Update onnx to 62e63e9 - Fix build errors inside protobuf-bench (onnx/onnx#1084)
62e63e9de8
2018-06-07 05:42:14 +00:00
832c88a766 [ideep] Add IDEEP Squeeze op (#8227)
Similar to MKLSqueezeOp at caffe2/mkl/operators/squeeze_op.cc
2018-06-06 21:58:51 -07:00
4df86b6547 Update MKL exporter to IDEEP ops (#8228)
IDEEP exporter support
2018-06-06 21:43:43 -07:00
b401e6b03a Allow optional build and installation of native test binaries (#8225)
* test finetuning

* install off by default

* Turn BUILD_TEST=ON for jenkins.

* Turn on install_test in jenkins as well
2018-06-06 20:56:31 -07:00
8af88f3525 [Caffe2] Add ADD operator for IDEEP (#8220)
* Add ADD operator for IDEEP

* Add broadcast check

* Comments
2018-06-06 20:20:33 -07:00
Ben
2f18f864fb Fix win mkldnn (#7718)
* Sync build_pytorch_libs.bat with build_pytorch_libs.sh

* fix quoting

* add warnings

* fix warnings

* Add /EHa
2018-06-06 22:59:38 -04:00
d0ca8896d5 Don't copy unneeded grads when using a function for several derivatives (Fixes #7722) (#7759)
Trying to copy all results fails when one of them is a tensor list which
has not been populated. This blew up for CuDNN RNNs when the weights
did not require grad.

Thanks to Sylvain Gugger for reporting!
2018-06-06 22:54:23 -04:00
c84b97b979 [READY TO MERGE] Enable tests that use DataLoader with multiple workers on Windows (#6745)
* Don't import TEST_CUDA for test_dataloader on Windows

* test_partial_workers is stuck on Windows
2018-06-06 22:50:39 -04:00
89ea6acde2 [NEEDS REVIEW] Add nan and inf probability check to multinomial (#7647)
* Add nan and inf probs check to multinomial

* fix bug

* Spawn CUDA test in subprocess

* Make sure invalid input won't pass the test case

* Try to fix error

* Test failure cases in Python 3 only

* Try to fix Windows error

* Move CUDA test to test_cuda.py

* fix issues

* fix module name error

* no need to check for CUDA existence in test_cuda

* Use PY3
2018-06-06 22:49:12 -04:00
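A sketch of the behavior this adds; the exact error type and message are assumptions:

```
import torch

probs = torch.tensor([0.5, float('nan'), 0.5])
try:
    torch.multinomial(probs, 1)  # invalid distribution now fails loudly
except RuntimeError as e:
    print('rejected invalid probs:', e)
```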
784c46ba1d [READY TO MERGE] Use ccache in macOS build (#8009)
* Use ccache in macOS build

* Moving to sccache

* Don't use sccache in test job
2018-06-06 22:38:10 -04:00
1172b152ab move THCP-related utils to cuda/utils.cpp. (#8221)
These files don't follow the usual pattern: In general the files torch/csrc/X and torch/csrc/cuda/X
both include the generic file torch/csrc/generic/X, where torch/csrc/X includes the cpu implementations and torch/csrc/cuda/X includes the cuda implementations.
(Aside: this is probably not the best structure, the torch/csrc/X files should probably be moved to torch/csrc/cpu/X).

utils.cpp combines these so that torch/csrc/utils.cpp has cuda specific code.  This makes it impossible to declare a single THTensor and THCTensor template type (i.e. THPPointer<_THTensor>, THPPointer<_THCTensor>).
2018-06-06 20:58:57 -04:00
5ec3041a42 Structure THTensor like THCTensor is structured. (#8217)
In particular, define a base type, _THTensor, that can be used for all THRealTensor structs.
This is just to have less cognitive load when dealing with generic THTensor/THCTensor types (as in templates).
2018-06-06 20:58:04 -04:00
deb56dfd06 Change new bernoulli implementation to be fully generic. (#8218)
The current implementation depends on THTensor types being unique, which is not guaranteed going forward.
2018-06-06 20:54:38 -04:00
07df98a3b8 [auto] Update onnx to e96d823 - Update Google benchmark to 1.4.1 (onnx/onnx#1083)
e96d823e5c
2018-06-07 00:49:04 +00:00
02734e389d Move helper functions to unnamed namespace. (#8224)
Currently, the helper functions in this file are in the global
namespace. I am guessing the intent was to keep them local.
2018-06-06 17:16:34 -07:00
7cace7219a Change the benchmark log format and also log flops (#8215)
as title
2018-06-06 17:04:54 -07:00
b03ba9023e Set up a c10 source folder (#7822)
* Set up a c10 source folder
2018-06-06 16:56:17 -07:00
f3869b4e03 [auto] Update onnx to 18d70ff - Graph should only have one (input) kParam node (onnx/onnx#1088)
18d70ff529
2018-06-06 23:40:38 +00:00
12229afd00 Record shape and type in autograd to validate gradients (#8168)
The check that the gradient is defined is currently disabled because
TestJit.test_ge_optimized will trigger the error.
2018-06-06 18:09:53 -04:00
36b8cc5483 skip CUDA memory leak check on Windows altogether (#8213) 2018-06-06 17:29:53 -04:00
56b1dcccf6 [cmake] deprecate caffe2_* specific cuda function in cmake. (#8200)
* deprecate caffe2_* specific cuda function in cmake.

* ENV{} -> $ENV{}

* CUDA_ARCH_NAME -> TORCH_CUDA_ARCH_LIST

* .

* .

* .
2018-06-06 14:13:26 -07:00
f2f76e29ee [auto] Update onnx to f28e2f1 - fix lrn spec (onnx/onnx#1090)
f28e2f1a60
2018-06-06 21:13:09 +00:00
1f23043b0a Fix tanh_op on ios build (#8207)
* Fix tanh_op on ios build

* Fix tanh
2018-06-06 14:09:01 -07:00
7ee517a266 rm -rf aten/contrib (#8165)
* Remove aten/contrib

* Remove from CMake
2018-06-06 16:55:48 -04:00
005eef5027 Bump gloo submodule (#8202)
This includes facebookincubator/gloo#125.
2018-06-06 13:31:29 -07:00
5935c5f23b Fix c10d compiler warnings (#8206)
Copy compiler flags from the ones used in setup.py and fix warnings.
This makes the root build that includes c10d headers warning free.
2018-06-06 13:23:53 -07:00
61fd99e1b3 Replace (non-data) TensorUtils calls with non-generic THCTensor calls. (#8176)
* Replace (non-data) TensorUtils calls with non-generic THCTensor calls.

TensorUtils is templatized on the THTensor type, so to support a single tensor type (like ATen), we need to remove these.

This PR does the following:
1) Allows THCTensorTypeUtils.cuh to include THCTensor.hpp.
   This involves moving includes of it outside of generic/, so we can use the new implementations.
2) Defines a single _THCTensor struct and changes THCRealTensor to be a derived type of _THCTensor.
   This allows us to implement a single non-generic function and avoid static_cast or void * tricks to call it from the generic functions.
3) For functions inside of TensorUtils that don't use data pointers:
   a) Implement the functions in (non-generic) THTensor.cpp and declare them in (non-generic) THTensor.hpp.
   b) Have the generic versions call the non-generic versions.
   c) Replace the corresponding TensorUtils<THCTensor>::fn call with (non-generic) THTensor_fn.

* Add comment about THCTensor struct.

* Error if storage is null in setStorageNd or resizeNd.
2018-06-06 16:19:40 -04:00
4d025a6a54 add wipe_cache option (#8204)
as title
2018-06-06 13:08:39 -07:00
eaea0f4b82 Update c10d build to link against Caffe2 (#8201)
This follows #7399.
2018-06-06 11:40:07 -07:00
edfcbfbe1f Implement randperm for CUDA (#7606)
* Implement randperm for CUDA

* Use Thrust to implement randperm

* clean up

* Fix test

* Offload small input scenario to CPU

* Fixed test

* Try to fix Windows error

* Fix Windows error and clean up

* Use fork_rng context manager

* Move test_randperm_cuda to test_cuda

* Add half tensor support

* Fix cuda::type error

* Fix CPU offloading

* Fix issues

* No need to check range for n == 0 case
2018-06-06 14:30:58 -04:00
9af3a80cff Docs for gradcheck and gradgradcheck; expose gradgradcheck (#8166)
* Docs for gradcheck and gradgradcheck; expose gradgradcheck

* address comments
2018-06-06 13:59:55 -04:00
35f08b930d Allow parallel_apply to take in list[Tensor] (#8047) 2018-06-06 13:49:52 -04:00
e6044e5576 use THCThrustAllocator in BCECriterion (#8188) 2018-06-06 13:19:16 -04:00
c0b2a2aa3b Add more annotations for arguments in ATen schema (#8192) 2018-06-06 13:11:39 -04:00
5e372c7106 fix lint 2018-06-06 12:53:58 -04:00
115a494b5f Fix scalar check for sparse tensors. (#8197)
* Fix scalar check for sparse tensors.

As discovered in #8152

If `t` is a scalar sparse tensor, `t._indices` used to return a sparse
empty tensor because the scalar check was incorrect. This PR modifies
the scalar check to return a dense tensor instead of a sparse tensor.

i.e.
```
tensor = torch.sparse_coo_tensor([], [], torch.Size([]), device=device)
out = tensor._indices()  # was a sparse tensor, now is dense.
```

* Fix typos
2018-06-06 12:24:25 -04:00
8e6f7a1382 [Caffe2] Merging setup.py with setup_caffe2.py (#8129)
* Merging setup.py files; torch works, caffe2 works up to other KP

* Fix to super call for python 2

* Works on python2 on mac

* Consolidating Caffe2 flags
2018-06-06 08:31:31 -07:00
857020b849 [auto] Update onnx to 4e65fd8 - fuse consecutive squeezes (onnx/onnx#1078)
4e65fd83ba
2018-06-06 13:25:11 +00:00
f45a3d5558 Add a loop unrolling pass to PyTorch JIT (#7672) 2018-06-06 09:36:12 +02:00
Ben
a6305ea210 Fix protobuf options (#8184)
* protobuf

* fix protobuf_MSVC_STATIC_RUNTIME
2018-06-05 22:43:05 -07:00
c496a4a347 Yangqing as an ONNX codeowner (#8185) 2018-06-05 22:06:32 -07:00
3b8f4d1d88 [ONNX] Fix type_as symbolic (#8183)
* [ONNX] Nuke type_as symbolic

* make it better

* Fix lookup + test
2018-06-05 22:06:20 -07:00
bae82f726d fix caffe2 docker build (#7411) 2018-06-05 22:51:43 -04:00
e8d6ac50b4 Add retry logic to sccache download for Windows build (#7697)
* Add retry logic to sccache download for Windows build

* fix script bug

* clean up
2018-06-05 22:38:30 -04:00
c1bd3b3fb7 Better conv error message basing on weight shape (#8051) 2018-06-05 22:22:00 -04:00
b2dac08049 Fix a corner case for ReShapeOp (#8178)
In my use case, in the backward propagation pass, the reshape needs to
change a [0] tensor into a [0,0]-shaped tensor. The original implementation would
cause an out-of-index issue. This diff fixes the problem.
2018-06-05 19:06:10 -07:00
c21465e32e Get rid of SOVERSION (again). (#8132)
We don't want SOVERSION because pip will lose the symlink and
double your distribution size, and also because our setup.py
accidentally links against both libcaffe2.dylib and libcaffe2.1.dylib
on OS X.  This leads to a very puzzling error where you get
the error "cannot initialize CUDA without ATen_cuda", because
there are actually two copies of your registry in memory (because
there are two copies of the dynamic library).  Dropping SOVERSION
makes it impossible to make this mistake.

In principle, if the shared library load is done with DYLD_GLOBAL,
that should also prevent two copies of the registry from popping up.
Worth checking at some later point, if you need to bring back
SOVERSION (because, e.g., pip finally fixed their software.)

Partially fixes #8022.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-05 22:03:04 -04:00
d7ba404e29 Add back onnx console scripts dropped during migration from onnx-caffe2 (#8143) 2018-06-05 22:02:14 -04:00
ffde23d45e use the correct datatype format (#8144) 2018-06-05 22:01:59 -04:00
e53fec0495 [JIT] Support a single TensorList argument anywhere in the argument list + index_put (#8173)
* [JIT] Support a single TensorList argument anywhere in the argument list

* [JIT] index_put
2018-06-05 21:48:54 -04:00
Ben
ccabdfef42 Export getCudnnHandle (#7726) 2018-06-05 20:51:52 -04:00
9243b64bff [Caffe2] Update elementwise ops to support numpy style broadcast (#8070)
* Update elementwise ops to support numpy style broadcast

Update elementwise ops to support numpy style broadcast

* Fix sqrt_op

* Fix compare ops

* Fix gradient test

* Fix optimizer legacy broadcast

* Fix legacy broadcast for elementwise ops

* Skip flaky test

* Fix eigen simple binary op

* Fix attention test

* Fix rnn test

* Fix LSTM test

* Fix tan grad

* Fix schema check
2018-06-05 15:49:16 -07:00
0517623517 Abstract parallelization to facilitate using threadpools (#8163) 2018-06-05 22:36:17 +00:00
ba46d3d981 Adding -setup- path, and better code structure (#8122) 2018-06-05 14:40:00 -07:00
fa1bdcf4d2 Pinning opencv to < 3.4 in conda builds (#7923)
* Pinning opencv to 3.1.0 in conda builds

* Also pinning numpy to 1.11

* Trying only specifying <3.4
2018-06-05 13:16:02 -07:00
a3fc5ed351 Move non-generic Storage code needed by TensorUtils to non-generic C++. (#8164)
For non-generic function call implementations in Storage used by TensorUtils, we do the following:
1) Move the declaration from generic/C to non-generic/C++; we don't need backwards compatibility on these functions and want to use e.g. at::ScalarType.
2) Move the implementation from generic/C++ to non-generic/C++.
3) Change the generic implementation to call the non-generic implementation.

This will allow us to get rid of the corresponding TensorUtils calls (once we move over the Tensor functions in the same manner).
2018-06-05 14:50:02 -04:00
1cdd7b5c0f Fix __rshift__ bug (#8161)
* Fix __rshift__ bug

* Add small tests for __lshift__ and __rshift__ in test_cuda

* Add a more elaborate check for __lshift__ and __rshift__

* refactor the test to address @zou3519 's comments
2018-06-05 14:30:02 -04:00
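A quick sketch of the operators in question on integer tensors:

```
import torch

x = torch.tensor([1, 2, 4, 8])
print(x << 1)  # tensor([ 2,  4,  8, 16])
print(x >> 1)  # tensor([0, 1, 2, 4]) -- the __rshift__ path fixed here
```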
990c6c5531 [C++ API] Improve and use OrderedDict for parameters / modules (#7823)
* Improve OrderedDict for C++ API

* Give OrderedDict a subject and fix review comments

* Fix OrderedDict use in torch/csrc/jit/script/init.cpp
2018-06-05 14:29:09 -04:00
bf58bb5e59 Fix cuda.framework error on OSX. (#8136)
When compiling OSX with CUDA, Caffe2's build system uses
find_package(cuda) to get its grubby hands on the CUDA driver
library (for some strange reason, FindCUDA doesn't save this
information as a variable).  Unfortunately, on OSX, sometimes
this picks up the cuda.framework folder, and then our build
system chokes to death because it doesn't try to link against
this as a framework.  (Is the folder even a framework?  I have
no idea).

This commit attempts to fix this in a two pronged fashion:

1. For some users, reducing the precedence of frameworks
using CMAKE_FIND_FRAMEWORK seems to help.  So we set these
variables.  However, this fix is not perfect; on my laptop
it doesn't actually solve the problem.

2. PyTorch doesn't actually need the CUDA driver API.  So we
only add the dep when building Caffe2.

Fixes #8022

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-05 13:37:05 -04:00
7c1e8c3c7a remove some unnecessary cudaGetDevices (#8089)
* remove unnecessary cudaGetDevices

* make curDevice argument non-optional, add explicit checks to current_device
2018-06-05 13:17:47 -04:00
aec6d6a7d3 [auto] Update onnx to 968d28d - fix Node::isBefore (onnx/onnx#1075)
968d28d901
2018-06-05 16:31:01 +00:00
fe805794ac docstring support for @script and @script_method (#7898)
* docstring support for @script and @script_method

* make it python2 compatible

* improve according to review

* improve build_stmts

* use filter instead of list comprehension

* improve the way wrap is handled for script_method

* stash the original method instead

* allow dynamic attr for ScriptMethod and GraphExecutor

* a bit comment on build_Expr

* remove _build_wrap

* a bit improve on comments

* rename to __original_methods

* should be _original_methods
2018-06-05 10:36:08 -04:00
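A minimal sketch of what this permits, assuming the `@torch.jit.script` decorator spelling of this era:

```
import torch

@torch.jit.script
def double(x):
    """Doubles a tensor."""  # a docstring inside script no longer errors
    return x * 2

print(double(torch.ones(2)))
```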
c719c8032c docs: add canonical_url and fix redirect link (#8155)
* docs: enable redirect link to work for each specific page

* docs: add canonical_url for search engines

closes #7222

* docs: update redirect link to canonical_url
2018-06-05 10:29:55 -04:00
227a7640ce Accelerate bernoulli number generation on CPU (#7171)
* opt bernoulli rng with vsl and openmp

* detect cpu vendor for bernoulli

* retrigger test platform

*  check the vendor more severely

* use cpuinfo to check vendor
2018-06-05 10:23:48 -04:00
ee0b75a3d2 docs: Add warning to torch.repeat() (#8116)
* docs: Add warning to torch.repeat()

closes #7993

* docs: Add links for numpy functions

* docs: Break the too long line
2018-06-05 10:15:36 -04:00
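The gist of the warning, as a sketch:

```
import torch

# torch.Tensor.repeat tiles the whole tensor (like numpy.tile); it is NOT
# numpy.repeat (which repeats individual elements).
x = torch.tensor([1, 2, 3])
print(x.repeat(2))     # tensor([1, 2, 3, 1, 2, 3])
print(x.repeat(2, 1))  # shape (2, 3): tiled along a new leading dimension
```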
f5cd479b59 fix type mismatch while call torch._C._cuda_setDevice (#8065)
* fix type mismatch while call torch._C._cuda_setDevice

* fix type mismatch in scatter

* fix type mismatch in scatter

* fix type mismatch while call torch._C._cuda_setDevice

* fix type mismatch while call torch._C._cuda_setDevice

* fix type mismatch while call torch._C._cuda_setDevice
2018-06-05 09:53:22 -04:00
c446269568 cpu/ideep context converter (#8139) 2018-06-04 21:28:59 -07:00
f8c18e00d5 Fix a corner case for ReShapeOp (#8142)
In my use case, in the backward propagation pass, the reshape needs to
change a [0] tensor into a [0,0]-shaped tensor. The original implementation would
cause an out-of-index issue. This diff fixes the problem.
2018-06-04 20:40:43 -07:00
a5ce0126cc Fix job name checking for AVX tests (#8135) 2018-06-04 19:25:15 -04:00
c0a419e6ba Add non_blocking to Tensor/Module.to (#7312)
* Add non_blocking to Tensor/Module.to

* flake8

* Add argparse tests

* cpp parse

* Use C++ parser

* use a commong parse function with Tensor.to

* fix test_jit

* use THPObjectPtr

* increase refcount for None, True, and False

* address comments

* address comments
2018-06-04 18:46:52 -04:00
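A usage sketch of the new keyword; the overlap claim assumes pinned (page-locked) host memory:

```
import torch

if torch.cuda.is_available():
    x = torch.randn(1024, 1024).pin_memory()
    # With non_blocking=True the host-to-device copy may return before it
    # completes, letting it overlap with subsequent CPU work.
    y = x.to('cuda', non_blocking=True)
    # nn.Module.to gained the same keyword in this PR.
```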
ec4a0f332e Add back lrn test (#8134)
* Revert "Skip OnnxBackendNodeModelTest::test_lrn_default_cuda that causes segfault (#8127)"

This reverts commit 410191c4175eaae141306cdb3c3c1c1e8a495225.

* Fix mismatched default values
2018-06-04 15:06:40 -07:00
94e197c262 Add utf-8 header to Python file with Unicode. (#8131)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-04 14:49:32 -07:00
0ea2fa15a3 Replace most remaining usages of TensorUtils<T>::DataType. (#8124)
As in https://github.com/pytorch/pytorch/pull/8056, this doesn't work with a single TensorImpl type.
This replaces the usages of with a templatized parameter and static_asserts that the new and old are equal.

After this we can get rid of the old template parameter, but I want to ensure they are equivalent across all builds first.
2018-06-04 16:48:57 -04:00
410191c417 Skip OnnxBackendNodeModelTest::test_lrn_default_cuda that causes segfault (#8127) 2018-06-04 12:34:15 -07:00
df28f5d06e [Caffe2] Support non peer access in muji and fix bug when reduced_affix is empty (#6896)
* [Caffe2] Support non peer access in muji

* [Caffe2] Add test for 4 gpus and 2 groups

* [Caffe2] Add comments

* Fix bug when reduced_affix is empty

* Fix typo and add comments about cpu and amd gpu
2018-06-05 03:14:43 +08:00
7fc110b521 Split SparseTensorImpl off from TensorImpl. (#7990)
* Split SparseTensorImpl off from TensorImpl.

At the moment they have the same data layout, but with the upcoming refactor
they will not, and we need a place to put all of the sparse tensor specific
fields.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Update SparseTensorImpl.h
2018-06-04 15:02:09 -04:00
f24d715e23 [auto] Update onnx to 2a87616 - Tests for LRN operator (onnx/onnx#903)
2a876162ac
2018-06-04 18:13:14 +00:00
cef8bfb33e Add missing pragma once. (#8118)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-04 13:26:39 -04:00
7ba0dbc2cd [auto] Update onnx to 2d5ce4a - Remove empty model (onnx/onnx#1058)
2d5ce4aeb6
2018-06-04 16:35:08 +00:00
96a77b5aa8 Make libshm also test if rt requires pthread. (#8112)
In some configurations (e.g., our internal build of GCC 5 + GLIBC 2.23),
-lrt is not sufficient to use shm_open; you also need to declare
a dependency on pthread.  This patch adds a surgical extra fix to
detect this situation, in the case that I noticed it failing in the
wild.

Fixes #8110

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-04 12:12:59 -04:00
c2046c1e5e Implement adaptive softmax (#5287)
* Implement adaptive softmax

* fix test for python 2

* add return_logprob flag

* add a test for cross-entropy path

* address review comments

* Fix docs

* pytorch 0.4 fixes

* address review comments

* don't use no_grad when computing log-probs

* add predict method

* add test for predict

* change methods order

* get rid of hardcoded int values

* Add an optional bias term to the head of AdaptiveSoftmax
2018-06-04 12:12:03 -04:00
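A minimal usage sketch, assuming the `nn.AdaptiveLogSoftmaxWithLoss` name and its (output, loss) result:

```
import torch
import torch.nn as nn

# Frequency-ordered vocab: head covers ids < 100, then two tail clusters.
asm = nn.AdaptiveLogSoftmaxWithLoss(in_features=64, n_classes=10000,
                                    cutoffs=[100, 1000])
hidden = torch.randn(32, 64)
target = torch.randint(0, 10000, (32,))
out = asm(hidden, target)
print(out.loss)  # scalar training loss over the batch
```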
e749159064 Detect CUDNN related environment variables in cmake (#8082) 2018-06-04 12:10:36 -04:00
e5b997223c [Caffe2] Enabling AMD GPU Backend for Caffe2 (#7955)
* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the lastest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Resolve merge conflicts

* .

* Update GetAsyncNetHIPThreadPool

* Enable BUILD_CAFFE2 in pytorch build

* Unify USE_HIP and USE_ROCM

* always check USE_ROCM

* .

* remove unrelated change

* move all core hip files to separate subdirectory

* .

* .

* recurse glob core directory

* .

* correct include

* .
2018-06-04 09:04:30 -07:00
3d7a064369 Remove out-of-date comment (#8114) 2018-06-04 11:45:33 -04:00
04a3616de0 Replace std::size_t with size_t (#8093) 2018-06-04 11:10:44 -04:00
185f8fbe7c Removing remaining NO_PYTHON ifdefs (#8067)
* Remove NO_PYTHON in tracing

* Remove NO_PYTHON in ir.h

* Remove NO_PYTHON in test_jit.cpp
2018-06-04 10:53:28 -04:00
f8830f9991 use regex in kwarg parser (#8061) 2018-06-04 10:47:55 -04:00
9fc0ba31b9 Do an additional sanity check that nvcc and CUDA include dir agree. (#8094)
If you set CUDA_HOME and CUDA_NVCC_EXECUTABLE together, you may
end up in a situation where the CUDA_VERSION of your includes
mismatches the CUDA version of your nvcc.  See #8092 for a concrete
case where this can occur.  Explicitly detect this situation and
give a good error message in this case!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-04 10:41:47 -04:00
db5bc71562 Fix and ignore some warnings (#8081) 2018-06-04 01:01:59 -07:00
dff115f47a Move backtrace to its own header (#8096)
* Move backtrace to its own header

* Move cxxabi.h into Backtrace.cpp
2018-06-03 21:11:29 -07:00
36c3859d3e [auto] Update onnx to 356208d - add input tensor dimension checks to shape inference (onnx/onnx#1070)
356208d756
2018-06-03 18:15:05 +00:00
74672a31a2 [auto] Update onnx to cc26486 - bump version to 7 for prelu. (onnx/onnx#1063)
cc26486541
2018-06-03 09:15:45 +00:00
9232afeffa Add code for TensorBoard visualization of JIT GraphExecutors (#8050) 2018-06-02 20:55:25 +02:00
5e35fbfaa3 Post process onnx proto (#8064)
* Post processing onnx generated protobuf files to hide global symbols

* .

* .
2018-06-02 10:46:48 -07:00
01f5ee77e3 Skip ConvTraspose ONNX backend tests (#8074) 2018-06-02 09:52:18 -07:00
624ade1eac [auto] Update onnx to bd98abb - Add a hook for doing post-processing on protobuf generated header files (onnx/onnx#1068)
bd98abbba0
2018-06-02 16:04:34 +00:00
1fc96b6471 [auto] Update onnx to eb12f72 - Add conv transpose test cases (onnx/onnx#886)
eb12f72a86
2018-06-02 15:53:55 +00:00
68948306bc Support to run ONNX Upsample operator (mode=nearest) in Caffe2 (#8037)
* Added support to run ONNX Upsample operator (mode=nearest) in Caffe2

* adding error checks to upsample

* adding error checks to upsample

* adding error checks to upsample

* changing to np.isclose

* Revert onnx submodule update

* still fixing
2018-06-02 08:45:44 -07:00
7be457c2a4 Reduce usages of TensorUtils<T>::DataType in THC. (#8056)
TensorUtils<T> is basically ATen-dispatch-lite in that it allows one to do multi-type THC function dispatch with a single call.
However, it is templatized on the Tensor type, and since we are moving to a single Tensor type, this doesn't work.

Most of the functions in TensorUtils (e.g. getDims) can be pulled up a level, to just call THCTensor_nDimension (or directly accessing the member),
but the DataType specific functions are more problematic.

So, this PR does two things:
1) Replaces calls of 'TensorUtils<THCTensor>::DataType' with 'real' since these are identical
2) Templatizes the THC_pointwiseApplyX functions to take scalar types.  To ensure this is done correctly, we static_assert that the scalar type template parameter matches the scalar type of
   the corresponding template parameter.  We will need to get rid of these static_asserts in the future, but this is useful for now.
2018-06-02 11:26:02 -04:00
7926313235 Have a single THStorage and THCStorage type. (#8030)
No longer generate data-type specific Storage types, since all Storage types are now identical anyway.
For (some) backwards compatibility and documentation purposes, the Real names, e.g. THLongStorage are now #defined as aliases to the single THStorage type
2018-06-02 11:05:02 -04:00
3cbaa6b785 [ready] Clean up torch.distributions (#8046) 2018-06-02 16:54:53 +02:00
afa75fa6b2 Remove NO_PYTHON macros from Exceptions.h/cpp (#8007)
Removes cases where NO_PYTHON was unnecessary in Exception.h/cpp
2018-06-01 22:37:18 -07:00
bef306eac7 [auto] Update onnx to 033f956 - make gcc happy (onnx/onnx#1061)
033f956f41
2018-06-02 05:06:33 +00:00
f2573e8df7 [auto] Update onnx to e6a500e - Extract constant to initializer (onnx/onnx#1050)
e6a500e54c
2018-06-02 04:29:28 +00:00
7379b22abe [auto] Update onnx to 4f8ef17 - Remove erroneous documentation around maps and sequences. (onnx/onnx#1069)
4f8ef17ad3
2018-06-02 04:20:54 +00:00
8d4e92a91d [auto] Update onnx to 0dbec2a - - Generate protoc type hints on Windows (onnx/onnx#1047)
0dbec2a047
2018-06-01 23:59:08 +00:00
2fb957da81 workaround for Sequential when one cannot retrieve python source (#8048) 2018-06-01 18:45:11 -04:00
eb2f21f1e4 Skip CUDA memory leak test on BN tests on windows (#8043) 2018-06-01 18:09:14 -04:00
82b981e4db Update from facebook 1ee4edd286a3 (#8040)
* Adding instance weight to batch distill loss

as title

* add bfloat 16-31

added bfloat 16-31 and their respective unit tests

* [CUDA9] Upgrade - fbcode

CUDA9 upgrade diff D5654023 has been out for a while thanks to Pieter. But with time growing it's becoming quite hard to rebase, because of the symlinks and auto-generated build/config files in tp2. Break D5654023 into two diffs, one touching tp2 config files, and another one touching fbcode TARGETS file (adding nvcc flag). These two should be a bit easier to rebase (for detailed procedure see "Test Plan").

This diff can only be committed if:
1. CUDA 9 rpm is rolled out fleet-wide (TBD)
2. NVidia driver 390.40 is rolled out fleet-wide (done)
3. Upgrade CUDA 9.1, cudnn 7.1, nccl 2.1 (done)
4. Make sure all dependents are built (done)
5. Test all C2 operators, PyTorch (see test plan)

* Share intermediate int32 buffer across Conv ops

Adding a known type

* [C2 fix] infer function for ensure_cpu_output_op

this is adding the missing device funtion for ensure_cpu_output_op

* [int8] Add blob serializer/deserializer for Int8TensorCPU

To export to logfiledb

* [nomnigraph] Add try catch block to optimization passes in predictor

This will catch failures that happen in the optimization pass.

* Caffe2: avoid static initialization order fiasco for CAFFE_ENFORCE

CAFFE_ENFORCE uses a stack trace fetcher, which is currently a
global static variable. If CAFFE_ENFORCE is used at static
initialization time, this is a SIOF. Recently CAFFE_ENFORCE was added
into init function registration, so we started to see this.

Meyers singleton is going to provide safety here. If stacktrace
fetcher was not registered yet, it will just use a dummy one.

* NUMA support in SparseNN CPU benchmark

Adding support for NUMA in SparseNN CPU benchmark

* [mobile-roofline] Add logging needed for roofline model

This should be all that's needed

Let the operators use the same input if the operators are not chained

or else, we have to change the input data dims

* fix null-pointer-use UBSAN errors in reshape_op.h

* revert previous fix on input blob name

as title

* Adding flag to let MineHardNegative automatically extract single value from dict

Model exporter requires the output of the model to be a struct. This makes it convenient to use those models directly in MineHardNegative by allowing automatic extraction of the single element of the dict, which is a common use case.

* Reverting change that broke internal tests back to OSS compatible state
2018-06-01 17:41:09 -04:00
9060b7f4e2 Add profiling annotations to NeuralNet[Operator|Data] (#8005) 2018-06-01 14:27:42 -07:00
ef1c15f5ca [script] Add support for torch.zeros, torch.ones, etc. (#7799)
* [script] Add support for torch.zeros, torch.ones, etc.

* modifies gen_jit_dispatch to creating bindings for functions that do
  not take tensor arguments, but do have an initial type argument
* adds tensor attributes to these functions for device, layout, and
  dtype specification
* extends the list of valid compiler constants to include device, layout,
  and dtype.
* allows functions with Generators, but only using the default generator

Known limitations:
* when using `torch.float`, we convert it to a scalar tensor and make
  no checks that it is actually used only in a dtype specification.
  This is similar to how we handle Python numbers, creating some situations
  where the script is more permissive. Fixing this requires much more
  significant changes to the IR, so is lower priority for now.
* devices specified using string literals e.g. 'cuda:1' do not work,
  since we do not support string literals in general.
2018-06-01 14:24:18 -07:00
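A sketch of what now compiles under @script; exact accepted spellings in this revision are assumptions:

```
import torch

@torch.jit.script
def make_mask(x):
    # a factory call with a dtype attribute, now expressible in script
    return torch.zeros(x.size(), dtype=torch.float)

print(make_mask(torch.ones(2, 3)).shape)  # torch.Size([2, 3])
```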
2ec2e6947e [auto] Update onnx to 9e7855d - Remove PyTorch generated Upsample tests cases (onnx/onnx#1064)
9e7855dcd4
2018-06-01 21:15:47 +00:00
c6a923f486 Support modules that output scalar in Gather (and data parallel) (#7973)
* Support modules that output scalar in Gather (and data parallel)

* Improve warning msg
2018-06-01 16:20:39 -04:00
215abffe60 [auto] Update onnx to 760c928 - add missing hasNInputShapes check for bidirectionalBroadcastShapeInference (onnx/onnx#1060)
760c9283d0
2018-06-01 20:14:57 +00:00
23dd033b51 Factor python dependency out of interpreter (#7970)
* Factor python dependency out of interpreter

* Remove NO_PYTHON for the autograd engine

If there is no python bindings, then a default Engine is constructed
the first time it is requested.

If the python libraries are loaded, then they override the default
accessor and the default engine becomes a python Engine.

Note: it is possible for two engines to be generated if a non-python
one gets created before the python bindings are loaded. This case
is rare, and just results in additional threads being spawned.

* Fixing AlexNet test which is skipped in CI
2018-06-01 16:07:21 -04:00
41ef5c2d4b Support for generating ATen during the fbcode build, rather than committing the generated files (#8002)
Paint the internal bikeshed a slightly different color to appease Buck tooling.
2018-06-01 16:04:02 -04:00
d27e138a1a Allow CI testing with different AVX configs (#8020)
* Allow CI testing with different AVX configs

* Unset ATEN_DISABLE_AVX and ATEN_DISABLE_AVX2 in default config
2018-06-01 12:30:11 -07:00
8f421159fd Fix profiler crash when no events register (#8034)
* Fix profiler crash when no events register

When trying to profile, attempting to print the event table throws a vague error because the event list is empty:

....
max_name_length = max(len(evt.key) for evt in events)
ValueError: max() arg is an empty sequence

This change fixes the error by returning an empty string.

* Update profiler.py
2018-06-01 14:38:24 -04:00
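A sketch of the failure mode being fixed:

```
import torch

with torch.autograd.profiler.profile() as prof:
    pass  # no events recorded
print(prof)  # used to raise `max() arg is an empty sequence`; now prints nothing
```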
bf29abd908 propagate nan in some activations (#8033)
* propagate nan in some activations

* fix py2 not having math.nan

* flake8
2018-06-01 14:08:01 -04:00
8b447fa784 [auto] Update onnx to 3fb9656 - Fix for fbcode CI (onnx/onnx#1062)
3fb965666e
2018-06-01 17:09:28 +00:00
d0ec8af0fc Support CUDA tensors in ProcessGroupGloo (#7694)
This adds an unconditional dependency on CUDA, which is not desirable
for the long term. Ideally we have split like ATen where we have
different artifacts for different backends so you can decide at runtime
what to use.
2018-06-01 09:54:45 -07:00
d0e27609ab [auto] Update onnx to 1504a33 - Convert schema assert for duplicate type names to exception (onnx/onnx#1057)
1504a33abb
2018-06-01 15:24:25 +00:00
03fe106448 [auto] Update onnx to 33e9cd4 - Remove the usage of default value to fix invalid proto3 files. (onnx/onnx#1052)
33e9cd4182
2018-06-01 15:23:39 +00:00
52368f25cc Example for Transformed Distribution (#8011) 2018-06-01 16:23:57 +02:00
8be17723cb Update nn.rst (#8029) 2018-06-01 09:37:18 -04:00
b41050ff66 Re-enable build env check (#7969)
* Re-enable build env check

* Fix linux test error

* Try to fix macOS test error
2018-06-01 06:57:47 -04:00
dbe5c7f6e9 Mention the pytorch-ci-hud on the README. (#8004)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-01 06:56:48 -04:00
580d212267 Remove WITH_ROCM cmake flag/variable (use USE_ROCM solely) (#8013) 2018-05-31 20:50:59 -04:00
436211e27c Make AT_FORALL_SCALAR_TYPES usable outside of at::namespace. (#7935)
* Make AT_FORALL_SCALAR_TYPES usable outside of at::namespace.

This requires renaming the _cast functions which used the unqualified names.

* Separate onnx mapping of scalar type from cast name.

* Fix flake8.

* Properly cast onnx.
2018-05-31 20:50:16 -04:00
6c0bc27371 [auto] Update onnx to 8ec0e5f - Add index check for Transpose's type inference function (onnx/onnx#1053)
8ec0e5fe9b
2018-06-01 00:11:02 +00:00
e63be0d58f Reduce grain size for Unary operations (#8003) 2018-05-31 21:59:53 +00:00
0fe4cb10e3 Add on-stack observer cache for Observable (#7931)
observers_list_ stores all the observers for an observable. The list is allocated on the heap, which
 can cause LLC misses. Add an on-stack observer cache for fast access. In production, we have seen a 20%
 speed up for start and stop observer calls.
2018-05-31 13:05:02 -07:00
fd30487089 Fix a couple of typos (#7998)
* Fix typo

* Fix typo

* Fix typo

* Fix typo
2018-05-31 15:29:02 -04:00
8afe4c95d6 Entry for c10d in CODEOWNERS (#8001) 2018-05-31 15:28:16 -04:00
80ede55242 Revert "Set smaller grain size for some cases" (#7988) 2018-05-31 15:24:03 -04:00
85ee94b7be Add memory leak check in CUDA tests (#7270)
* Add memory leak check in CUDA tests

* Tracking multi-GPU too

* fix run_test.py not running __name__ == '__main__' content; add test for make_cuda_memory_checked_test

* add a comment

* skip if cuda

* 1. Change the wrapper to a method in common.py:TestCase
2. Refactor common constants/method that initialize CUDA context into common_cuda.py
3. Update some test files to use TEST_CUDA and TEST_MULTIGPU

* Fix MaxUnpool3d forward memory leak

* Fix MultiLabelMarginCriterion forward memory leak

* Fix MultiMarginLoss backward memory leak

* default doCUDAMemoryCheck to False

* make the wrapper skip-able

* use TEST_MULTIGPU

* add align_corners=True/False tests for Upsample; fix TEST_CUDNN

* finalize interface

* VolumetricMaxUnpooling_updateOutput

* fix test_nccl

* rename THC caching allocator methods to be clearer

* make the wrapped function a method

* address comments; revert changes to aten/src/THC/THCCachingAllocator.cpp

* fix renamed var
2018-05-31 15:09:54 -04:00
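A hedged sketch of the idea behind the wrapper (the real one lives in test/common.py; names below are illustrative, and it assumes a CUDA machine):

```
import torch

def assert_no_cuda_leak(fn):
    # Snapshot allocator state, run the test body, and verify that every
    # allocation made inside was released.
    torch.cuda.synchronize()
    before = torch.cuda.memory_allocated()
    fn()
    torch.cuda.synchronize()
    assert torch.cuda.memory_allocated() == before, 'CUDA memory leaked'
```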
bafec1637e support loading gzip (#6490)
* support loading gzip

* address comments

* address comments

* fix lint

* fix test for python2
2018-05-31 15:06:38 -04:00
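A usage sketch of loading through gzip (round-trip shown for completeness; the path is a placeholder):

```
import gzip
import torch

t = torch.randn(3)
with gzip.open('t.pt.gz', 'wb') as f:
    torch.save(t, f)
with gzip.open('t.pt.gz', 'rb') as f:
    print(torch.load(f))
```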
3481c6c5e2 Build ONNX for PyTorch version of libcaffe2 (#7967) 2018-05-31 11:57:35 -07:00
e9c33e91d9 Remove python bindings for torch.slice (#7924)
* skip python bindings for slice

* remove tests

* convert slice test to indexing
2018-05-31 13:42:49 -04:00
89ba9dc44f Import/export observer symbols for DLL, which fixes the linking error in Visual Studio. (#6834)
* Import/export observer symbols for DLL, which fixes the linking error in Visual Studio.

* Add support of all default cmake build types for release to cuda.
2018-05-31 10:22:21 -07:00
eb39a23d8e Make THStorage / THCStorage have void* data ptr. (#7964)
* Make THStorage / THCStorage have void* data ptr.

This is the initial step in unifying the ATen and TH tensor representations, next is to only generate a single THStorage / THCStorage type.

The major changes here are:
1) data has been renamed to data_ptr and made void* in THStorage/THCStorage.
2) THStorage / THCStorage stores a at::ScalarType representing its data type (This will be useful when we generate a single THStorage/THCStorage).
3) APIs for Accessing the data as a real*:
a) storage->data<real>() -- this does runtime-type checking (checks that the at::ScalarType is correct).
b) storage->unsafeData<real>() -- as above, but no runtime-type checking (used in inner loops / fast code paths).
c) THStorage_(data)(storage) -- this already existed, just calls storage->data<real>().

* Add include.

* Attempt to fix clang build issues.

* Clarify comment and remove extra character.

* Rename unsafeData -> unsafe_data.

* Remove unnecessary 'to' function to get compile time rather than link time errors.
2018-05-31 13:10:08 -04:00
b5594ac750 Raise error when torch.load a storage on a non-existing device (#7921)
* Raise error when torch.load a storage on a non-existing device

Before, doing torch.load(...) on a CUDA tensor on a CPU-only machine
would raise an unreadable error:

```
~/pytorch/pytorch/torch/cuda/__init__.py in __enter__(self)
    223         if self.idx is -1:
    224             return
--> 225         self.prev_idx = torch._C._cuda_getDevice()
    226         if self.prev_idx != self.idx:
    227             torch._C._cuda_setDevice(self.idx)

AttributeError: module 'torch._C' has no attribute '_cuda_getDevice'
```

This PR makes it so that torch.load raises a hard error if one tries to
load a storage onto a non-existing device and suggests the user to use
torch.load's map_location feature.

* Address comments

* missing dep
2018-05-31 09:44:50 -04:00
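The remedy the new error suggests, as a sketch ('model.pt' is a placeholder path):

```
import torch

# Remap CUDA storages to CPU when loading on a CPU-only machine:
state = torch.load('model.pt', map_location='cpu')
# or, equivalently, with a function:
state = torch.load('model.pt', map_location=lambda storage, loc: storage)
```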
f9926e4ce5 Fix EmbeddingBag max_norm option (#7959)
* fix EmbeddingBag max_norm option

* flake8

* add warning to the embedding bag arg change
2018-05-31 09:42:56 -04:00
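A minimal sketch of the option being fixed:

```
import torch
import torch.nn as nn

# With max_norm set, looked-up rows are renormalized to norm <= max_norm
# before the per-bag reduction.
bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4, max_norm=1.0)
indices = torch.tensor([1, 2, 4, 5])
offsets = torch.tensor([0, 2])          # two bags: rows (1,2) and rows (4,5)
print(bag(indices, offsets).norm(dim=1))  # each bag's norm stays <= 1
```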
5596260b9e Add third wayt to determine IS_CONDA (#7971) 2018-05-31 09:04:27 -04:00
d8e28cfec2 Enable ONNX backend Mean tests (#7985) 2018-05-31 21:03:12 +08:00
d476d0b4ab [Hotfix] Bring back warnings and -Werror to ATen (#7866)
* Bring back warnings and -Werror to ATen

* Unbreak...

* Fix tbb errors
2018-05-30 21:59:04 -07:00
1bb6d44a21 Use Glog's implementation of STL logging when possible. (#7206)
Inject custom workaround into namespace std so that it can be found by ADL.
2018-05-30 21:10:27 -07:00
74783f0cd8 Move the broadcast check in MKL Add/Sum to runtime (#7978) 2018-05-30 21:09:32 -07:00
08b4c7ab7f Change perf test folder after git checkout (#7980) 2018-05-30 20:15:53 -07:00
108fb1c2c9 Fix the import part of the windows doc (#7979) 2018-05-30 21:51:30 -04:00
6e1de968d6 Use mingfeima's mkldnn (#7977) 2018-05-30 21:46:39 -04:00
df77ea7baf Fix the cpp libtorch CUDA build (#7975) 2018-05-30 21:27:45 -04:00
fce6b24468 Allowing MatMul to create a gradient even with 3 inputs. useful if you are differentiating a graph twice (#6536) 2018-05-30 16:53:54 -07:00
9b1abd2f81 [Caffe2] Keep name of caffe2_pybind11_state and caffe2_pybind11_state_gpu in debug build (#7155) 2018-05-30 16:38:44 -07:00
f0c09203b0 [caffe2] YellowFin parameter update GPU code fix. (#6993) 2018-05-30 16:36:08 -07:00
c94f3bbf33 Fix typo in autodiff formula for addmm (#7932) 2018-05-30 18:11:24 -04:00
2e78bfa530 Delete unused file (#7919) 2018-05-30 18:09:55 -04:00
fa8bdafa6c Prevent git autocrlf for bash scripts (#7949) 2018-05-30 18:09:10 -04:00
f721481543 Fix returning scalar input in Python autograd function (#7934)
* fix _wrap_outputs not working with scalar inputs

* add a test
2018-05-30 18:08:22 -04:00
df5d01df1e Set smaller grain size for some cases (#7941) 2018-05-30 18:07:13 -04:00
1f94a6eab3 [JIT] Fission and fusion passes for addmm (#7938)
* Addmm decomposition pass

* Addmm peephole pass

* Fix handling of output shape in fusion pass

* Add DCE to the peephole passes

* add comments

* maybe bugfix?

* Fix GPU tests

* fix py2/3 test issue
2018-05-30 18:06:58 -04:00
769f5f7cfe Handling of scalars in torch.Size (#5676)
* Handling of scalars in torch.Size

torch.Size() constructor uses python_arg_parser

IntList in python_arg_parser can take iter/range

Have IntList take python iterables and ranges.

Address comments: don't use python_arg_parser and instead call __index__ in THPSize_pynew

Address comments

Address comments

* Rebased

* Address nit
2018-05-30 17:50:32 -04:00
d102f9ea18 Split CI tests in half and run them in parallel (#7867)
* Split and run tests in parallel

* Refactor tests
2018-05-30 17:42:25 -04:00
8e6cd43291 Fix checkBackend error message (#7926)
* Fix checkBackend error message

Fixes #7849

* Switch order of printing args
2018-05-30 16:51:23 -04:00
0656ef483d remove sort requirement from pad-sequence (#7928)
* pad-sequence no longer requires sorting entries

pad_sequence can get the max_len from the list of sequences. Entries only need to be sorted if the output will be used for pack_padded_sequence, which can throw the error itself.

* remove sort requirement from pad-sequence

Picks up from #5974.

Removes the requirement that input sequences to pad_sequence have to be
sorted. Addressed the comments in the PR:
- Updated docstring for pad_sequence
- Remove sort requirement in pad_sequence test
- Test unsorted and sorted sequences in pad_sequence test
2018-05-30 16:36:55 -04:00
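A sketch of the relaxed contract:

```
import torch
from torch.nn.utils.rnn import pad_sequence

# Lengths deliberately NOT sorted in decreasing order.
seqs = [torch.ones(2), torch.ones(5), torch.ones(3)]
padded = pad_sequence(seqs, batch_first=True)
print(padded.shape)  # torch.Size([3, 5])
```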
c5b895ac50 Try to fix TORCH_CUDA_ARCH_LIST for PyTorch again (#7936)
* try again

* use DEFINED

* use a loop

* Minor fixes
2018-05-30 16:30:21 -04:00
f8e83dc257 Rename cuda::type to cuda::into_type and provide cuda::from_type. (#7937)
These are used to convert Half -> half and half -> Half respectively.
from_type will be used for runtime type checking in THC.
2018-05-30 15:25:25 -04:00
5419c6ecb7 Add unsafe flag to skip checking in prepare (#7832)
* Add unsafe flag to skip checking in prepare

* pop
2018-05-30 11:48:01 -07:00
f4256c9605 cache and use BLAS_SET_BY_USER so that it doesn't set itself to TRUE when run second time (#7942) 2018-05-30 11:44:23 -07:00
c0d50e1e1f [JIT][script] Fix emitted gather and slice for dynamic indices (#7861)
* [JIT][script] Fix emitted gather for dynamic indices

* Also fix slice

* Address comments
2018-05-30 11:43:22 -07:00
795f6e1077 add test for correctness of transpose fusion (#7950) 2018-05-30 10:56:51 -07:00
b3e87b1066 Fix fbcode compatibility (#7939) 2018-05-30 13:35:46 -04:00
8858b1d519 Fix THCUNN SpatialDepthwiseConvolution assuming contiguity (#7952) 2018-05-30 12:55:02 -04:00
4a80755834 Split up detail.h (#7836) 2018-05-30 08:55:34 -07:00
15122e93bc Test if ASAN is actually working as part of ASAN tests. (#6050)
* Test if ASAN is actually working as part of ASAN tests.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Drop explicit use of libstdc++, we should not care.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Build with DEBUG=1

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Increase main thread stack size when using ASAN.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-30 11:31:42 -04:00
4e5dec3024 [auto] Update onnx to 307995b - Update from upstream (onnx/onnx#1038)
307995b143
2018-05-29 23:13:26 +00:00
38dbe6e605 Updates to caffe2 operator documentation (#7917)
* Significant updates to the operator docs in prep for merge
2018-05-29 14:38:56 -07:00
c72c083151 Moved condition for dilated grouped convolutions to CUDNN convolution implementation (#7465) 2018-05-29 22:08:41 +01:00
267fc43a96 Fix Windows doc for import error (#7704)
* Fix Windows doc for import error

* Fix doc again

* Fix wrong format
2018-05-29 22:07:00 +01:00
c2fa1f363b [c10d] MPI Process Group Implementation (#7783)
This provides a bare-minimum MPI Process Group implementation, the commit is on top of @pietern's Gloo Process Group PR.

* [c10d] MPI Process Group Implementation

ref: https://github.com/pytorch/pytorch/issues/7434

* Better exception, atexit func, and addressed comments

* Clang formatting changes

* Static initialization and addressed comments

* Added constness back

* Test will now launch mpi processes if found

* CMakeList Changed
2018-05-29 22:06:48 +01:00
a8625e016a Spelling fix in MultivariateNormal docstring (#7915) 2018-05-29 16:53:36 -04:00
0951f4424a CUDA 9.2 adds support to GCC 7.3.1. (#7880) 2018-05-29 21:53:06 +01:00
e8cc16bb92 Release GIL when copying to shared memory (#7918)
This releases the GIL when creating and copying a THStorage to shared
memory.
2018-05-29 21:51:58 +01:00
f70146e922 Fix SN not backprop via sigma(W), and not reusing W_u (#7905) 2018-05-29 15:55:29 -04:00
tvn
146b951ec5 Fix seeding random module in DataLoader (#7886)
* fix seeding random module

* make base seed int

* follow 0.4 idiom

* add a test for random seeding
2018-05-29 15:55:04 -04:00
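A hedged sketch of the related worker-seeding pattern (not the exact code of this fix): each worker is seeded with base_seed + worker_id, and worker_init_fn can forward that seed to other RNG modules the dataset code relies on. The dataset here is a stand-in.

```python
import random
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # torch seeds each worker with base_seed + worker_id; forward that seed
    # to the other RNGs the dataset code may use.
    seed = torch.initial_seed() % 2**32
    random.seed(seed)
    np.random.seed(seed)

dataset = TensorDataset(torch.randn(8, 3))
loader = DataLoader(dataset, num_workers=2, worker_init_fn=worker_init_fn)
```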
65f8465f6f Add back cpp_build tests for Mac (#7810) 2018-05-29 12:54:12 -07:00
a0480adc79 Fix file extension (#7852) 2018-05-29 15:52:31 -04:00
1ce7ed2895 fix slack email link 2018-05-29 15:51:22 -04:00
f7458faf98 Only add BUILD_ATEN/USE_ATEN once to flags (#7845) 2018-05-29 12:21:11 -07:00
5c1fcea5db [auto] Update onnx to 7361eec - Fix Operator Tests (onnx/onnx#1044)
7361eec59a
2018-05-29 19:04:36 +00:00
215fe057ea No Default argument to max_unpool functions (Fixes #7327) (#7388)
* Fix for Issue #7327

* Added testcase for max_unpool
2018-05-29 15:02:23 -04:00
49f8581745 Update from facebook (#7855)
* [mpscnn] MPSCNNChannelShuffle

att

* [Easy] Adding tags as an argument to the functional layer

Without it "tags" would be added as an argument to the operator.

The change here is based on the assumption that there is no operator that takes "tags" as an argument.

* Fix locally_connected_op schema check.

Fix locally_connected_op schema check.

* [C2] Add TypeAndShape inference for few more operators

As desc

* [c2] Shape inference should support 0 as dimension

Tensors can have 0 in their dimension.

* Make MockHiveReader loop over and support max_examples

Replace DatasetReader with RandomDatasetReader.

So that Mock Hive Reader can simulate a large data input using a small sample file as source.

* Utility function to wipe cache between benchmark runs

The Caffe2 benchmark does not wipe out the cache between runs, and this potentially creates an unrealistically optimistic picture of performance. This diff adds a utility function to wipe out the cache.

* Allow caffe2 GlobalInit to be invoked multiple times

Allow caffe2 GlobalInit to be invoked multiple times. Will re-parse gflags and update logging levels on successive invocations, but will not re-run init functions or perform other one-time initialization.

* Add Caffe2 GlobalInitIsCalledGuard to base net and operator classes

Warn if caffe2's GlobalInit function has not been invoked before creating an operator or net object. This is based on discussion here: https://fb.quip.com/kqGIAbmK7vNG

* Rethrow current exception on failure

Rethrow current exception instead of copy constructing a new one on op failure.

* Make `clone()` return subclass of List/Struct

`clone()` is not working correctly when we subclass those classes

* Wipe the cache before the net run

the util function is copied from D7409424
will rebase once D7409424 is landed.

* [Caffe2] [Mobile] Support utils/cast.h::GetCastDataType with LITE_PROTO builds

* Correct includes

async_polling include -> async_base include

* Prepare execution flags for executor migration

Making async_scheduling aware of underlying net type to prepare for executor
migration

* Add operator level observers into async executor

Adding operator level observers into RunAsync operators' calls

* Cleanup TEST_Benchmark

Remove duplicate code and provide default implementation in NetBase

* [C2] Fix type and shape inference for binary comparison ops

As desc.

* Add GlobalInit to predictor to ensure initialization is always done before prediction

FACEBOOK:

Redo D7651453 the correct way.

Now use a static variable for the arguments passed to GLog

* Remove spammy log message

This method is currently used in various places inside Caffe itself.

* Disable events for operators inside a chain

We don't need to use events in operators within a chain because the chain is
always scheduled on a single stream, keeping only first and last event for
scheduling purposes

* Ensure correct finish run order

In rare cases we might call finishRun and trigger the net's destruction while
another worker is still holding a shared_ptr to a thread pool; that can cause
thread pool destruction from within a worker thread when no other nets are
using the pool. This diff fixes the order of calling finishRun and also changes
pool() to return a raw pointer, keeping the pool's ownership within the net

* Reduce unnecessary polling

Make sure we don't waste CPU by polling operators that we can set an efficient
callbacks on

* Squash commit of syncing 9506eeb from github to fbcode

Patch xplat buck fix

add virtual destructor to OptimizationPass

add virtual destructor to OptimizationPass

build fixes for sync

build fixes for sync

* Fix net tracing

Fix net tracing from async_scheduling

* Fix logging
2018-05-29 11:38:02 -07:00
9f21ec7ca2 Add spaces to indexing error message (#7922)
Followup to #7345
2018-05-29 13:10:06 -04:00
637a044a24 Add missing ${generated_comment} (#7920) 2018-05-29 13:08:05 -04:00
bc8a92d03d Move REGISTER_CUDA_HOOKS to cpp file. (#7630)
It's going to define a static variable, and this was a loaded
footgun if another C++ file directly included this header.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-29 17:40:50 +01:00
fb59ce32c8 [auto] Update onnx to 385523b - Eliminate unused initializer (onnx/onnx#860)
385523bf1c
2018-05-29 13:55:52 +00:00
6dfadfeb89 Revert "Fix error when setting multiple arch in TORCH_CUDA_ARCH_LIST" (#7914)
* Revert "Fix error when setting multiple arch in TORCH_CUDA_ARCH_LIST (#7879)"

This reverts commit 45cdb63d8b8022ab26f073d3bed718e75d2aedaf.

* Disable dirty test; always run all CI runs.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-29 14:48:18 +01:00
42a68749bf einsum: don't inplace modify arguments (fixes: #7763) (#7765)
Thank you, Pierce Freeman, for the report and minimal example!
2018-05-29 11:26:39 +01:00
fb23e62797 Remove templatization of PyTypeObject in THP copy storage methods. (#7811)
* Remove templatization of PyTypeObject in THP copy storage methods.

An in-progress refactoring of THStorage is collapsing the types of THStorages to not be ScalarType-specific.
The relevant PyTypeObject to use for the THPStorageType is currently templatized based on the current THStorage;
this doesn't work if the ScalarType is collapsed. Instead, just pass it explicitly.

* Pass src type instead of dst type.

* Line up columns.
2018-05-29 11:19:34 +01:00
8b85b8afd7 Avoid @generated in templates. (#7858)
* Avoid @generated in templates.

We want @generated only in the build products. Otherwise, templates are
locked and changes to the templates are excluded from phabricator.

Also adds @generated to autograd generated files (e.g.
VariableType.cpp).

See #7780

* Don't try to specify the template filename in generated comment

The template filename is not always the same as the generated filename.
2018-05-29 11:18:31 +01:00
07f55ae568 [auto] Update onnx to e570127 - update version (onnx/onnx#1034) (onnx/onnx#1041)
e5701271f0
2018-05-29 05:52:05 +00:00
c122d271a8 [caffe2][nomnigraph] Default disable optimization passes (#7741) 2018-05-28 15:49:25 -07:00
45cdb63d8b Fix error when setting multiple arch in TORCH_CUDA_ARCH_LIST (#7879) 2018-05-26 17:20:46 -04:00
07a0482d80 Make size_average docs clearer (#7829)
* Make size_average docs clearer

* fix format
2018-05-26 11:18:57 -04:00
7cd1ea8166 Make TensorMethods (fastGetSet) not depend on data type of Storage. (#7859)
* Make TensorMethods (fastGetSet) not depend on data type of Storage.

Currently, fastGetSet is implemented as macros that depend on the data type of Storage (i.e. that storage->data is real*).
Since we are moving to having 'void*' data this won't work in the future.

Also, due to the recent C/C++ split, these are actually C++ implementations (because they require the struct definition, which is C++),
so we move them to a generic .hpp file and implement them as static inline functions.

* Fix set functions.

* Add generic to CMakeLists.
2018-05-26 11:17:40 -04:00
5e50993be7 Better type checking for pack_padded_sequence symbolic (#7874) 2018-05-26 11:16:41 -04:00
af3d0e20a0 [ONNX] Fix transpose fusion logic (#7872) 2018-05-26 11:13:15 -04:00
f0dc40f77e Fix typo (#7876) 2018-05-26 11:11:19 -04:00
fece8787d9 [auto] Update onnx to 789efb1 - update proto files. (onnx/onnx#1040)
789efb166d
2018-05-26 09:40:53 +00:00
d8101e8410 [Caffe2] Fix roi_align_op_gpu_test and test_layer_norm_grad_op (#7875)
* Fix roi_align_op_gpu_test

* Fix layer_norm_op_test.py::TestLayerNormOp::test_layer_norm_grad_op
2018-05-26 02:28:48 -07:00
5c8d48c457 Properly pass xml report flags to ATen tests in Caffe2 builds (#7863)
* Not running ATEN tests on Caffe2 builds

* Keeping test directory when only aten is built

* Changing to run all aten tests too

* Skipping directories again

* .

* .

* skip aten/integer_divider_test (it hangs for an unknown reason)
2018-05-25 23:21:40 -07:00
06d5dd088d [auto] Update onnx to ec3b679 - Re-enable mypy, Fix releasing from Windows (onnx/onnx#1037)
ec3b6797b7
2018-05-26 00:41:46 +00:00
74246c9ba4 Potential fix for RNN test on MKL (#7862) 2018-05-25 16:16:46 -07:00
aae0ad58f3 Fix onnx integration tests build issues (#7856)
* Fix onnx integration tests build issues

* set -DBUILD_SHARED_LIBS=OFF for integrated builds

* verbose log

* non-local protobuf

* .

* turn back off verbose logging

* Fix typo
2018-05-25 15:19:54 -07:00
14f8cd7e3d [JIT][script] Implement nn.Sequential that can be inlined into script modules (#7747)
* Implement nn.Sequential that can be inlined into script modules

* fix bugs

* add comment

* add _ConstSequential class

* add script_method for forward in ConstSequential

* fix build bug

* refactor
2018-05-25 13:38:24 -07:00
c5b623e5d1 Use __float2half (#7850) 2018-05-25 13:25:56 -07:00
d5c466e5ce RNN export: add transpose to match onnx spec (#7825)
Didn't quite get it right the first time.

fixes https://github.com/pytorch/pytorch/issues/7817
2018-05-25 12:56:57 -07:00
e6488bbd01 add jit/passes/onnx CODEOWNERS line (#7853) 2018-05-25 15:52:39 -04:00
d2f98fcae9 Fix perf commits (#7848) 2018-05-25 17:42:47 +00:00
b1d03b795a add launch bounds to im2col and col2im (#7779) 2018-05-25 12:25:49 -04:00
0f7f27a843 fix typo from #7399 (#7846) 2018-05-25 12:11:50 -04:00
bed0ec3b21 Add missing trailing underscores 2018-05-25 08:27:25 -07:00
8d0622ca9d re-fix 9.2 build (#7828) 2018-05-25 11:13:20 -04:00
fb5cc630f6 Fix me (#7837)
* Mini fix

* No USE_MKL

* Add CAFFE2_USE_EIGEN_FOR_BLAS
2018-05-25 07:38:50 -07:00
d7c32df67f move Subset, random_split to data, use sequence at some places. (#7816) 2018-05-25 12:50:50 +02:00
ce1a65b5c2 [auto] Update onnx to 94dbb76 - Fix comma in Gemm description (onnx/onnx#1032) (onnx/onnx#1035)
94dbb76747
2018-05-25 03:41:34 +00:00
93b7b5dddd Fix trigonometric_op_test failures when running in python3.6 (#7831) 2018-05-24 19:09:35 -07:00
dbac3d21f6 [auto] Update onnx to b18cbd3 - remove mypy which blocks release. (onnx/onnx#1031)
b18cbd3364
2018-05-25 01:29:25 +00:00
28b1a3852c Add backward() to Tensor and Variable (#7774)
* Add backward() to Tensor and Variable

* Add at:: in front of Tensor

* Trying to not move optional to appease windows?

* Move implementation into cpp file

* Undo some formatting changes
2018-05-24 17:31:41 -07:00
147cc05cf5 [auto] Update onnx to 8236f49 - Kezhan/update manifest (onnx/onnx#1029)
8236f49124
2018-05-24 23:00:38 +00:00
144c5d1ff3 Overwrite INTEL_MKL_DIR correctly (#7824) 2018-05-24 15:04:25 -07:00
2271e7d7ab onnx->caffe2 output: better handling of init/pred splitting (#7820) 2018-05-24 14:49:14 -07:00
71bad33cc4 Match parenthesis (#7797) 2018-05-24 13:45:23 -07:00
b12164005f [C++ API] Remove virtual forward and implement Sequential based on Any(Module) (#7508)
* Remove virtual forward

* Rebase
2018-05-24 12:46:51 -07:00
1078491502 Change is_tensor to isinstance(*, torch.Tensor) (#7814)
Thanks!
2018-05-24 15:08:16 -04:00
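For illustration, the preferred check after this change:

```python
import torch

x = torch.randn(3)
assert isinstance(x, torch.Tensor)  # preferred over torch.is_tensor(x)
```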
0fddfe6c21 [auto] Update onnx to f8aa447 - update version number (onnx/onnx#1027)
f8aa447431
2018-05-24 18:14:57 +00:00
9a736f5228 [auto] Update onnx to 640a4ec - [Easy] Fix the gen_doc.py (onnx/onnx#1024)
640a4ec5d2
2018-05-24 15:33:02 +00:00
f88c529d06 [auto] Update onnx to 5591c95 - Enable non-static schema registration (onnx/onnx#894)
5591c95f68
2018-05-24 15:31:45 +00:00
c946db16ec [distributions] Always enable grad when calculating lazy_property (#7708)
* Always enable grad when calculating lazy_property

* Add test with MultiVariableNormal
2018-05-24 11:22:39 -04:00
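A minimal sketch of the pattern above (not the exact torch.distributions code): the cached value is computed under torch.enable_grad() so it stays differentiable even when first accessed inside a no_grad block.

```python
import torch

class lazy_property:
    """Cached property evaluated with grad enabled (illustrative sketch)."""
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner):
        if instance is None:
            return self
        with torch.enable_grad():
            value = self.fget(instance)
        # Cache on the instance so later accesses bypass the descriptor.
        instance.__dict__[self.fget.__name__] = value
        return value
```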
4bf0202cac [build] Have PyTorch depend on minimal libcaffe2.so instead of libATen.so (#7399)
* Have PyTorch depend on minimal libcaffe2.so instead of libATen.so

* Build ATen tests as a part of Caffe2 build

* Hopefully cufft and nvcc fPIC fixes

* Make ATen install components optional

* Add tests back for ATen and fix TH build

* Fixes for test_install.sh script

* Fixes for cpp_build/build_all.sh

* Fixes for aten/tools/run_tests.sh

* Switch ATen cmake calls to USE_CUDA instead of NO_CUDA

* Attempt at fix for aten/tools/run_tests.sh

* Fix typo in last commit

* Fix valgrind call after pushd

* Be forgiving about USE_CUDA disable like PyTorch

* More fixes on the install side

* Link all libcaffe2 during test run

* Make cuDNN optional for ATen right now

* Potential fix for non-CUDA builds

* Use NCCL_ROOT_DIR environment variable

* Pass -fPIC through nvcc to base compiler/linker

* Remove THCUNN.h requirement for libtorch gen

* Add Mac test for -Wmaybe-uninitialized

* Potential Windows and Mac fixes

* Move MSVC target props to shared function

* Disable cpp_build/libtorch tests on Mac

* Disable sleef for Windows builds

* Move protos under BUILD_CAFFE2

* Remove space from linker flags passed with -Wl

* Remove ATen from Caffe2 dep libs since directly included

* Potential Windows fixes

* Preserve options while sleef builds

* Force BUILD_SHARED_LIBS flag for Caffe2 builds

* Set DYLD_LIBRARY_PATH and LD_LIBRARY_PATH for Mac testing

* Pass TORCH_CUDA_ARCH_LIST directly in cuda.cmake

* Fixes for the last two changes

* Potential fix for Mac build failure

* Switch Caffe2 to build_caffe2 dir to not conflict

* Cleanup FindMKL.cmake

* Another attempt at Mac cpp_build fix

* Clear cpp-build directory for Mac builds

* Disable test in Mac build/test to match cmake
2018-05-24 07:47:27 -07:00
f9633b9542 [Caffe2] Skip some tests to unbreak CI (#7804)
* Skip some tests to unbreak CI

* Pass the opset_version to run_node

* Remove the stale check_graph call, caffe2_net_to_onnx_model will invoke check_model
2018-05-24 00:12:00 -07:00
fdabc02644 [auto] Update onnx to 9e6e7e4 - Support opset_version in run_node (onnx/onnx#1022)
9e6e7e4282
2018-05-24 06:10:07 +00:00
cfd70dc1cf [C++ API] Back to reset() and fixed in-place cloning (#7796)
* Back to reset() and fixed in-place cloning

* Add final override to clone_
2018-05-23 22:11:32 -07:00
6df371ba2f [auto] Update onnx to 9a37d4d - Add PRelu test cases (onnx/onnx#580)
9a37d4daf5
2018-05-24 01:51:36 +00:00
43d87afdc2 [auto] Update onnx to d2a46da - fix gru, rnn, lstm test cases to match the specification and add some cases (onnx/onnx#920)
d2a46da13b
2018-05-24 01:50:51 +00:00
1289fc870d Disable onnx backend node tests with broadcasting (#7730) 2018-05-24 09:15:16 +08:00
966c65859d Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2" (#7802)
* Revert "[auto] Update onnx to 4898c9e - Added TensorDenotation and metadata_props for images (onnx/onnx#879) 4898c9e925"

This reverts commit 9c679dab5fe7cac27bb8c783fd143276e6046ef1.

* Revert "Add BiasCHW fallback for GPU (#7738)"

This reverts commit 14ad2e74f108d13ec98abb078f6aa7f01aae0aad.

* Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2 (#7566)"

This reverts commit 2ebcf4bb37739733e76b754284cf8b2ffcba1c30.
2018-05-23 17:58:47 -07:00
9c679dab5f [auto] Update onnx to 4898c9e - Added TensorDenotation and metadata_props for images (onnx/onnx#879)
4898c9e925
2018-05-24 00:31:58 +00:00
14ad2e74f1 Add BiasCHW fallback for GPU (#7738) 2018-05-23 16:04:35 -07:00
2ebcf4bb37 [Caffe2] Enabling AMD GPU Backend for Caffe2 (#7566)
* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scafodding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the lastest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.
2018-05-23 15:13:09 -07:00
4352eab367 Call grad_mode.py context managers as decorators (#7737)
* call grad_mode.py context managers as decorators

* flake fixes

* switch to using context manager in wrapper

* fix set_grad_enabled test

* removed dumb github UI whitespace

* revert set_grad_enabled to normal, update tests
2018-05-23 17:39:13 -04:00
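For illustration, the decorator form this change exercises:

```python
import torch

model = torch.nn.Linear(3, 1)

@torch.no_grad()
def evaluate(x):
    # forward pass without building the autograd graph
    return model(x)

out = evaluate(torch.randn(2, 3))
assert not out.requires_grad
```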
aa214a8b8c catch CPU tensors in checkSameGPU (fixes #7689) (#7767)
Thank you, Nikita Kitaev, for the report and example.
2018-05-23 17:28:37 -04:00
0e9613cc49 Mark stack as non-executable in NNPACK (#7752)
Pull a new revision of NNPACK which specifies a non-executable stack in its assembly files. The previous revision didn't do that and, depending on the toolchain, could cause the linker to mark the stack as executable in the linked binaries.
2018-05-23 12:50:07 -07:00
1feb1a9b88 small fixes in fusion_compiler (#7776)
* small fixes in fusion_compiler

* address review comments
2018-05-23 15:18:58 -04:00
7d0de4f138 Run clang-format on c10d (#7791) 2018-05-23 11:26:35 -07:00
42134ee799 Allow empty storage for the 'Edge' class. (#7595)
This commit:
- Converts edge storage to an optional type.
- Adds a new test in tarjans_test.
- Refactors related bits in other files.
2018-05-23 10:40:29 -07:00
ee5e474fcf Process group base class and Gloo implementation (#7628)
This is a starting point and only implements allreduce for CPU tensors. It includes most base functionality, like algorithm caching (a similar approach to the one taken in the THD GlooCache) and multi-threaded execution (new).

The expectation is that function calls on the process group class are globally serialized. They execute collective functions, so members of the collective must call the same functions in the same order, or a deadlock may happen.

The algorithm cache works as follows: the ProcessGroupGloo class has a cache map from algorithm keys to algorithm entries. The algorithm key is a struct with fields that make up the signature of a collective function. It includes the dimensionality of the input/output tensors, tensor device assignment, source/destination rank, etc. For collective calls with the same key, the process group will lazily initialize and then cache a Gloo algorithm instance. For now we only keep a single algorithm instance per key, but this may be revisited in the future, if we observe contention on a single key and can exploit additional parallelism.
2018-05-23 09:02:18 -07:00
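A hedged Python sketch of the caching idea described above (the real implementation is C++; create_algorithm is a hypothetical factory standing in for Gloo's algorithm construction):

```python
def create_algorithm(key):
    # hypothetical factory standing in for Gloo's algorithm construction
    return object()

_algorithm_cache = {}

def get_algorithm(op, dtypes, shapes, devices, src_rank=None):
    # The key mirrors the signature of the collective call; identical keys
    # reuse the lazily created algorithm instance.
    key = (op, tuple(dtypes), tuple(shapes), tuple(devices), src_rank)
    if key not in _algorithm_cache:
        _algorithm_cache[key] = create_algorithm(key)
    return _algorithm_cache[key]
```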
5a3f7810f8 _LRSchedulers getstate include optimizer info (#7757)
* getstate should include optimizer

* remove getstate/setstate functions
2018-05-23 11:43:42 -04:00
e3e15b5d95 [PyTorch] [gradcheck] change backward() to grad() (#7710)
* Change backward calls to grad to avoid memory leak from #7343; Replace unnecessary create_graph=True with retain_graph=True

* fix gradgradcheck use of make_non_contiguous

* allow non-contiguous target

* remove unnecessary .grad.zero_()

* remove contiguous_detach

* fix PReLU double backward always returning ggW as a scalar

* let noncontig gO require grad

* move requires_grad to return
2018-05-23 11:03:12 -04:00
6a604f16cc Update test_nn.py (#7787) 2018-05-23 12:28:13 +02:00
60d5c0eb19 Define general default scheduler for TBB and fix ppc64le bug (#7761) 2018-05-23 12:24:33 +02:00
2222fc7666 Add support for accepting Tensor as input in clip_grad_* functions. (#7769) 2018-05-23 12:12:03 +02:00
5316cad5c2 [Easy] Remove unused code (#7782) 2018-05-22 22:32:47 -07:00
85e9ae20e5 Update tbb (#7734) 2018-05-23 01:54:16 +00:00
f534339a1a Add @generated annotation (#7780) 2018-05-22 18:33:05 -07:00
ee628d64b9 fix legacy comment after variable tensor merge (#7771) 2018-05-22 19:08:42 -04:00
60745b3380 Revert #7750 and #7762 to fix Windows CI on master (#7772)
* Revert "Add missing brace (#7762)"

This reverts commit ea27c5af50f6bc8ba82068e6d36ade9c773dc101.

* Revert "[C++ API] Add backward() to Tensor and Variable  (#7750)"

This reverts commit 1e2762796f33123d86782936089dbeda37bdcc92.
2018-05-22 15:42:52 -07:00
8d91a602cc Temporarily disable build env check (#7768) 2018-05-22 12:51:00 -07:00
ea27c5af50 Add missing brace (#7762) 2018-05-22 14:18:22 -04:00
1e2762796f [C++ API] Add backward() to Tensor and Variable (#7750)
* Add backward() to Tensor and Variable

* Added a couple tests
2018-05-22 10:43:04 -07:00
e5b830eb0e [auto] Update onnx to d43b550 - Fix .gitignore and add missing files (onnx/onnx#1005)
d43b55087d
2018-05-22 17:40:43 +00:00
bb15a0830d [auto] Update onnx to ea1aa13 - add tests for reduce ops (onnx/onnx#675)
ea1aa139b2
2018-05-22 01:50:13 +00:00
Ben
bb34887ae3 include cudnn_h (#7749) 2018-05-21 21:48:50 -04:00
549b4069bb [C++ API] Using new registration mechanism (#7663)
* Using new registration mechanism

* Fix signature of param() in module.cpp

* Remove ParameterList

* Fix tests
2018-05-21 17:59:21 -07:00
312ab535ba [auto] Update onnx to 5dd68e6 - Add a util function: polish_model (onnx/onnx#1000)
5dd68e634b
2018-05-22 00:22:30 +00:00
8275e430b0 [auto] Update onnx to 169b156 - Add more missing type hints (onnx/onnx#991)
169b1561e9
2018-05-21 22:08:15 +00:00
f01be11efd [auto] Update onnx to b3b3b28 - Enable checking for functions that don't have a type hint (onnx/onnx#989)
b3b3b2851a
2018-05-21 19:18:19 +00:00
c5ffc3a02c [auto] Update onnx to 9f9316a - Catch up with type hints (onnx/onnx#988)
9f9316a5e2
2018-05-21 19:17:25 +00:00
d02b7ab389 [auto] Update onnx to c168303 - Better error message if protoc isn't found (onnx/onnx#1004)
c168303031
2018-05-21 18:38:53 +00:00
9506eeb73a [auto] Update onnx to 52f7528 - add more shape inference tests (onnx/onnx#971)
52f75285ad
2018-05-21 17:09:03 +00:00
286cd04a20 JIT cleanup (#7631)
Cleans up dead code in the JIT:

* Remove interpreter_autograd_function
* Remove Handles
* Remove HandleBuilder
* Remove creates_handles, and tracing_autograd_python_function flags
* Remove unused var_args
* Fix submodules
2018-05-21 10:06:29 -07:00
e6f7e1807d fix to build sleef when using cmake 3.11.1 (#7679) 2018-05-21 15:13:17 +00:00
5ee5537b98 Fix typo in document (#7725) 2018-05-21 11:10:24 -04:00
28b592e00b [auto] Update onnx to 6f4b1b1 - Tests for Gemm operator (onnx/onnx#885)
6f4b1b12e5
2018-05-21 12:22:11 +00:00
987b52460d [auto] Update onnx to c6c6aad - Enhance the 1-element broadcast case (onnx/onnx#902)
c6c6aad416
2018-05-21 11:29:53 +00:00
b4ae80d459 serialization for torch.device (#7713) 2018-05-21 11:34:26 +02:00
ee6e3fe301 Fix compile flags for MSVC (#7703) 2018-05-21 10:20:19 +02:00
0a11018db6 Fix exporting Sum to onnx (#7685)
* Fix exporting Sum to onnx

* extend fix to prod and mean

* update expect file
2018-05-20 23:37:42 -07:00
a890a0be07 Renanme ZFNet to ZFNet512 (#7723) 2018-05-21 11:37:39 +08:00
75cf0faf4c Implement __reduce__ for torch.dtype (#7699) 2018-05-20 14:59:02 +02:00
5000a05724 Remove unnecessary include in vec256_float.h (#7711) 2018-05-20 11:23:43 +02:00
f94ae3ba1d Update from facebook (#7696)
* Fix handling of empty batches in SumReduceDimsOp

As titled

* Deferrable async_scheduling finishRun fix

Proper order of finishing run operations in deferrable_async_scheduling net

* Simplify exception handling in async_scheduling

Simplify exception handling: there is no need to busy-wait; the thread that processes the
last task can finish the run

* [C2]worker_coordinator_memorize_worker_ids

As titled. This is related to T28689868, where the number of blobs we want to create is equal to the number of worker ids

* Add unit test for nets with no type set

* Ignore total length argument in symbolic_pad_packed_sequence

1- There was a mistake in the code that total_length was added to the wrong symbolic function (pack_padded_sequence) instead of (pad_packed_sequence)
2- No need to throw an exception if total_length is given since it is only used to enable data_parallel training on multi-gpus and doesn't have anything to do with onnx export, so just ignore it. https://fburl.com/tk4gciqp

* Add support for MKLDNN to async_scheduling

Just add MKLDNN as a possible CPU option to async_scheduling's pool function

* [AuFL][ensemble] support branch output for prediction

This diff supports using predictions from different branches and thus enables model ensembling (not fully independent).

* Fix a bug in add_loss in layer_model_helper

As titled.

* Support lradaption for adam

1. lr adaption operator
2. apply to dense adam

* Perf tweaks for async_scheduling

Restore single pool option + remove unnecessary (no-ops) calls

* add quantization to SparseSimdAdagradOp

add a bunch of quantization signatures to SparseSimdAdagradOp, implementations to come next

* [sr] [codemod] Change all SR callsites to use new API

@allow-large-files

This diff refactors all callsites of SR to use the slightly changed API introduced in the diff below. Really what this means is that you need to include the correct header. Also if you were using `ClientFactory::newFactory` you need to not prefix it with `ClientFactory::`.

```
cd ~/fbsource/fbcode
find ./ -type f -exec sed -i -e 's:#include "servicerouter/client/cpp2/ClientFactory.h":#include "servicerouter/client/cpp2/ServiceRouter.h":' -e 's:#include <servicerouter/client/cpp2/ClientFactory.h>:#include <servicerouter/client/cpp2/ServiceRouter.h>:' -e 's/ClientFactory::newFactory(/newFactory(/g' {} \;
```

Also manually fixed spots that couldn't be done automatically (or broke because they depended on transitive includes).

* Back out "Fix handling of empty batches in SumReduceDimsOp"

Original commit changeset: 282da1730cc2 This commit is blocking the
Github->fbcode sync, which really needs to get merged ASAP. D7881937 which this
diff depends on will be reverted in the sync D7990948 which causes this to
break. The sync diff cannot be patched with this reversion because it must be
landed against base revision 5c8c099 , and D7881937 must not be included in the
sync diff because it is breaking GPU tests that are not available in sandcastle
: https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-cuda8.0-cudnn6-ubuntu16.04-test/3638/console
for one example.

* Add the flow to support operator benchmark

1) generate model with the operator 2) upload to everstore 3) generate model spec into json file 4) start running the benchmark

* [tum][gpu] Connect DPM trainer with flow and unit tests

This diff:
- Fix some small bugs for Yiming's recent changes to parallelizer, so it suits real use cases.
- Add correct tags to the TUM code, so we can do data parallel transform
- pass extra info when instantiation.
- add unit test for using DPM in TUM model

After this diff, we can do simple box, multi-gpu fully-sync trainer for TUM in Fblearner workflow, but may still need to do speed benchmarking.

* w/o normalized lradaption for adam dense only

The previous lr adaption includes a normalization step when performing the dot product operation. This is not exactly the same as what is proposed in the paper. I add normalization as an option. Without it, the operator performs exactly what the paper proposed. With the option, we add the normalization step

* [fb] Use SharedPromise in DeferrableAsyncSchedulingNet

This code is to simplify DeferrableAsyncSchedulingNet by removing condition
variable + small fixes

* [tum] implement cuda sparseLengthsMean and LengthsMean

as title

* Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function.

Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function.

* Move feature_to_index to FeatureSpec.feature_to_index

move feature_to_index to FeatureSpec.feature_to_index to avoid override other fields

* [Caffe2] Rename bytes_moved to bytes_written

Just a rename in preparation for supporting bytes_read.

* [c2] fix ReduceFrontSumOp for empty case by setting 0

otherwise, it may use the results from last iteration when it's empty batch.

* [Caffe2] [Int8] Improve Intel CPU performance

* [Easy] Improve PrependDim op logging

as titled

* DBFileReader expand db_path using os.path.expanduser(..)

Since there are a lot of possible use cases where `DBFileReader` reads from a user home path, like `~/local/sample.db`, I want to save people the trouble of calling `os.path.expanduser(db_path)` themselves.

* [Caffe2] Add bytes_read to cost structure

We're adding analytical read bytes to cost functions.  This extends the structure accordingly for all CostInference defined operators.
Additionally, some small bug fixes were performed:
1) Cost functions now extract type information of operands instead of assuming float

* Fix sleef on aarch64 for hhvm

@bypass-lint

Rename flag

* Remove duplicated part in caffe2/ideep/operators/conv_op.cc

should be sync error

* Rename test helper function test_adagrad_sparse_helper to adagrad_sparse_test_helper to avoid confusing pytest
2018-05-19 23:10:48 -07:00
2cb096ada8 fix for cuda 9.2 builds (#7709) 2018-05-19 21:18:48 -07:00
42e5e12750 make BatchSampler subclass of Sampler, and expose (#7707) 2018-05-19 21:29:03 +02:00
cf9b80720d Dont emit warning for ABI incompatibility when PyTorch was built from source (#7681) 2018-05-19 20:25:52 +01:00
8f97cbcf4e remove index from python bindings (fixes: #7639) (#7690) 2018-05-19 20:04:07 +02:00
ee882eae8e Update _torch_docs.py (#7700)
Added better example
2018-05-19 11:12:02 -04:00
48bf733480 Changes from D7881937 and D7963936 plus an edit (#7605)
* Changes from D7881937 and D7963936 plus an edit

* D8038158

* Another change from cxj
2018-05-18 20:59:16 -07:00
77fe4bd0b6 [auto] Update onnx to 241a350 - Type and shape inference for RNN, LSTM, GRU (onnx/onnx#937)
241a350272
2018-05-19 02:04:55 +00:00
f7d96a367b Update NNPACK and cpuinfo submodules (#7691)
Updated NNPACK to 42d9355
Updated cpuinfo to 1e6c8c9
2018-05-18 18:27:56 -07:00
ec71c689fc [JIT][script] Add matmul(@), pow(**) operator (#7648)
* add matmul(@), pow(**) operator

* fix bug(matmul not in py2) in @ operator

* fix bugs

* add get_fn help func to remove duplication in test_jit
2018-05-18 15:24:20 -07:00
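For illustration, both operators now parse inside script (the @ operator is Python 3 only, as noted above):

```python
import torch

@torch.jit.script
def f(a, b):
    # matrix multiply and power inside script
    return (a @ b) ** 2
```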
27ea7148fe Updates to .clang-format (#7683)
1) No longer compact namespaces (revert from #5127)
2) Don't break on return type for long function declarations
2018-05-18 15:11:17 -04:00
2f5494ac14 [auto] Update onnx to a75fa2c - fix customized version of find Protobuf for Error when calling find_package(Protobuf) twice (onnx/onnx#901)
a75fa2c402
2018-05-18 18:43:01 +00:00
875a5dceb0 [auto] Update onnx to 55fff7b - python setup.py typecheck (onnx/onnx#972)
55fff7b796
2018-05-18 18:24:01 +00:00
4f20a0e439 Fix various sparse transpose issues; remove dead code from Declaratio… (#7200)
* Fix various sparse transpose issues; remove dead code from Declarations.yaml.

1) Fixes some checks in t_, transpose_ that don't allow transposing empty sparse tensors.
2) Remove out= variants from docs since they don't exist (and haven't since at least v0.3.1).
3) Unify implementations of t_, transpose_, t, transpose.
4) Move dead checking code from Declarations.cwrap to actual implementations.
5) Fix test which never tested transpose_.

* Add test for error with t, t_.

* Address review comments.

* Fix jit tests.

* Fix test_jit.
2018-05-18 19:51:41 +02:00
7abdc303c6 Don't allow requires_grad to be set on integer Tensor constructors in… (#7185)
* Don't allow requires_grad to be set on integer Tensor constructors in tensor_new.

* Fix autograd test.

* Fix test_distributions.

* Fix test_jit.

* Fix NN tests.
2018-05-18 19:45:10 +02:00
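For illustration of the new behavior (the exact error text may differ):

```python
import torch

torch.tensor([1.0, 2.0], requires_grad=True)  # ok: floating point
torch.tensor([1, 2], requires_grad=True)      # now raises: integer tensors
                                              # cannot require grad
```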
431c80a128 Guard sleef for AVX/AVX2 (#7678) 2018-05-18 17:33:21 +00:00
cf0c585b6a [auto] Update onnx to e050bcc - add multinomial op to ONNX (onnx/onnx#897)
e050bccacb
2018-05-18 17:17:17 +00:00
32b23a4bfc Throw error on tensor creation when sequence shape cannot be determined (#7583)
* first commit

* unit test

* minor style edits
2018-05-18 19:14:42 +02:00
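For illustration of the new error (the exact message may differ):

```python
import torch

torch.tensor([[1, 2], [3, 4]])  # ok: rectangular
torch.tensor([[1, 2], [3]])     # now raises: shape cannot be determined
```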
e37da05bd5 Expose documentation for random_split (#7676)
Fixes #7640
2018-05-18 17:16:25 +02:00
8212f576db improve RNN docs (fixes #3587) (#7669) 2018-05-18 16:41:03 +02:00
f7bc7007d4 return nan in max_pool/adaptive_max_pool for nan args (#7645) (#7670) 2018-05-18 16:39:41 +02:00
bf95dff85b Map digamma +/-inf results to nan in test (fixes #7651) (#7665) 2018-05-18 16:35:00 +02:00
50d8473ccc Document dtype arg for reduce ops (#7654)
Fixes #7039.
2018-05-18 10:30:38 -04:00
c46a0c8813 add back Tensor.permute docs (#7652) 2018-05-18 10:29:43 -04:00
56e7a2cde1 Better support for adding zero-filled sparse tensors (#7479)
Right now, if we add a zero-filled sparse tensor with another sparse
tensor, both tensors must have the same "density" (dimI, dimV) and size
(tensor.size()) for them to be added successfully. This relaxes that
constraint so that if both tensors have the same tensor.size() and at
least one is zero-filled, they can be added successfully.

Before:
```
i = torch.LongTensor([[0, 1, 1], [2, 0, 2]])
v = torch.FloatTensor([3, 4, 5]).unsqueeze(1)
sparse_mat = torch.sparse.FloatTensor(i, v, torch.Size([2,3,1]))
zeros = torch.zeros(sparse_mat.size(), layout=torch.sparse_coo)
sparse_mat + zeros

RuntimeError: cadd operands have
incompatible sizes or dimension types
at
../src/THS/generic/THSTensorMath.c:126
```

After: no error.
2018-05-18 10:29:27 -04:00
f12b8770cd use matching tp_name for torch.device (#7673) 2018-05-18 16:24:21 +02:00
c58893eb9e [auto] Update onnx to 59b0b24 - Clarified description of Pad attribute (onnx/onnx#962)
59b0b24643
2018-05-18 13:09:51 +00:00
06fa332e2b Fix UB when converting negative floating values to uint8_t (#7644) 2018-05-18 11:02:00 +02:00
4dd0aab33c [auto] Update onnx to 3fc5f43 - move finalize function to be public. (onnx/onnx#987)
3fc5f43e91
2018-05-18 08:07:12 +00:00
47ab3f936b [caffe2] Fix warning in net_async_tracing.cc (#7646)
Compilers used to report a warning:
  caffe2/core/net_async_tracing.cc: In member function 'void caffe2::tracing::Tracer::renameThreads()':
  caffe2/core/net_async_tracing.cc:210:32: warning: overflow in implicit constant conversion [-Woverflow]
     const long numa_multiplier = 10e9;

This patch fixes it.
2018-05-17 22:36:54 -07:00
ca860907bb [auto] Update onnx to 8d548e2 - Update shape inference methods to throw exception (onnx/onnx#986)
8d548e2361
2018-05-18 04:42:23 +00:00
bc4feab3e3 Fix flaky atomic iter test (#7649) 2018-05-17 21:17:29 -07:00
5207998fc3 Fix onnx Pow export (#7657) 2018-05-17 21:15:04 -07:00
93f8d98027 [auto] Update onnx to 8356ad5 - Add unit test framework for the project C++ APIs (onnx/onnx#763)
8356ad54e9
2018-05-17 23:52:58 +00:00
2d313276b2 [caffe2][nomnigraph] Add registry for optimization passes (#7656) 2018-05-17 16:33:56 -07:00
8c0299b5e6 [auto] Update onnx to 94ca052 - Update mypy version (onnx/onnx#968)
94ca052447
2018-05-17 22:52:09 +00:00
d4f6c84041 fix nccl distributed documentation 2018-05-17 18:03:54 -04:00
f2295494af Makes AccumulateGrad high priority in backwards passes (#7604)
* Makes accumulate_grad functions high priority in backwards passes

* Delegating constructor and comments

* Sequence_nr ain't pretty no more

* Sequence_nr ain't pretty no more
2018-05-17 23:49:15 +02:00
cba19e59ca [C++ API] Implement builder style construction (#7597)
* Implemented fused builder based construction mechanism

* "weights" -> "weight"

* Use int64_t instead of size_t everywhere in RNN

* Extracted Conv::ExpandingSize into its own thing

* Rename TORCH_PARAMETER to TORCH_ATTR

* Added documentation

* Fix weight names in batchnorm module
2018-05-17 17:10:15 -04:00
0d27d2686c C10D: Added TCPStore to support C10D store interface (#7560)
Reference: https://github.com/pytorch/pytorch/issues/7434

* C10D: Added TCPStore to support C10D store interface

* Used pipe to terminate the store daemon and addressed all comments

* Used notify/wake for wait and addressed all comments

* Clean up nits

* Clean up all socket states when the socket is closed
2018-05-17 13:38:06 -07:00
ec42a11410 [auto] Update onnx to ba86ec2 - Protobuf typing (onnx/onnx#982)
ba86ec2682
2018-05-17 18:29:16 +00:00
562d9971c9 Add LBFGS optimization algorithm to C++ API (#7596)
* Adding LBFGS to cpp API

* Adding stop conditions

* Test cases now passing and adding closure to all algs

* Addressing code review

* Set seeds to make optim tests more deterministic
2018-05-17 14:03:08 -04:00
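The commit targets the C++ API; a sketch of the equivalent Python closure pattern, since LBFGS must re-evaluate the loss several times per step:

```python
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.LBFGS(model.parameters())
x, y = torch.randn(4, 10), torch.randn(4, 1)

def closure():
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    return loss

opt.step(closure)  # LBFGS re-evaluates the loss through the closure
```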
84730aa659 support <= and >= (#7633) 2018-05-17 10:01:29 -07:00
f7f95f1742 Reduce gen_jit_dispatch options (#7562)
* Reduce gen_jit_dispatch options

This removes the power set of options generated for IntList[k] arguments
in aten_dispatch. Instead, the compiler now performs the broadcast using
schema information. This substantially cuts the compile time for aten_dispatch.cpp
2018-05-17 10:00:35 -07:00
331a04d8eb [auto] Update onnx to 321d874 - update output shape of RNN ops according to ONNX spec (onnx/onnx#923)
321d87457f
2018-05-17 05:47:23 +00:00
77e8a23a29 [auto] Update onnx to a8b3316 - add exception mechanism for use in type and shape inference (onnx/onnx#983)
a8b3316cff
2018-05-17 04:41:12 +00:00
9a1a20cb33 [auto] Update onnx to 13196bf - Shape inference for ConvTranspose (onnx/onnx#973)
13196bf40b
2018-05-17 03:58:54 +00:00
b4d5e67e5f Add asin, acos, tan, atan operators (#7600) 2018-05-16 18:09:26 -07:00
221e615665 Move bernoulli further into ATen (#7578) 2018-05-16 23:20:40 +00:00
330a72581f Update README to contain instructions on how to install mkldnn for Linux (#7625) 2018-05-16 19:08:03 -04:00
3c9ded098d [auto] Update onnx to 83f3666 - Spec clarity: Versioning (onnx/onnx#931)
83f366619e
2018-05-16 22:29:20 +00:00
3238db6247 Show skipped distributed tests as skipped (#7624)
Previously, tests that were skipped because their backend was missing
showed up as succeeded, which was very confusing.
2018-05-17 00:23:46 +02:00
8f42bb65b3 Be more lenient w.r.t. flag processing in C++ extensions (#7621) 2018-05-16 18:17:18 -04:00
f87091636d Update .gitignore (#7622) 2018-05-16 18:10:35 -04:00
8f6f43f5cf Fix rocm docker images environment variables round 2 (#7626) 2018-05-16 14:40:07 -07:00
599d0fac93 Reduce MAX_JOBS for gcc 7.2 build (#7618) 2018-05-16 17:30:09 -04:00
64cb4fb13d Add ChannelShuffle to IDEEP fallback (#7623) 2018-05-16 14:02:27 -07:00
c3a02fd8ed Conditionalize all of conv_op_eigen on version (#7581)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-16 14:17:25 -04:00
b45f2ff1ae Remove CompiledFunction + clean up JIT tests (#7421) 2018-05-16 20:03:04 +02:00
28b0b16f9b [auto] Update onnx to 01745b2 - Update README.md (onnx/onnx#976)
01745b28fa
2018-05-16 16:56:18 +00:00
c425d0350b Patches needed for sync, rebased (#7564) 2018-05-16 11:20:14 -04:00
7bc3414f8f fix caffe build failed with -O0 (#7570) 2018-05-16 11:19:15 -04:00
c5b9a36f1e Make return uniform in lbfgs step (#7586)
* Make return uniform in lbfgs step

This ensures that we are returning results of the same type
in LBFGS step.

* Adding test case to exercise different exit points

Sets the tolerance_grad to negative infinity and positive
infinity to deterministically exercise the early exit branch

* Fixing lint error
2018-05-16 11:16:46 -04:00
9213336c73 fix cmake USE_ASAN (#7608) 2018-05-16 11:10:13 -04:00
6eec4118a3 Fix python3.6 build in caffe2 CI (#7602)
* Fix python3.6 build in caffe2 CI

* Turn off onnx protobuf type stubs generation

* Revert "Turn off onnx protobuf type stubs generation"

This reverts commit 618b80911a316caa69f2d774fb12ae6b24b2a6d6.
2018-05-15 23:01:18 -07:00
ba44231cbc [auto] Update onnx to 3a14d83 - Improve LRN doc (onnx/onnx#965)
3a14d83974
2018-05-16 05:49:21 +00:00
86b1e230c7 [auto] Update onnx to 061af05 - Print protobuf type stubs warning to stderr (onnx/onnx#979)
061af05f45
2018-05-16 05:07:35 +00:00
ed458fd311 Fix environment variables in rocm docker images (#7598)
* Fix environment variables in rocm docker images

* Add to .bashrc as well
2018-05-15 21:51:02 -07:00
9213b3f739 [caffe2] Fix linking of Android unit tests (#7607)
Android unit tests failed to link because libnnpack and libcpuinfo appeared in the linker command line before libcaffe2. This patch somehow fixes it.
2018-05-15 21:39:37 -07:00
0493d49afa [auto] Update onnx to 63234db - remove fc op. (onnx/onnx#977)
63234dbae6
2018-05-16 04:15:22 +00:00
c76da6494b Drop support for MAGMA v1 (#7582)
Fixes #7502.

Test Plan: build and test

Build output has this:
```
-- Checking prototype magma_get_sgeqrf_nb for MAGMA_V2 - True
-- Compiling with MAGMA V2 support
-- MAGMA INCLUDE DIRECTORIES: /data/users/rzou/miniconda3/include
-- MAGMA LIBRARIES: /data/users/rzou/miniconda3/lib/libmagma.a
```
2018-05-15 23:57:16 -04:00
be145e4f5b [auto] Update onnx to 0524595 - Do not generate protobuf python type stubs if protobuf python package is not installed (onnx/onnx#974)
052459560d
2018-05-16 03:40:42 +00:00
56fa6ec66a [caffe2] Change iteritems in trt/transform.py to items for python3 compatibility (#7599) 2018-05-15 20:32:06 -07:00
c187a5d79e Resolve the performance issue on ConvFusion Op (#7584) 2018-05-15 20:31:29 -07:00
cd86d4c554 PyTorch AMD Build Scripts (#6625)
* PyTorch AMD Build Script.

* Python invocation for hipify

* Adding individual hip fles.

* Updating CWD

Use the actual path for the file instead of the current working directory, which depends on where the script is invoked.

* Updating folder path for amd_build

* Removing previous amd_build directory

* Updated setup.py to support WITH_ROCM

* Renaming the files for CuDNN BatchNorm & Conv since having two .cpp files with the same name results in a linking error in the HCC compiler used for ROCm/AMD.

* Removing old BatchNorm & Conv files since they've been renamed.

* Updating build path to handle ROCM

* Cleaned up the build path and created a FindHIP cmake file for setting up relevant hip paths.

* Seperated the individual patch files to make it easier to detect issues while building.

* Removed CMakeLists hip files and fixed directory structure

* Adding build pytorch amd script

* Merged setup patch into PyTorch setup.py & cleaned a few issues

* Added information on where to download the hipify-python script.

* Resolved linting issues inside of build_pytorch_amd.py

* Removing many unnecessary patch files. Removing unnecessary .hip files. Fixing up the build process.

* Refactored the PR for supporting HIP

* Minimizing the number of changes inside individual patches.

* Cleaned up patch files.

* Removed patch files.

* Updating patches

* Removing HIP change from file.

* Cleaned up patches

* Added AVX/SSE avoidance due to bug with ROCms stack. Just temporary for now.

* Removing the other HIP file

* Removed patch file + merged ROCm into Aten/test

* Removed ATen tests patch file and updated disbale_features yaml to remove headers that don't exist on the HIP stack.

* Reduced the number of patches down to 14 after Edward's suggestions.

* Transferred deletion of certain functions from patch to yaml file.

* Set default Thrust path

* Fixed aten files so we now use the templated pow/abs instead of std:: directly.

* Removed error from aten/src/THCUNN/Abs.cu

* Updated the locations of the cmake build files. Moved THCTensorRandom from a hip to a patch file. Added executable/library commands that can successfully handle either CUDA or HIP.

* Removed hip extraction from the build script and removed the old hip file.

* Replaced MACRO with function in upper level cmake.

* Added empty ELSE() block to prevent the loading of a command without CUDA or HIP. Also added IF guards around torch_cuda_based_add_executable in Aten tests.

* Updated aten tests.

* Removed the hip include from the ATen header.

* Can't throw exceptions on C++ AMP, using abort

* Missing IF guards for cuda/hip executables in aten tests.

* Removed a series of patch files.

* Added template keyword to help out the HCC compiler.

* Rebased the specific files displayed in the PR

* Fixing typo.

* Change flag from "WITH_CUDA" to "NOT NO_CUDA"

Replacing "WITH_CUDA" with "NOT NO_CUDA" after the rebase.

* Fix LoadHIP path

* Updating build files after rebasing.

* Reorganization after cpu/gpu separation.

* Removed HIPCC from setup.py & removed -shared extra linking args.

* Updated CMake / Setup build to correctly link when under ROCm stack.

* Removed the unnecessary argument from Extension constructor.

* Adding another test to be included with ROCm building.

* Updated the setup_helpers scripts in order to get around linter error

* Fix syntax issue

* Solving lint issue: line too long
2018-05-15 18:38:01 -07:00
2de1b4488f Run sccache in background mode and save logs to file (#7594)
Running sccache in foreground mode seems to uniformly slow down the builds and causes virtual-memory-exhausted errors for gcc 7.2 builds. This PR moves sccache to background mode instead and prints the compilation log at the end of the build.
2018-05-15 21:21:19 -04:00
4b6c884b99 [caffe2][nomnigraph] Add optimize function to opt:: namespace that takes in a level and optimizes the graph/workspace accordingly. Adding it to predictor and speed_benchmark arguments (#7558) 2018-05-15 15:57:06 -07:00
469c6c88a3 [auto] Update onnx to dc07e0f - Extend Concat/Gather/Squeeze/UnSqueeze to accept any tensor type (onnx/onnx#957)
dc07e0fb2f
2018-05-15 22:26:47 +00:00
9211790049 [caffe2] Include <array> in fatal_signal_asan_no_sig_test (#7592)
fatal_signal_asan_no_sig_test.cc uses std::array, but doesn't include the header. It caused build error on Android.
2018-05-15 15:02:24 -07:00
0df84d7ec7 [auto] Update onnx to 21b56ad - mypy info (onnx/onnx#970)
21b56ada78
2018-05-15 21:38:37 +00:00
79b9bbe60f [caffe2] Use caffe2::stod in lexer (#7591)
std::stod causes build errors on Android
2018-05-15 14:06:24 -07:00
be019e4429 [auto] Update onnx to 76a288f - add script to count shape inference implementations (onnx/onnx#967)
76a288f098
2018-05-15 20:54:11 +00:00
3af3d13599 Run onnx integration tests in caffe2 CI (#7565)
* Run onnx integration tests in caffe2 CI

* verbose log

* turn off onnx verbose installation log

* can not install ninja

* Do not use all cores to build pytorch

* install tests require

* pip install to user dir

* use determined path to improve (s)ccache hit

* Do not change path in test.sh

* Add the compile cache hit trick to conda install as well

* cover jenkins in CI environment detection
2018-05-15 13:25:24 -07:00
e65d6de16a [auto] Update onnx to 3f80231 - Add type hints to numpy_helper_test.py (onnx/onnx#951)
3f80231786
2018-05-15 20:07:18 +00:00
37f5b147fc [auto] Update onnx to 037cfaa - Add type hints to test_backend_test.py (onnx/onnx#954)
037cfaa015
2018-05-15 20:06:06 +00:00
996886137a Add link to TensorFlow Distributions paper (#7563) 2018-05-15 15:46:54 -04:00
5748cc43ce [auto] Update onnx to c918b4b - Add type hints to basic_test.py (onnx/onnx#947)
c918b4be91
2018-05-15 19:23:53 +00:00
d971782a03 Change code owners for onnx integration tests (#7587) 2018-05-15 15:22:32 -04:00
efb7dead9d Squelch -Werror=non-virtual-dtor (#7554)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-15 13:53:15 -04:00
4251e38eb3 [auto] Update onnx to b265987 - Add type hints to helper_test.py (onnx/onnx#950)
b26598714c
2018-05-15 03:30:02 +00:00
a52eb24c42 [auto] Update onnx to bb4d582 - Add type hints to relu_test.py (onnx/onnx#952)
bb4d5827cf
2018-05-15 03:24:41 +00:00
be7c5e573e [auto] Update onnx to 533a84c - Add type hints to elu_test.py (onnx/onnx#949)
533a84c3ca
2018-05-15 03:23:15 +00:00
f007392522 [auto] Update onnx to a659ab9 - Add type hints to schema_test.py (onnx/onnx#953)
a659ab90cc
2018-05-15 03:22:04 +00:00
dbf77ef7a7 [auto] Update onnx to 28a8849 - Add type hints to onnx/test/optimizer_test.py (onnx/onnx#955)
28a8849127
2018-05-15 03:20:57 +00:00
fb314ee150 [auto] Update onnx to 65f1811 - Fix a type error in lstm test case (onnx/onnx#959)
65f1811d2d
2018-05-15 02:43:40 +00:00
08415c42af Replace std::to_string with caffe2::to_string in nomnigraph (#7561)
std::to_string is not available on Android with GNU STL. We conventionally use caffe2::to_string as a portable alternative.
2018-05-14 19:37:43 -07:00
e1148db7f2 Implement logsumexp (fixes #2591) (#7254)
* Implement logsumexp (fixes #2591)

* Add logsumexp_backward, fix _out declaration.

Thank you Simon and Edward for your comments!
2018-05-14 22:08:14 -04:00
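A reference sketch of the standard stable trick the op implements: log Σ exp(x) = m + log Σ exp(x − m) with m = max(x), so the exponentials never overflow.

```python
import torch

def logsumexp_ref(x, dim):
    m = x.max(dim, keepdim=True)[0]  # subtract the max for stability
    return m.squeeze(dim) + (x - m).exp().sum(dim).log()

x = torch.randn(3, 4)
assert torch.allclose(torch.logsumexp(x, 1), logsumexp_ref(x, 1))
```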
05853945a4 Vectorize softmax and logsoftmax (#7375)
This PR uses Vec256 to vectorize the softmax and logsoftmax Layers.

This comes in 4 steps:

log_softmax
softmax
log_softmax_backward
softmax_backward

* Vectorized Softmax and LogSoftmax

* Abstractions

* Style

* Remove <limits> for Kernel

* Perf investigations

* Last cleanups
2018-05-14 22:08:00 -04:00
44a10f2a98 Removing arch 20 + 21 (#7512)
Should solve the shfl_xor undefined problem on cuda8 with conda and aten
2018-05-14 22:06:52 -04:00
4d35a40f3b Better logging for sccache compilation failure (#7555) 2018-05-14 22:03:38 -04:00
3414475653 [C++ API] Remove initialize_* functions (#7517)
* Remove initialize_ functions

* Fix clone() to recursively clone children

* Small codemove
2018-05-14 18:24:58 -07:00
bf9676180f Update the name of env var for triggering integrated conda build (#7557) 2018-05-14 16:28:39 -07:00
1666b54068 [auto] Update onnx to ac970c9 - update onnx model tests for rnn/lstm/gru (onnx/onnx#960)
ac970c9dcb
2018-05-14 22:43:18 +00:00
284f13b814 make sure that pytorch and caffe2 usage lines up with onnx rnn spec (#7511) 2018-05-14 15:42:56 -07:00
ce69d3110b Improve script builtin checking using schema (#7311)
Improve script builtin checking using schema

* This add aten_schema.h which provides a barebones amount of type and
  argument information about each builtin operator
* emitBuiltinCall is updated to use this information rather than
  aten_dispatch to ensure the operator is correct.
* handling of keyword and position arguments now matches python behavior
* There is no longer a requirement that kwargs be constant or that the
  attributes of an op must be entirely constant or non-constant
* compiler now constructs a non-attributed version of the op first and
  then turns it into the constant-attribute version if all attributes
  are constants.
* default arguments for builtins now work
* SugaredValue::call and similar functions now have SourceRange information
  for their arguments so that error reporting is more accurate

Notes:
* This does not try to merge the builtin checking with python arg parser.
  Given that we will eventually have C10 schema which will replace aten_schema,
  we will eventually have a C++ description of the schema and working of that
  description directly will be the easiest form to understand.
* python function calls and script method calls do not support keyword arguments yet.
  When we add this support we should refactor the handling in tryEmitSchema
  that resolves keywords into a common function.

* default arguments work
* keyword arguments to builtins work (still need to extend to calling python and other script methods)
* much better error reporting for incorrect builtins

Lift any constants to attributes on nodes when possible

* Schema  is usable internally in the compiler as
  the function signatures of script functions as well as for builtin
  operators.
* Adds a List[T] class to better represent the arguments to cat/stack
  as a type rather than with custom checking.
* Support kwargs for calls of script methods

A future commit will be needed to add support for:
* calls to script _functions_ which are currently are GraphExecutors without schema info.
* kwargs to python functions, which will require refactoring python op
2018-05-14 14:46:36 -07:00
1f08000562 return value of LSTM example fixed. (#7534) 2018-05-14 15:36:09 -04:00
61afbbbd18 clamping the return value of uniform.cdf() to [0..1] (#7538)
* fix for #7532: clamping the return value of uniform.cdf() to the range [0,1]

* removed whitespace around equals to pass flake8 tests

* added a test for uniform.cdf() with arguments outside support
2018-05-14 15:36:00 -04:00
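For illustration of the clamped behavior (pre-fix values follow from the raw linear cdf expression (value - low) / (high - low)):

```python
import torch
from torch.distributions import Uniform

d = Uniform(0.0, 1.0)
d.cdf(torch.tensor(-0.5))  # 0.0 after the clamp (raw expression gave -0.5)
d.cdf(torch.tensor(1.5))   # 1.0 after the clamp (raw expression gave 1.5)
```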
bccb727b65 Remove wrong "input" arg from scatter_() docstring (#7550) 2018-05-14 15:33:47 -04:00
a9a44faf03 [auto] Update onnx to 310b44c - Add tools for generating c++ code test coverage (onnx/onnx#938)
310b44c800
2018-05-14 19:13:47 +00:00
cf9913d569 Install torchvision before running integration tests (#7552) 2018-05-14 11:49:10 -07:00
4af63916cd Set up Caffe2 CUDA builds to use sccache (#7547)
* Set up Caffe2 CUDA builds to use sccache

* comment fix
2018-05-14 11:15:58 -07:00
56a63459b6 [auto] Update onnx to 330fd0f - shape inference for TopK and trigonometric functions (onnx/onnx#946)
330fd0f73e
2018-05-14 04:29:19 +00:00
169e91c530 [auto] Update onnx to 8ff5fdb - fix def of gru version 1 (onnx/onnx#945)
8ff5fdbe26
2018-05-14 03:48:22 +00:00
fc23885105 Fixes reductions where accum type != type and simplifies all reductions (#7487)
This PR makes two improvements:

It fixes reduce kernels where accum type != type. Currently, for example, half tensors with small values may have norms that are (approximately) representable in fp16, but calling .norm() on them will result in underflow and a reported norm of zero. This PR fixes that behavior and adds a test in test_cuda.py to ensure underflow does not occur (test_tiny_half_norm).

It simplifies all reductions by removing excessive templating and the -2 contiguous special case from THC_reduceDim and THC_reduceAll. The latter was previously removed from pointwise apply. This has no performance impact as the -2 special case was already mapping to the 1D code path.

PyTorch currently attempts to handle accum type != type by either (1) writing kernels that immediately convert values to accum type after reading or (2) writing operations that take in type values and accumulate to the accum type. The latter path was not working properly (hence the current excessive half tensor underflow) and resulted in a lot of redundant code, with two reduce ops being passed to a kernel instead of one, and reduce ops frequently receiving the same template argument twice.

This PR makes the former approach THE approach. Kernels that accumulate to (potentially) different types should follow the pattern of converting their input to the accum type, performing all operations on that type, and then converting back to the appropriate type if writing their value back to the tensor. This pattern makes the second reduce op redundant and allows for simpler templating, which should improve readability, reduce build time, and reduce binary size. Also, this prevents ops from having to perform their own conversions, which could result in poor performance if the same value was operated on multiple times.

One exception to this simplification was that a new ThrustTensorDistOp was created to handle a call to thrust::inner_product(). This Op fuses the conversion and the TensorDistOp.

In addition to the expected simplification, there is also some cleanup of excessive template parameters. For example, kernelReduceAllPass2() had three template parameters: T, IndexType, and ReduceOp, but IndexType was never used.

* wip

* Adds tests

* Fixes Python linting

* mean and norm fusions, code cleanup

* fixes file permissions
2018-05-13 18:33:48 -04:00
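A sketch of the underflow scenario that test_tiny_half_norm guards against (shapes and values here are illustrative, and a CUDA device is assumed):
```
import torch

# Each squared element (1e-8) underflows fp16's subnormal range, so
# accumulating in fp16 reported a norm of zero; accumulating in fp32
# and converting back yields the correct ~1e-2.
x = torch.full((10000,), 1e-4, dtype=torch.float16, device='cuda')
print(x.norm())  # ~0.01 after the fix; 0.0 before
```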
d0287eca94 [auto] Update onnx to c50f329 - Adding shape inferences for GlobalMaxPool, GlobalAveragePool, and GlobalLpPool" (onnx/onnx#943)
c50f329dcd
2018-05-13 20:36:27 +00:00
63ae163b24 put dropout states on the input device (#7515)
* put dropout states on the input device

* add assert to aten, add test, fix lint

* only assert device if states are defined
2018-05-13 16:25:37 -04:00
1ce5431aaf Documentation improvements (#7537)
- improve scatter documentation (fixes #7518)
- refine KLDivLoss documentation (fixes #7464)
- fix some sphinxbuild warnings

Thank you, Hugh Perkins for reporting!
2018-05-13 15:44:24 -04:00
8f64f918f7 [auto] Update onnx to 0a6076e - Fix the opset version in backend tests (onnx/onnx#944)
0a6076eae6
2018-05-13 15:46:52 +00:00
c84fdda582 Skip onnx backend tests for inverse trigonometric ops (#7533) 2018-05-13 08:41:28 -07:00
a3b2877810 Fix CUDA builds. (#7529)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-13 09:54:03 -04:00
825c3ca2d6 [auto] Update onnx to 4e98b03 - add trigonometric functions (onnx/onnx#869)
4e98b038d1
2018-05-13 07:52:49 +00:00
f529b85035 [auto] Update onnx to 0bd3f78 - Add shape inference for LpPool, RoiPool, and fix MaxPool, AveragePool, and Conv (onnx/onnx#928)
0bd3f78bf4
2018-05-13 05:05:49 +00:00
5336ea4195 Work around Python nightly regression. (#7526)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-13 00:17:36 -04:00
2bb38ba700 Built-in support for rebuilding in win-build.sh (#7442)
* Built-in support for rebuilding in win-build.sh

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* fixups

Signed-off-by: Jenkins <jenkins@ci.pytorch.org>

* CR comments

* CR comments

* more delayed expansion fixes
2018-05-12 23:53:40 -04:00
ac52f1186a [minor] change dockerfile to point to pytorch channel (#6960) 2018-05-12 23:43:09 -04:00
37b9d093d2 Updates collapseDims() function and documentation (#7056)
* Updates collapseDims() function and documentation

* Adds C++ tests, validates input, updates names for readability

* Removes invalid test

* stashing to merge AT_CHECK macro

* Updates asserts, removes tests on Windows
2018-05-12 23:42:55 -04:00
cfc1d92975 Implement ellipses ('...') and diagonals (e.g. 'ii->i') in einsum. (#7173)
This brings the two most important missing numpy einsum features
to torch.einsum.
2018-05-12 23:39:37 -04:00
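For illustration, the two new features (shown with the varargs calling convention of later releases; early versions took a list of operands):
```
import torch

x = torch.randn(3, 3)
diag = torch.einsum('ii->i', x)           # diagonal, shape (3,)

batch = torch.randn(4, 3, 5)
t = torch.einsum('...ij->...ji', batch)   # swap last two dims, shape (4, 5, 3)
```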
7edd451a4e Improve spectral_norm (fixes #7261) (#7298)
* Improve spectral_norm (fixes #7261)

Thank you Morgan Funtowicz for the report and minimal example!

* compute sigma only once
2018-05-12 23:31:37 -04:00
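For reference, a minimal usage sketch of the utility this PR improves:
```
import torch
import torch.nn as nn

layer = nn.utils.spectral_norm(nn.Linear(20, 40))
out = layer(torch.randn(5, 20))  # sigma is now computed once per forward pass
```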
cf9751207e Allow building Caffe2 with ATen support (Addresses #7249) (#7297)
* Addresses Issue #7249, where Caffe2 cannot be built with ATen support

* Fixed indentation
2018-05-12 23:30:46 -04:00
eaa3f2e613 Fix advanced indexing with negative indices (#7345)
* Fix advanced indexing with negative indices

Fixes #7156

Here is some behavior before this PR:
```
In[1]:
x = torch.arange(9).view(3, 3).contiguous()
x[[0], [-1]]  # Should be equivalent to x[0, -1]

Out[1]:
tensor([ 8])
```

The bug is that negative indices are added to the computed linear index
directly. In the above example, the linear index computed is "-1", which
wraps around to "8", giving the last element of a flattened view of `x`.

Instead, we should wrap negative indices around before adding them to
the linear index.

* Use toCLong()
2018-05-12 23:24:40 -04:00
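A quick check of the intended equivalence after the fix:
```
import torch

x = torch.arange(9).view(3, 3)
# Negative indices now wrap per dimension before the linear index is formed.
assert x[[0], [-1]].item() == x[0, -1].item() == 2
```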
2ac34b98ea [auto] Update onnx to 490c4c6 - fix build dependency between onnx-operators.proto and (onnx/onnx#934)
490c4c6ca9
2018-05-13 03:14:44 +00:00
976b1d5ec1 Don't initialize the current device in CUDAGenerator::CUDAGenerator (#7392)
Previously, CUDAGenerator::CUDAGenerator would initialize the random
number generator on the current device. This would usually be device 0.
This is undesirable because initializing the CUDA context allocates a few
hundred MB due to all the kernels in libTHC.so.

This avoids the unnecessary call to THCRandom_getGenerator() in the
CUDAGenerator constructor.

Fixes #7320

* Fix call to get THCState
2018-05-12 22:57:06 -04:00
acb6f2697e Some notes about developing on Windows (#7447)
* Some notes about developing on Windows

* typofix
2018-05-12 22:55:11 -04:00
03767b66db Add FileNotFoundError to torch._six (#7524)
Add FileNotFoundError for compatibility with Python 2 and use in
dataloader. Fixes pytorch/pytorch#6932
2018-05-12 20:54:26 -04:00
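A hedged sketch of the usual Python 2/3 shim (not necessarily the exact torch._six code):
```
import sys

if sys.version_info[0] == 2:
    # Python 2 has no FileNotFoundError builtin; IOError is the closest match.
    FileNotFoundError = IOError
else:
    import builtins
    FileNotFoundError = builtins.FileNotFoundError
```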
921dece2d7 Update Im2ColNd functions (#7505)
Update Im2ColNd functions
2018-05-12 15:59:50 -07:00
db6e4576da Use customized python interpreter (#7520) 2018-05-12 13:06:39 -04:00
0337d6708c Use SLEEF's tanh (#7513) 2018-05-12 14:14:02 +00:00
ed3b12e1ba [Caffe2] Ideep net optimization passes (#7514)
* Transform ideep net

* Add conv+relu transformation

* Add verification and address comments
2018-05-11 23:50:18 -07:00
580556dd60 [auto] Update onnx to 25b8845 - Extend AveragePool to support average count include padding (onnx/onnx#884)
25b8845a14
2018-05-12 04:10:55 +00:00
6ada041b31 Some small fixes in C++ API (#7510) 2018-05-11 18:56:53 -07:00
aced37a633 [auto] Update onnx to 7c8b3d2 - [Typing 4/5] Add type hints to onnx/backend (onnx/onnx#913)
7c8b3d2c75
2018-05-11 23:19:04 +00:00
141d81d095 Move ONNX integration tests from onnx-fb-universe to PyTorch repo (#7397)
* Move ONNX integration tests from onnx-fb-universe to PyTorch repo

* Switch to use torchvision

* Delete single rnn operator tests, they have been covered in e2e tests in test_caffe2.py

* Mirror the fix in onnx-fb-universe to bypass cuda check

667326d84b
2018-05-11 15:05:18 -07:00
b3f0ab3726 rnn onnx export: consolidate rnn/gru/lstm (#7506) 2018-05-11 14:58:20 -07:00
2863d935b9 [Caffe2] Fix of the performance issue of IDEEP (#7503)
* Sketch fix of the performance issue of IDEEP

* Revert CMakefile

* Fix tests

* format

* comments

* Print error

* review comments
2018-05-11 13:43:41 -07:00
38bc732b2d [jit] Change interpreter/fuser to work on Variables only (#7489)
* this removes the flag controlling whether the interpreter works on variables.
* now the interpreter _always_ works on variables
* constants in the IR are still _always_ non-variables, and an assert was added to ensure this.
* as_tensor was split into as_variable and as_tensor since it is sometimes used
  to construct constants in the IR
* I tried changing the IR to also always use variables but that change was much more
  cross cutting and fragile and I never got it working
2018-05-11 13:33:47 -07:00
dc0faab18d Add zeros_ and ones_ init + tests (#7488)
* Add zeros_ and ones_ init + tests

* Dedup tests

* Remove all occurrences of as_variable
2018-05-11 11:07:11 -04:00
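A minimal usage sketch of the new in-place initializers:
```
import torch
import torch.nn as nn

w = torch.empty(3, 5)
nn.init.ones_(w)   # fills w with 1.0 in place
nn.init.zeros_(w)  # fills w with 0.0 in place
```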
5f96a2d26a Add sparse gradient option to pretrained embedding (#7492)
* Add sparse gradient option to pretrained embedding

* Add sparse gradient option to pretrained embedding

* Trailing white space
2018-05-11 08:44:53 -04:00
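A minimal sketch of the new option, assuming the from_pretrained factory:
```
import torch
import torch.nn as nn

weight = torch.randn(10, 3)  # pretrained embedding matrix
emb = nn.Embedding.from_pretrained(weight, freeze=False, sparse=True)
# Gradients w.r.t. the weight are now sparse, which helps for large vocabularies.
```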
857e3f4a5e Throw error in tensor constructor when numpy strides mismatch (#7440) 2018-05-11 11:00:43 +02:00
b875fb281c Update from facebook (#7451)
* [bootcamp] Improve "Shape" operator to support axes specification

Improve the Shape operator of Caffe2 to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) will return the dimensions for axes 1 and 0, in the specified order. In the current version, the "axes" input allows duplicates and can have arbitrary length.

* Back out "Add barrier net that runs before training nets"

Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures.

* Change warning to verbose log to reduce log spam

The `LOG(WARNING)` was a bit spammy for regular use so let's just make it a `VLOG`.

* Extract the shared code from different caffe2_benchmark binaries

The OSS benchmark and Internal benchmark will share most functions in the benchmark.

* Support MFR in sequence training

As titled.

* Make knowledge distillation work using logged prediction features as teacher labels.

1) Add loading raw dense feature as teacher label.
2) Optional calibration function for teacher label
3) Add teacher label into generic unit test
4) Deprecated TTSN workflow version using feature_options to config teacher label

* [C2/CUDA]: unjoined cross entropy sigmoid

as desc

* Add async_scheduling executor into deferrable_net_exec_test

Add async_scheduling into tests and fix some exception cases

* Fix Event disabled error

When disabling event in RNN ops make sure we don't call Finish on disabled
event from op's RunAsync

* cuda ensure cpu output op can handle both TensorCPU and TensorCUDA

as desc.

* [C2 Core] Infer input device option in C2 hypothesis_test checkers

Improve how we default input blob device options.
Previously it defaulted to wherever the op lives, but that is not necessarily correct.

For example:
CopyCPUToGPU

* [C2 Op]SplitByLengthsOp CPU/GPU implementation

[C2 Op]SplitByLengthsOp CPU/GPU implementation

* fix undefined symbol error

Not sure why we're getting an undefined symbol even with link_whole = True.
Need to figure out why, but we need this workaround for now

* Add tools in DAIPlayground platform to help debugging models

Add additional tools to allow Playground to override individual methods defined in AnyExp.  This allows users to create modules that specifically change certain default method behavior.  An example included in this diff is deactivating test model and checkpointing.  When debugging any model problems, switching off components helps me quickly narrow down the location of the bug.  The technique is extensively used in task T27038712 (Steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory)

* add shape and type inference for int8 conversion operator

* Fix flaky test for group_norm

Fix flaky test for group_norm

* Fix group_norm_op_test flaky

Fix group_norm_op_test flaky

* Implementation of composite learning rate policy

In many state-of-the-art deep learning works, people use a simple trick to
schedule the learning rate: use a fixed learning rate until error plateaus
and then switch to a different fixed learning rate, and so on. In this diff,
we implemented a simple version of the composite learning rate. The user gives
a set of learning rate policies and corresponding iteration numbers, and the
optimizer will change the learning rate policy based on the number of iterations so far.

For example, the user gives two learning rate policies, FixedLearningRate
and PolyLearningRate, with an iteration number of 1k. Then for the first 1k
iterations we use FixedLearningRate, and for the following iterations we use
PolyLearningRate. (A generic sketch follows this commit entry.)

* Split two use cases of CachedReader into two classes, DBFileReader and CachedReader

# Use Cases:

1). input: DB file -> output: DatasetReader.

Use DBFileReader.

2). input: Reader -> build cache DB file -> output: DatasetReader.

Use CachedReader.

# Changes to CachedReader:

1). Move db_path to the constructor, because with a mock reader the cache
will always be built ahead of time.

# Changes to tests:

1). Make a separate TestCase class for CachedReader and DBFileReader.

2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path.

3). Make delete db_path more general. `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`.

* Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization"

Original commit changeset: 4489c6133f11

* Fix LARS bug

Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them.

* [tum] support sparse init & add uniformFill option

as title

* Propagate exception for async nets

Capture the exception when an exception is thrown in async nets and re-throw it after wait().  This allows exceptions to be propagated up to the caller.

This diff was a part of D7752068.  We split the diff so that C2 core files changes are in a separate diff.

* Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc

Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a

Included changes:
- **[69894f2](https://github.com/onnx/onnx/commit/69894f2)**: Use op schema.all tensor types in random like definitions (#865) <Scott McKay>
- **[b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90)**: Clarify random like operators (#846) <Scott McKay>
- **[fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb)**: Refactor shape inference implementation (#855) <anderspapitto>
- **[b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8)**: fix cmake warning message (#863) <Eric S. Yu>
- **[f585c5d](https://github.com/onnx/onnx/commit/f585c5d)**: add pytorch-operator test for tile (#831) <Wenhao Hu>
- **[993fe70](https://github.com/onnx/onnx/commit/993fe70)**: add install step (#832) <Eric S. Yu>
- **[68bc26c](https://github.com/onnx/onnx/commit/68bc26c)**: add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang>
- **[9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda)**: fix string representation of scalar types (#858) <G. Ramalingam>
- **[1078925](https://github.com/onnx/onnx/commit/1078925)**: fix y in pow test case to scalar (#852) <Wenhao Hu>
- **[c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f)**: Add some math function shape inference (#845) <anderspapitto>
- **[ff667d1](https://github.com/onnx/onnx/commit/ff667d1)**: Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan>
- **[11c6876](https://github.com/onnx/onnx/commit/11c6876)**: clear initializer names when clear initializer (#849) <Wenhao Hu>
- **[73c34ae](https://github.com/onnx/onnx/commit/73c34ae)**: Clarify FeatureVectorizer description. (#843) <Scott McKay>
- **[1befb9b](https://github.com/onnx/onnx/commit/1befb9b)**: Remove useless text in docs (#850) <Lu Fang>
- **[e84788f](https://github.com/onnx/onnx/commit/e84788f)**: Fix SELU attributes' default values (#839) <Lu Fang>
- **[ebac046](https://github.com/onnx/onnx/commit/ebac046)**: Add tile test case (#823) <Wenhao Hu>
- **[8b7a925](https://github.com/onnx/onnx/commit/8b7a925)**: a few more shape inference functions (#772) <anderspapitto>
- **[9718f42](https://github.com/onnx/onnx/commit/9718f42)**: Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake>
- **[ef083d0](https://github.com/onnx/onnx/commit/ef083d0)**: Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang>
- **[45ceb55](https://github.com/onnx/onnx/commit/45ceb55)**: Check if CMAKE_BUILD_TYPE set before project(). (#812) <Sergii Dymchenko>
- **[4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0)**: [WIP] reenable shape inference tests (#834) <anderspapitto>
- **[22d17ee](https://github.com/onnx/onnx/commit/22d17ee)**: RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani>
- **[de65b95](https://github.com/onnx/onnx/commit/de65b95)**: dimension denotation (#443) <Tian Jin>
- **[eccc76e](https://github.com/onnx/onnx/commit/eccc76e)**: fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang>
- **[d582beb](https://github.com/onnx/onnx/commit/d582beb)**: disable shape inference test to unbreak ci (#830) <Lu Fang>
- **[485b787](https://github.com/onnx/onnx/commit/485b787)**: function proto for composite op. (#802) <Ke Zhang>
- **[cd58928](https://github.com/onnx/onnx/commit/cd58928)**: specify defaults for attributes of Affine op (#820) <G. Ramalingam>
- **[7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9)**: merge the dummy backend back into the main one (#743) <anderspapitto>
- **[1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a)**: [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan>
- **[3769a98](https://github.com/onnx/onnx/commit/3769a98)**: Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang>

* [C2]ReluN Op

relu n op.

tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6

* Call destructor when assigning a blob value

* Add executor overrides

Add executor overrides flag to enable migration to async_scheduling executor

* Add barrier net that runs before training nets - attempt #2

Add a synchronize barrier net that is run before training nets.  With this net, shards that are faster will wait for other shards before starting training.  This reduces the chances of the faster shards timing out during GLOO AllReduce.
Removed explicit data_parallel_model.py.synchronize call in holmes workflow.

This change was landed previously but caused errors for some EDPM workflows - See https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops are wrapped in exception handlers but in this case exception thrown in the barrier init net is not handled.

To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net.  Since errors during the param_init_net run are handled gracefully with re-rendezvous, this should fix the problem.

* Handle empty nets in async_scheduling

Make sure we don't get stuck on empty nets

* use CUDA_ARCH for conditional compile

* [C2 fix] infer function for ensure_cpu_output_op

* Update group_norm test to reduce flaky test

* Fix lr_multiplier for GPU
2018-05-10 23:14:27 -07:00
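A generic Python sketch of the composite learning rate idea from the commit above (not the Caffe2 API; names are illustrative):
```
def composite_lr(iteration, policies):
    """policies: list of (lr_fn, num_iters); the last policy covers the rest."""
    for lr_fn, num_iters in policies[:-1]:
        if iteration < num_iters:
            return lr_fn(iteration)
        iteration -= num_iters  # shift into the next policy's local schedule
    last_fn, _ = policies[-1]
    return last_fn(iteration)

fixed = lambda it: 0.1                       # FixedLearningRate analogue
poly = lambda it: 0.1 * (1.0 - it / 10000.)  # PolyLearningRate analogue
lr = composite_lr(1500, [(fixed, 1000), (poly, 10000)])  # poly regime, local it=500
```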
947155c69d [auto] Update onnx to b2539fc - Shape and type inference for Flatten, SpaceToDepth, DepthToSpace (onnx/onnx#930)
b2539fca83
2018-05-11 02:43:57 +00:00
f8b5d420a4 Fix Caffe2 build with ATen CPU/GPU split (#7486) 2018-05-10 19:28:56 -07:00
75f549bbef [auto] Update onnx to 9dd2533 - Changes done internally at Facebook (onnx/onnx#909)
9dd2533ee3
2018-05-10 23:34:10 +00:00
d5e77fb058 Port interface of store base class from Caffe2 (#7439)
The file store implementation is new and based on the file
initialization method (which uses a single file and file locking) and
the interface of the Caffe2 store handler.

See #7434.
2018-05-10 16:04:19 -07:00
6547245f1f Add return value to setup() function of PipedReaderBuilder (#7476) 2018-05-10 15:39:54 -07:00
6c7a8318c4 Fix Tensor.type(dtype) not preserving device (#7474)
Note that Tensor.cuda() will still copy the tensor to the current device
if it's a CUDA tensor on a different device.

Fixes #7441
2018-05-10 18:22:13 -04:00
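A quick sketch of the fixed behavior (assumes a machine with at least two GPUs):
```
import torch

t = torch.randn(2, 2, device='cuda:1')
# Converting the dtype no longer silently moves the tensor to the current device.
assert t.type(torch.float16).device == t.device
```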
43264c3c30 add cast to ensure correct type for sequence lens argument (#7483) 2018-05-10 14:58:00 -07:00
c489c6a1da Skip upsample onnx backend test (#7477) 2018-05-10 13:17:24 -07:00
a2a4b229cc [caffe2][nomnigraph] Make conv relu fusion more generic (#7437) 2018-05-10 13:03:20 -07:00
9fa1dff66a Allow the use of torch.device for loading (#7339)
* Allow using torch.device for loading

* Make recommended changes

* Better tests
2018-05-10 15:50:00 -04:00
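For illustration (file name hypothetical):
```
import torch

# map_location now accepts a torch.device in addition to strings and callables.
checkpoint = torch.load('checkpoint.pth', map_location=torch.device('cpu'))
```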
b6adf6871c EmbeddingBag to handle empty bags in all modes (#7389) 2018-05-10 15:46:57 -04:00
3f029224cd hotfix: update cmake version for Linux CUDA9 builds (#7478) 2018-05-10 15:39:57 -04:00
9789602814 Fix excess ']' in nn.utils.rnn.pack_sequence (#7475) 2018-05-10 14:41:17 -04:00
93eb50c103 Mark expand nodes as implicit/explicit in trace (#7303)
When tracing we record expand nodes. This is useful in some cases because
it makes it clear a broadcast happened. However, in future runs
the broadcast may be different or not needed. This change adds an
attribute to expand to track if it was implicitly added. This
takes the form of an unused input to expand with a default value.

The execution engine then removes implicit expands before execution.
Note that shape_analysis will re-add expands when it can prove by
shape analysis that they will exist and this is useful for the fuser,
so this change should not affect fusion passes.
2018-05-10 10:47:43 -07:00
c3918da523 [auto] Update onnx to 008a805 - update some model files (onnx/onnx#926)
008a8054fd
2018-05-10 17:45:10 +00:00
20041e2704 better cache for nccl resources (#6970)
allow more than one device list to be stored
2018-05-10 19:42:36 +02:00
64834f6fb8 Split libATen.so into libATen_cpu.so and libATen_cuda.so (#7275)
* Split libATen.so into libATen_cpu.so and libATen_cuda.so

Previously, ATen could be built with either CPU-only support, or
CPU/CUDA support, but only via a compile-time flag, requiring
two separate builds.  This means that if you have a program which
indirectly uses a CPU-only build of ATen, and a CPU/CUDA-build of
ATen, you're gonna have a bad time.  And you might want a CPU-only
build of ATen, because it is 15M (versus the 300M of a CUDA build).

This commit splits libATen.so into two libraries, CPU/CUDA, so
that it's not necessary to do a full rebuild to get CPU-only
support; instead, if you link against libATen_cpu.so only, you
are CPU-only; if you additionally link/dlopen libATen_cuda.so,
this enables CUDA support.  This brings ATen's dynamic library
structure more similar to Caffe2's.  libATen.so is no more
(this is BC BREAKING)

The general principle for how this works is that we introduce
a *hooks* interface, which introduces a dynamic dispatch indirection
between a call site and implementation site of CUDA functionality,
mediated by a static initialization registry.  This means that we can continue
to, for example, lazily initialize CUDA from Context (a core, CPU class) without
having a direct dependency on the CUDA bits.  Instead, we look up
in the registry if, e.g., CUDA hooks have been loaded (this loading
process happens at static initialization time), and if they
have been we dynamic dispatch to this class.  We similarly use
the hooks interface to handle Variable registration.

We introduce a new invariant: if the backend of a type has not
been initialized (e.g., its library has not been dlopened; for
CUDA, this also includes CUDA initialization), then the Type
pointers in the context registry are NULL.  If you access the
registry directly you must maintain this invariant.

There are a few potholes along the way.  I document them here:

- Previously, PyTorch maintained a separate registry for variable
  types, because no provision for them was made in the Context's
  type_registry.  Now that we have the hooks mechanism, we can easily
  have PyTorch register variables in the main registry.  The code
  has been refactored accordingly.

- There is a subtle ordering issue between Variable and CUDA.
  We permit libATen_cuda.so and PyTorch to be loaded in either
  order (in practice, CUDA is always loaded "after" PyTorch, because
  it is lazily initialized.)  This means that, when CUDA types are
  loaded, we must subsequently also initialize their Variable equivalents.
  Appropriate hooks were added to VariableHooks to make this possible;
  similarly, getVariableHooks() is not referentially transparent, and
  will change behavior after Variables are loaded.  (This is different
  to CUDAHooks, which is "burned in" after you try to initialize CUDA.)

- The cmake is adjusted to separate dependencies into either CPU
  or CUDA dependencies.  The generator scripts are adjusted to either
  generate a file as a CUDA (cuda_file_manager) or CPU file (file_manager).

- I changed all native functions which were CUDA-only (the cudnn functions)
  to have dispatches for CUDA only (making it permissible to not specify
  all dispatch options.)  This uncovered a bug in how we were handling
  native functions which dispatch on a Type argument; I introduced a new
  self_ty keyword to handle this case.  I'm not 100% happy about it
  but it fixed my problem.

  This also exposed the fact that set_history incompletely handles
  heterogenous return tuples combining Tensor and TensorList.  I
  swapped this codegen to use flatten() (at the possible cost of
  a slight perf regression, since we're allocating another vector now
  in this code path).

- thc_state is no longer a public member of Context; use getTHCState() instead

- This PR comes with Registry from Caffe2, for handling static initialization.
  I needed to make a bunch of fixes to Registry to make it more portable

  - No more ##__VA_ARGS__ token pasting; instead, it is mandatory to pass at
    least one argument to the var-args. CUDAHooks and VariableHooks pass a nullary
    struct CUDAHooksArgs/VariableHooksArgs to solve the problem. We must get rid of
    token pasting because it does not work with MSVC.

  - It seems MSVC is not willing to generate code for constructors of template
    classes at use sites which cross DLL boundaries. So we explicitly instantiate
    the class to get around the problem. This involved tweaks to the boilerplate
    generating macros, and also required us to shuffle around namespaces a bit,
    because you can't specialize a template unless you are in the same namespace as
    the template.
  - Insertion of AT_API to appropriate places where the registry must be exported

- We have a general problem, which is that on recent Ubuntu distributions,
  --as-needed is enabled for shared libraries (cc @apaszke, who was
  worrying about this in #7160; see also #7160 (comment)). For now, I've hacked
  this up in the PR to pass -Wl,--no-as-needed to all of the spots necessary to
  make CI work, but a more sustainable solution is to attempt to dlopen
  libATen_cuda.so when CUDA functionality is requested.

    - The JIT tests somehow manage to try to touch CUDA without loading libATen_cuda.so. So
      we pass -Wl,--no-as-needed when linking libATen_cuda.so to _C.so

- There is a very subtle linking issue with lapack, which is solved by making sure libATen_cuda.so links against LAPACK. There's a comment in aten/src/ATen/CMakeLists.txt about this as well as a follow-up bug at #7353

- autogradpp used AT_CUDA_ENABLED directly. We've expunged these uses and added
  a few more things to CUDAHooks (getNumGPUs)

- Added manualSeedAll to Generator so that we can invoke it polymorphically (it
  only does something different for CUDAGenerator)

- There's a new cuda/CUDAConfig.h header for CUDA-only ifdef macros (AT_CUDNN_ENABLED, most prominently)

- CUDAHooks/VariableHooks structs live in at namespace because Registry's
  namespace support is not good enough to handle it otherwise (see Registry
  changes above)

- There's some modest moving around of native functions in ReduceOps and
  UnaryOps to get the CUDA-only function implementations into separate files, so
  they are only compiled into libATen_cuda.so. sspaddmm needed a separate CUDA
  function due to object linkage boundaries.

- Some direct uses of native functions in CUDA code has to go away, since these
  functions are not exported, so you have to go through the dispatcher
  (at::native::empty_like to at::empty_like)

- Code in THC/THCS/THCUNN now properly use THC_API macro instead of TH_API
  (which matters now that TH and THC are not in the same library)

- Added code debt in torch/_thnn/utils.py and other THNN parsing code to handle
  both TH_API and THC_API

- TensorUtils.h is now properly exported with AT_API

- Dead uses of TH_EXPORTS and co expunged; we now use ATen_cpu_exports and
  ATen_cuda_exports (new, in ATenCUDAGeneral.h) consistently

- Fix some incorrect type annotations on _cudnn_rnn_backward, where we didn't
  declare a type as possibly undefined when we should have. We didn't catch this
  previously because optional annotations are not tested on "pass-through" native
  ATen ops (which don't have dispatch). Upstream issue at #7316

- There's a new cmake macro aten_compile_options for applying all of our
  per-target compile time options. We use this on the cpu and cuda libraries.

- test/test_cpp_extensions.py can be run directly by invoking in Python,
  assuming you've set up your PYTHONPATH correctly

- type_from_string does some new funny business to only query for all valid CUDA
  types (which causes CUDA initialization) when we see "torch.cuda." in the
  requested string

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Last mile libtorch fixes

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* pedantic fix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-10 10:28:33 -07:00
ea98256e96 Bugfix for check_unique in jit (#7468) 2018-05-10 19:27:24 +02:00
b5a1eda7d3 guard dynamic sizes expand from peephole passes (#7436) 2018-05-10 09:34:20 -07:00
6a118b21b5 Set MAX_JOBS to nproc - 1 if using sccache to compile CUDA (#7361)
* Set MAX_JOBS to nproc - 1 if using sccache to compile CUDA

* Change JOBS setting in tools/cpp_build/build_common.sh
2018-05-10 12:25:13 -04:00
78c3d8c164 Adding yaml to docker images for Aten builds (#7430)
* Adding yaml to docker images for Aten builds

* Removing pip install of yaml due to permissions
2018-05-10 09:07:21 -07:00
c5de3314cf Add name() to C++ modules (#7409)
* Add name() to C++ modules

* Use RTTI to get module name by default

* Add functional.cpp to CMakeLists.txt

* Call typeid() inside name() instead of constructor

* Add tests and use default constructor
2018-05-10 08:52:38 -07:00
ab5c391100 onnx rnn export: use spec-respecting dimensions (#7394)
fixes https://github.com/pytorch/pytorch/issues/6879
2018-05-10 08:19:17 -07:00
d9671ea38e Fix Caffe2 with ATen build (#7452) 2018-05-10 07:57:31 -07:00
a257bd19a2 added state_dict/load_state_dict for ReduceLROnPlateau (#7201) 2018-05-10 12:02:28 +02:00
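A minimal checkpointing sketch using the new methods:
```
import torch

param = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.SGD([param], lr=0.1)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, patience=2)

state = sched.state_dict()   # scheduler state is now serializable
sched.load_state_dict(state)
```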
4eaf5261d3 Provide default implementation of clone() in base module (#7446) 2018-05-10 00:49:29 -07:00
48b7f298f9 Update NNPACK and cpuinfo submodules to latest master (#7443)
In Maratyszcza/NNPACK#140 @daquexian reported an error on Faster-RCNN model with MobileNet V2, when running with NNPACK engine. The error disappears when using the latest NNPACK and cpuinfo. Updating submodules upstream to ensure others don't hit this issue.
2018-05-10 00:20:19 -04:00
bd8f6bd46a hotfix: update cmake version for OSX builds (#7456) 2018-05-10 00:05:04 -04:00
3023dd25f3 Use set_type to implement type conversions in C++ API (#7408)
* Use set_type to implement .cuda() in C++ API

* Change C++ module parameter types in place

* Fix bug where batchnorm state was not moved to CUDA
2018-05-09 17:01:19 -04:00
ed111619da [ONNX] Allow specifying only a subset of input/output names (#7427)
* [ONNX] Allow specifying only a subset of input/output names

Then we can only specify the "real" names while ignoring the names for all the parameters

* fix

* Update utils.py
2018-05-09 13:02:20 -07:00
d9c74f727c Fix ONNX tutorial specification for input names (#7433)
* Fix ONNX tutorial specification for input names

* Some more updates
2018-05-09 13:01:53 -07:00
56077f5661 Fix CODEOWNERS precedence for ONNX folder (#7429)
More specific paths should come later, since the last matching pattern takes precedence
2018-05-09 14:31:10 -04:00
23be4ac3a2 Add clang tidy tooling (#7412) 2018-05-09 13:08:53 -04:00
769397eb77 [Caffe2] [feature request] Add gradient operators for IDEEP (#7234)
* Add gradient operators for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add gradient test cases for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Upgrade third_party/ideep

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Refine SumOp for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Share input buffer in fallback op if possible

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fallback ConvTranspose op for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix bug introduced by the patch of sharing input buffer

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Share output buffer in fallback operators

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove IDEEP to resolve repo issue

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Reflash IDEEP repo

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove redundant lines in IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fallback operators for IDEEP
(Flatten, ResizeLike, Transpose, and Reshape)

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-05-09 08:52:24 -07:00
97c5c0b034 add python library linking on Windows (#7157) 2018-05-09 11:50:55 -04:00
f02ae65727 skip test_utils.TestFFI.test_cpu for ppc64le due to incompatible exception handling (#7422) 2018-05-09 11:45:30 -04:00
f43e067128 Make optimizer not complain about parameters with requires_grad=False (#7419) 2018-05-09 11:34:52 -04:00
6fd252ccae AUTOGRAD_ to TORCH_AUTOGRAD_ for macros (#7424) 2018-05-09 10:45:05 -04:00
537cb10525 improve DataParallel/DistributedDataParallel docs (#7407) 2018-05-09 10:30:42 +02:00
ef477b2b00 [auto] Update onnx to 72e15ac - [Typing 2/5] Add type hints to onnx/defs (onnx/onnx#911)
72e15ac46f
2018-05-09 06:49:45 +00:00
dca540e455 [auto] Update onnx to ee7f97c - Add type hints to onnx/tools (onnx/onnx#910)
ee7f97c2b1
2018-05-09 06:48:38 +00:00
af23ab9b3e Make omnigraph a public dependency of caffe2 main lib (#7402) 2018-05-08 23:37:40 -07:00
5c2015d133 onnx werror is now opt in (#7390) 2018-05-08 21:21:34 -07:00
8dbeffab07 Add back SLEEF and also use better cmake setup. (#7341) 2018-05-09 02:48:16 +00:00
7911a30081 Move #endif below magma source (#7400) 2018-05-08 22:28:26 -04:00
92d02a46dd Dont do CSE on nodes with blocks (#7363) 2018-05-08 18:00:45 -07:00
b1fbf29b52 [caffe2][nomnigraph] Change the standard transform API to take in NNModule rather than NetDef (#7308) 2018-05-08 17:43:51 -07:00
dc3252730e Fixing conda builds by removing unneeded python args (#7384) 2018-05-08 17:33:30 -07:00
3913e9ead3 [caffe2][nomnigraph] Batchnorm + Conv Fusion (#7057) 2018-05-08 15:40:34 -07:00
3185d8342e Replace incorrect usages of "NotImplemented" (#7381)
* Replace incorrect usages of "NotImplemented"

Fixes #7266. Replaces "NotImplemented" (which is supposed to be used for
binary ops) with the correct "NotImplementedError".

* Address comments
2018-05-08 18:31:45 -04:00
755d3105b6 Fix MultiMarginLoss equation in docs (#7383)
Fixes #7237
2018-05-08 18:30:47 -04:00
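For reference, the corrected 1-D form with default weights (a paraphrase of the fixed docs, not the exact text):

    loss(x, y) = ( sum_{i != y} max(0, margin - x[y] + x[i])^p ) / x.size(0)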
e3935f7509 [Caffe2] Add conv+relu fusion for MKLDNN ops (IDEEP) (#7385)
* Add conv+relu fusion for MKLDNN ops (IDEEP)

* comments
2018-05-08 14:44:53 -07:00
8c8918c341 make half overflow checks consistent with other types (#7382) 2018-05-08 14:40:18 -07:00
8f27582194 [auto] Update onnx to dee6d89 - make werror opt-in (onnx/onnx#908)
dee6d89781
2018-05-08 21:22:40 +00:00
71626491c4 Add batched linear solver to torch.gesv() (#6100)
* Add batched linear solver to torch.gesv()

Fixes #3164
Picks up from #4502

I moved `gesv` to ATen.
Adds bindings for MAGMA's `gesv_batched` function for CUDA.
For CPU, runs `THLapack(gesv)` in a for loop.

The new function supports arbitrary batch dimensions (and broadcasting
of those dimensions). For example, the 4-d tensor `A x B x M x M` should
be treated as having batch-size `(A x B)`.

The overhead of creating the magma_queue_t is: ~350000 microseconds
the first time it's called and ~6 microseconds every time after that.

* Tests and docs

* Address comments

* Address comments

* Rebase

* Address comments

* Fix rebase

* Addressed comments

* Address comments

* Address comments

* Addressed comments
2018-05-08 17:06:27 -04:00
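A usage sketch of the batched interface (torch.gesv as it existed at the time; it was superseded by torch.solve in later releases):
```
import torch

A = torch.randn(2, 3, 4, 4)   # batch of 2*3 systems, each 4x4
B = torch.randn(2, 3, 4, 5)   # matching right-hand sides
X, LU = torch.gesv(B, A)      # solves A @ X = B for every batch entry
print(X.shape)                # torch.Size([2, 3, 4, 5])
```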
f598ef9102 Add CI docker image for rocm builds (#7349) 2018-05-08 13:41:27 -07:00
7b66c433bc Use a CI specific onnx namespace to catch hardcoded ones in the code (#7369) 2018-05-08 13:40:55 -07:00
de470d1222 Small fix needed to build Caffe2 Aten without CUDA (#7387) 2018-05-08 15:55:03 -04:00
fea95de854 Add aten::expand to the isDifferentiable list (#7350)
This lets aten::expand be differentiable in torchscript. It was probably
omitted from the list by accident in the past b/c gradientForNode does
already support aten::expand.

Also adds a test to check expand and its gradient in a torchscript fn.
2018-05-08 21:40:36 +02:00
913e145340 Removes -2 special case and specialization from pointwise apply (#7366)
* Removes -2 special case and specialization

* Specialization and comment cleanup
2018-05-08 14:58:46 -04:00
4adba42a75 [easy] minor cleanup in caffe2 jenkins test script (#7378) 2018-05-08 11:50:48 -07:00
9396740406 Updating condas to build for all CUDA archs (#7379) 2018-05-08 11:45:45 -07:00
67e7c24479 Add note about thread-safety of registry (#7285) 2018-05-08 10:26:28 -07:00
24b41da795 [build] Make ATen buildable without all Caffe2 by root cmake (#7295)
* Make ATen buildable without all Caffe2 by root cmake

* Fix typo in aten cmake

* Set BUILD_ATEN from USE_ATEN as compat

* Only set BUILD_ATEN from USE_ATEN when on

* Have USE_GLOO only set when BUILD_CAFFE2
2018-05-08 10:24:04 -07:00
0aebddd476 [auto] Update onnx to 522c055 - version bump to 7 (onnx/onnx#876)
522c05566e
2018-05-08 17:10:40 +00:00
e9f6f14555 [Caffe2] Revamp the convnet benchmark code by using models from model zoo (#7351)
* Revamp the convnet benchmark code by using models from model zoo

* Move ModelDownloader to caffe2/python/models

* Remove convnet_benchmarks.py
2018-05-08 08:53:52 -07:00
2cb26bcd40 Fix type in TensorRT tests (#7357) 2018-05-08 07:52:04 -07:00
75dbf9b113 [caffe2][build] Update python cmake flag print script (#7306) 2018-05-08 00:34:42 -07:00
79a4d27232 Correct the parameter annotation (#7367)
Keep the annotation in sync with the parameter.
2018-05-08 00:31:16 -07:00
f439ba5843 [Caffe2][nomnigraph] Generic fuse conv relu pass for nomnigraph (#7355)
* Generic fuse conv relu pass for nomnigraph

* Use it in NNPACK conversion

* Comments

* Change the postprocess interface to take node instead of conv op
2018-05-07 23:19:06 -07:00
f3c8bd598d [Caffe2] Pinning conda-numpy to 1.14 to avoid SVD issue (#7344)
* Pinning conda-numpy to 1.14 to avoid SVD issue

* Adding another leveldb test to conda's ignored tests, removing a mkl-test from this

* Removing commented out section
2018-05-07 22:55:50 -07:00
75651c199f fix build (#7348) 2018-05-07 20:43:08 -07:00
b6adecdeee correct schema.Scalar's shape for a shape argument of 1 (#6493)
The schema.Scalar class makes pretty strict assumptions (via its docstring)
on the spec of the shape of its underlying object. Because of idiosyncrasies
of numpy indexing and the use of np.dtype, those assumptions are broken on an
edge case (dtype = (scalar_type, 1)). This corrects the behavior of this
edge case to conform to the spec.
2018-05-07 18:58:11 -07:00
e7116d95e0 Create README.md (#7360) 2018-05-07 18:26:59 -07:00
ea24c7ff1b Remove cdft library requirement from MKL (#7246) 2018-05-07 15:31:30 -07:00
ed6f79ccd2 [caffe2][build] Add ASAN to the debug release of caffe2 (#7107) 2018-05-07 15:26:51 -07:00
edbfe02941 [auto] Update onnx to ea0e0cb - remove whitespace and semicolon (onnx/onnx#904)
ea0e0cb13f
2018-05-07 22:07:27 +00:00
3642745ef9 [caffe2][nomnigraph] Add maxpool sink transform (#7207) 2018-05-07 14:52:10 -07:00
8fce8673bb Rename Container to Module in autogradpp and reorg code (#7304)
* Rename autograd namespace to torch and change torch.h into python.h

* Pave the way for torch::nn::Module

* Reorganize module code structure

* Undo ONNX update

* Remove sleef submodule
2018-05-07 14:45:00 -07:00
5146bc99e4 [auto] Update onnx to 328ed3e - shape inference for logical ops (onnx/onnx#899)
328ed3e679
2018-05-07 18:45:53 +00:00
2fdc00e41c Use sccache for Windows build (#7331) 2018-05-07 14:42:59 -04:00
f1e38725bf add to method for PackedSequence (#7319)
* ENH: add to method for PackedSequence

* ENH: return self if possible

* TST: remove extra data

* DOC: add more explanation

* TST: remove extra data

* DOC: minor fix
2018-05-07 14:39:03 -04:00
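A minimal sketch of the new method (assumes a CUDA device):
```
import torch
from torch.nn.utils.rnn import pack_sequence

packed = pack_sequence([torch.randn(3, 2), torch.randn(2, 2)])
packed = packed.to('cuda')  # moves the packed data; returns self if no copy is needed
```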
c68ae308cd [auto] Update onnx to d05b6b4 - Just don't output opset_version in the example then. (onnx/onnx#887)
d05b6b46f8
2018-05-07 18:04:01 +00:00
4f48b7c1ba [auto] Update onnx to 5be6d86 - fix typos in documentation (onnx/onnx#896)
5be6d86654
2018-05-07 17:44:15 +00:00
bebccc0c6d Improve math formula rendering in Poisson Distribution docs. (#7340) 2018-05-07 18:40:01 +02:00
4c511075c3 [auto] Update onnx to 6fa9f1a - promote identity op given it's being used. (#892)
6fa9f1a58b
2018-05-06 21:07:56 +00:00
f9b83f2e6c [auto] Update onnx to c0fb725 - Spec clarity: IR.md modifications. (#720)
c0fb725b64
2018-05-06 19:56:05 +00:00
56daed0a85 copy paste documentation error fixed in Softmin (#7324) 2018-05-06 21:50:46 +02:00
54a4867675 Bring back C++ extension torch.h (#7310)
* Bring back C++ extension torch.h

* Fix python.h include in python_tensor.cpp
2018-05-05 14:06:27 -07:00
6087a5feaa [auto] Update onnx to b0ab0d1 - function registration c++ API (#848)
b0ab0d1d15
2018-05-05 14:37:10 +00:00
94b74d2068 [auto] Update onnx to ceb259c - Tests for ReduceLogSum (#862)
ceb259c903
2018-05-05 08:36:40 +00:00
0859f0e3e6 Pinning numpy version in conda builds (#7314) 2018-05-04 16:38:53 -07:00
1f14d681dd [auto] Update onnx to 1c600f8 - Lint the code and fix the CI (#895)
1c600f802d
2018-05-04 22:50:30 +00:00
ea12702e02 [auto] Update onnx to 278ef5b - inference for math ops (#893)
278ef5bc9c
2018-05-04 21:51:16 +00:00
56ed857f1b [auto] Update onnx to f708d41 - type and shape inference for experimental ops (#890)
f708d41fea
2018-05-04 21:50:10 +00:00
f06fcc6efa Fix bug introduced in pull #3280 (#7292)
Apparently get() is a function of requests, not a module (not sure if in
the past get() used to be a module). Therefore, the syntax in #3280 will
always fail with ImportError, and the requests lib will never be used (which
kind of defeats the purpose of that pull request).
Also, if requests lib is used, should add stream=True parameter,
otherwise requests.get() will load the whole response into memory.
2018-05-04 14:14:02 -07:00
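A sketch of the suggested streaming download (URL and file name hypothetical):
```
import requests

url = 'https://example.com/model.pth'
response = requests.get(url, stream=True)  # avoid loading the whole body into memory
response.raise_for_status()
with open('model.pth', 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
```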
e1c7e6dce2 [auto] Update onnx to 38eea57 - add ONNX_NO_WERROR as option (#891)
38eea57313
2018-05-04 21:04:54 +00:00
67a9948d87 Refactor rnn export (#7263)
* rnn refactor: extract rnn weights and biases

* rnn refactor: make rnn with converted outputs

* rnn refactor: finish it off
2018-05-04 14:00:09 -07:00
55b8317f1d Update gif with new logo (#7301)
* Update gif with new logo

* add requires_grad=True
2018-05-04 16:47:08 -04:00
24681a8e49 Update unstable docs logo to new logo. (#7305)
Fixes #7302
2018-05-04 16:44:58 -04:00
feb64b5291 Add -Wno-unknown-pragmas (#7291) 2018-05-04 13:44:13 -07:00
3369828bfa Clarify patience in ReduceLROnPlateau docs (#7242)
* Clarify patience in ReduceLROnPlateau docs

It's unclear which definition of patience we have. The two ways to
interpret it are:
- How many bad epochs can you see before you start considering changing the learning rate.
- How many bad epochs can you see before you change the learning rate.

This PR clarifies the docs with an example. If `patience = 2`, then
after 2 bad epochs, we begin considering changing the learning rate.
After seeing one more epoch (the 3rd epoch), if that epoch is also bad,
then we change the learning rate after it.

* address comments
2018-05-04 16:39:26 -04:00
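A sketch of the clarified semantics:
```
import torch

opt = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, patience=2)

# With patience=2, two consecutive bad epochs are tolerated; the lr is
# reduced once step() sees the third consecutive bad epoch.
for metric in [1.0, 1.0, 1.0, 1.0]:  # validation loss that never improves
    sched.step(metric)
```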
ac5d7bdf62 Fix onnx.symbolic.upsample_bilinear2d not considering align_corners (#7264) 2018-05-04 16:38:38 -04:00
0dd2521d4c Fix ONNX export for AveragePool with count_include_pad=True (#7279) 2018-05-04 13:21:32 -07:00
0259d9c8d3 Changing underscores to hypens in conda package names (#7299) 2018-05-04 12:50:41 -07:00
a0c1e5faea Change the error message in pad_sequence to be more user-friendly (#7283) 2018-05-04 12:29:21 -07:00
36a3f0995b Remove THDTensorDescriptor_newFromTH{X}Tensor. (#7287)
They don't seem to be used and we are moving to a single TensorImpl model.
2018-05-04 12:22:19 -07:00
833b1e6c74 Skip the test case on ReduceLogSum (#7293) 2018-05-04 11:49:30 -07:00
026cb9d2f1 set ONNX_NO_WERROR (#7296) 2018-05-04 11:35:15 -07:00
a015d579dd move softmax/logsoftmax to ATen (#6786)
* move softmax/logsoftmax to ATen

* specify cpu and gpu accum types

* use accreal for CPU

* expose softmax backward to python, fix legacy interface

* fix Distributions.cu to use common AccumulateType

* fix cuda 8 build

* delete commented out lines

* rebase on master, fix breakages
2018-05-04 14:23:35 -04:00
5c575a1497 Fixes RNN shapes for C++ API (#7272) 2018-05-04 14:00:30 -04:00
9e3f5bb5fd enable onnx shape inference when converting onnx -> caffe2 (#7260) 2018-05-04 10:27:30 -07:00
157d7499e7 Disable two flaky C++ API tests. (#7290)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-04 10:23:52 -07:00
46d0140d94 [auto] Update onnx to 541512b - tests for type and shape inference for Random generator ops (#880)
541512b93a
2018-05-04 16:02:33 +00:00
4abb229960 Double-dispatch copy. (#7197)
* Double-dispatch copy.

In order to split ATen's CPU/CUDA code into two separate libraries
which don't require a build flag (AT_CUDA_ENABLED) to separate them,
we need to be able to split source files based on whether or not they
handle CPU functionality only, or also touch CUDA.  Copy poses a unique
challenge here, because the naive implementation involves writing
a matrix for all combinations of CPU/GPU in a single file.

This PR splits up Copy.cpp into CPUCopy.cpp and CUDACopy.cpp, respecting
the following matrix:

    to\from    CPU           CUDA
          +---------------------------
    CPU   | CPUCopy.cpp   CUDACopy.cpp
    CUDA  | CUDACopy.cpp  CUDACopy.cpp

When you run x.copy_(y) where x is CPU and y is CUDA, we do a second
virtual dispatch to copy_from(y, x) on y's type, so that we can get
from CPUCopy.cpp to CUDACopy.cpp

The new autogenerated code for CPU looks like this:

Tensor & CPUByteType::s_copy_(Tensor & dst, const Tensor & src, bool non_blocking) const {
  // code generated by copy_wrapper
  checked_cast_tensor<CPUByteTensor>(dst.pImpl, "dst", 0, false);
  switch (src.type().ID()) {
    case TypeID::CPUByte:
        THByteTensor_copyByte(static_cast<CPUByteTensor*>(dst.pImpl)->tensor, static_cast<CPUByteTensor*>(src.pImpl)->tensor);
        break;
    case TypeID::CPUChar:
        THByteTensor_copyChar(static_cast<CPUByteTensor*>(dst.pImpl)->tensor, static_cast<CPUCharTensor*>(src.pImpl)->tensor);
        break;
    ...
    default:
      return src.type().s_copy_from(src, dst, non_blocking);

Notice that the fall through goes to s_copy_from.  s_copy_from is like s_copy
but the arguments are reversed.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Lintfix and no-CUDA fix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Fix compilation error.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* CR

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-04 11:58:22 -04:00
053b68c4da Fix USE_ATEN flag in caffe2 (#7252) 2018-05-04 08:30:08 -07:00
67d0d14908 Rename autograd namespace to torch and change torch.h into python.h (#7267)
* Rename autograd namespace to torch and change torch.h into python.h

* Include torch.h instead of python.h in test/cpp/api

* Change some mentions of torch.h to python.h in C++ extensions

* Set paths directly, without find_path
2018-05-04 08:04:57 -07:00
bcffb5aa1d Remove SLEEF and all dependent code paths (#7268)
Temporarily remove this dependency.
2018-05-04 14:41:09 +00:00
0829d4502d Trace size-dependent expressions correctly (#6554)
This makes the JIT tracer much more robust, by allowing it to record
dependencies on tensor sizes. For example, if you were to trace this
function

def fn(x):
    return x.view(x.size(1), -1)

before this patch, then it would embed the actual value of x.size(1)
in the trace as a constant, making it very hard to have e.g. batch size
independent traces. Now, this will correctly record the dependency, and
will retrieve the size of x at every run.
2018-05-04 10:55:39 +02:00
da654337e0 Add support for type annotations in Python functions (#7009) 2018-05-04 10:54:19 +02:00
6363faf184 Fix issue #7209 in DataLoader (#7265) 2018-05-04 10:51:46 +02:00
159c75a2ca [auto] Update onnx to e35126b - add type inference function for classifier ops. (#882)
e35126bc4b
2018-05-04 08:03:07 +00:00
739d3d48ec [auto] Update onnx to 7ee7d0b - enable Werror=sign-compare on linux (#867)
7ee7d0b57a
2018-05-04 08:02:14 +00:00
d856bfc1bf [auto] Update onnx to e35126b - add type inference function for classifier ops. (#882)
e35126bc4b
2018-05-04 06:47:08 +00:00
98c24fae6b Fix broadcasting error in LogNormal and TransformedDistribution (#7269) 2018-05-03 23:03:51 -04:00
8325206c6f A clip grad fix for sparse tensors. (#7257) 2018-05-04 00:35:32 +02:00
a95b7b13f9 Extend support to arbitrary ops in init net when converting c2 models to onnx (#7256) 2018-05-03 15:34:47 -07:00
8091388d0f Add support for __floordiv__ and __rdiv__ for integral tensors (#7245) 2018-05-03 23:34:59 +02:00
371cc1e2db update the gif for 0.4 (#7262) 2018-05-03 14:23:08 -07:00
92f54e1f01 remove static libstdc++ linking and PYTORCH_BINARY_BUILD env variable (#7259) 2018-05-03 12:32:57 -07:00
3ae92b3a8b Fix lint errors (#7247) 2018-05-03 12:17:23 -07:00
e625ecc41f [caffe2][nomnigraph] Fix NNPack conv-relu fusion for ping-pong naming, (#7199)
add test for it and make tests python3 compatible
2018-05-03 12:12:24 -07:00
c96f2624a2 Speedup sparse init (#6899)
* Sparse initialization speedup

* +empty line

* simplify indexing

* Can't reproduce locally...

* Can't reproduce locally...+

* Can't reproduce locally...+

* Fix test, cleanup
2018-05-03 14:29:12 +01:00
4ab6ea5b1f Add unbuffered flag to distributed node launcher (#7226) 2018-05-03 11:49:06 +02:00
79245306c7 Fix onnx sum (#7232)
* fix onnx ReduceSum generation

* allow handle_only_zero_dim to return none to make mypy happy
2018-05-03 00:18:16 -07:00
f9393ffc90 Remove unneeded entry for NCCL in .gitmodules (#7216)
NCCL currently is not a git submodule. The NCCL source code is
bundled in 'third_party/nccl'.

Closes #7150
2018-05-03 00:07:58 -07:00
c4078b42b4 Add docstring for Tensor.tolist (Fixes #7095) (#7182) 2018-05-02 23:58:32 -07:00
6538ae5c16 clean up runtime dockerfile, use cuda 9 package (#7230) 2018-05-02 23:54:05 -07:00
7c70c3bdca Fixes for C++ build on macOS (#7192)
* Fix C++ build on Mac

* Enable CI on Mac

* Create NO_API switch to only build jit without api

* More fixes

* Fixes to CMake
2018-05-02 23:06:04 -07:00
1313791015 Need an explicit flag since opencv is on by default (#7225) 2018-05-02 21:00:34 -07:00
aa38ae303d [build] Setup to build ATen from root CMake file (#7163)
* Setup to build ATen from root CMake file

* Move aten/src/TH/cmake into cmake/Modules

* Add special code path for FindMKL for merge
2018-05-02 19:33:31 -07:00
681baa9254 Restore warning to torch.range. (#7194)
Also, get rid of warning specification in Declarations.cwrap, which currently has no effect.
2018-05-02 21:53:00 -04:00
07513cfd1d implement sum over multiple dimensions (fixes #2006) (#6152) 2018-05-02 21:50:29 -04:00
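A quick sketch of the new capability:
```
import torch

x = torch.randn(2, 3, 4)
s = x.sum(dim=(0, 2))  # reduce over dims 0 and 2 in one call
print(s.shape)         # torch.Size([3])
```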
e25e501bea Fix build for osx (#7187)
For some reason, this used to build in autogradpp but requires us to put the declaration in the .cpp in PyTorch.
2018-05-02 21:08:14 -04:00
d154d32890 Fix to a conda hack (#7212) 2018-05-02 17:35:15 -07:00
8ac6856e54 Removing features for a sec (#7211) 2018-05-02 17:11:19 -07:00
faef70b5b0 Fixing a bug in my bug fix (#7210) 2018-05-02 17:02:24 -07:00
a10870a2d1 [auto] Update onnx to 676e0c7 - Type and shape inference for generator ops (#871)
676e0c7726
2018-05-02 23:36:33 +00:00
83622abd9f Reroute aten to use the root cmake system (#7188) 2018-05-02 16:25:56 -07:00
1ca6e77615 Fix to comput_70 error + some more lowercasing (#7205) 2018-05-02 15:34:35 -07:00
93242d320f fix scale on some tensors (#7189) 2018-05-02 15:33:02 -07:00
a61d4a3374 [Caffe2] Refactor reduce ops to take flexible input types (#7164)
* Refactor reduce ops to take flexible input types

* Add DISPATCH_FUNCTION macros in common_gpu.h

* Use macros to reduce switch case in dispatching cuda functions
2018-05-02 12:08:38 -07:00
197412fa8f Fix typo in comment (#7183) 2018-05-02 11:58:30 -07:00
619a56bf21 Emergency new fork for ideep (upstream lost commits). (#7191)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-02 14:50:47 -04:00
88a705555a Add SLEEF for float and double (#6725) 2018-05-02 18:40:44 +00:00
4d2693973e [Caffe2] Turning on ATEN for Caffe2 in integrated builds (#7169)
* Turning on ATEN for Caffe2 in integrated builds

* Adding slim version

* Fixing missing name suffix, fixing conda tests
2018-05-02 11:16:29 -07:00
1904058370 update logos (#7184) 2018-05-02 10:56:20 -07:00
e6330559c8 [auto] Update onnx to c7055f7 - update defs for reduce, rnn, and tensor depth-space ops (#847)
c7055f721c
2018-05-02 16:41:28 +00:00
604f907bc7 Restore filename and line number on AT_ASSERT. (#7152)
AT_ASSERT is an internal, PyTorch specific error, so we should
give a little more debug information (than with the ordinary
errors.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-02 07:49:31 -07:00
f07f24db0b Change unique name so that you are guaranteed: (#7166)
```
JIT_ASSERT(v->setUnique(x)->uniqueName() == x);
```
This works by changing any other value in the graph with name x to a
different name. This mirrors llvm behavior and is useful when you
want to ensure some names have particular values.
2018-05-02 07:32:01 -07:00
ebebfce681 Minor THD cleanup (#7161)
* Remove stale THD README

* Move common THD dependency into THD/base

The master_worker directory now no longer contains files that are
needed for building other parts of THD.
2018-05-02 07:29:27 -07:00
414e0b4b6f Split up CPUApplyUtils for perf (#7168) 2018-05-02 14:22:36 +00:00
664fe34e0a [Caffe2][fbcode=>GH sync] Update from facebook 4323b18ce13c (#7116)
* [fix] Re-enable events in RNN ops

We earlier added event disabling in RNN ops because back then we didn't use
events; with current use cases this is no longer true
(https://fburl.com/8vd0lp8y)

* use ops with cuda impl

* Revert D7729695: [caffe2][fix] Re-enable events in RNN ops

This reverts commit 4b215c7496fb724656ff4c776933a15bdbbcde5e

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [observer] Clean up observer_config.h

#accept2ship

* [1/n] Refactor dataio_test.py

Replace code duplication with a common function

* Add barrier net that runs before training nets

Add a synchronize barrier net that is run before training nets.  With this net, shards that are faster will wait for other shards before starting training.  This reduces the chances of the faster shards timing out during GLOO AllReduce.

Removed explicit data_parallel_model.py.synchronize call in holmes workflow.  Similar change in speech/asr_training workflow will come in another diff.

* Support the dnnlowp backend in caffe2_benchmark

This is for SHARE operator latency evaluation

* Migrate integral_image_op to main caffe2

migrate integral_image_op(GPU version) given by https://fburl.com/yvqezigi
to caffe2/caffe2/operators and implement its CPU version. Write up a test
using the hypothesis_test mechanism

* [pos_disc, fbcode] Implement unjoined lr loss

As explained in https://our.intern.facebook.com/intern/wiki/Model_Based_Calibration/, when the dataset is a joined data set, where labels might change later, we need to use unjoined logloss.

The implementation is almost the same as in Sigrid (https://fburl.com/1trngsls), where
    loss = y (log(p) - log(1-p)) + (1-y)(log(1-p)) = xy - (1-y)x - (1-y)log(1+exp(-x))

For x < 0, to ensure stability and avoid overflow, we use log(1+exp(-x)) = -x + log(1+exp(x)) and rewrite the above as
    loss = xy - (1-y)x + (1-y)x - (1-y)log(1+exp(x)) = xy - (1-y)log(1+exp(x))

Then the final expression becomes
    loss = xy + (y - 1) x (x >= 0) - (1 - y) log(1 + exp(x - 2 x (x >= 0)))

where y is the true label, x is the dot product, and p = logistic(x). (A Python sketch of the final expression follows this commit entry.)

This implementation is aligned with the current implementation of the original cross entropy in
https://phabricator.intern.facebook.com/diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/cross_entropy_op.cc;0bae3b5d0f825897c5e0dd0ff10f489d7271bf25$7-13

* Keep the array to fix the conflict

* [C2] Compute Adagrad effective LR

The AdagradWithLR op outputs an extra blob which contains the average effective learning rate across all weights in this blob.

* Open-source extractMetaNetDef & runGlobalInitialization, add new Predictor constructor from db file, and add run_map_outputs

1. Open-source extractMetaNetDef and runGlobalInitialization, for use in
2. new Predictor constructor from db file.
3. Add new run function that returns outputs as TensorMap

* Disable eigen cpu

Disable eigen cpu in transpose and reduce

* Introduce request_only/object_only property of ModelLayer

by default this is False

* A simple TC Caffe2 benchmark

We can run the tuner, get MappingOptions, and then use them to
compare against cuBLAS.

Currently broken due to LLVM issues. How to run:

hg checkout eec1ab31b59c03b8deded1c755a9abaf8c45be01
add D7401202
add D7434625
add D7506031
add D7540728

buck run @mode/dev-nosan tc/tc/benchmarks_python:caffe2_benchmark

* Move Caffe2 feature_maps_ops to open source

Need feature maps operators in open source project facebookresearch/BlueWhale

* Manually fix the conflicts in channel shuffle op

* Fix the inconsistencies between GitHub and fbcode

* Skip Adagrad GPU test (because the GPU implementation is missing)

* Fix another test to make sure it won't run on GPU when the implementation is not available yet
2018-05-01 20:49:00 -07:00
967c4a0c18 [caffe2][nomnigraph] Fix NNPACK relu fusion for inplace relu (#7124) 2018-05-01 16:26:54 -07:00
20666feb2c [caffe2][nomnigraph] Add compatibility for MSVC, which lacks some C++11 language features (#7158) 2018-05-01 16:26:20 -07:00
f3c76b9b78 Remove specifications from Declarations.cwrap that have no effect and are already handled. (#7147)
These changes are already handled, either in native functions or via resize specifications in Declarations.cwrap.

The resize_ one is technically not handled, although TH does check whether the storage is actually reallocated; this is less strict, but seems okay.
2018-05-01 19:10:31 -04:00
a9f2ee0817 CPUApplyUtils is faster if iterate is split into two steps (#7148) 2018-05-01 22:32:02 +00:00
9ba503ac9c [caffe2][nomnigraph] Add ability to pass the old net to convertToCaffe2Proto (#7149) 2018-05-01 15:31:07 -07:00
1418cc72d6 Make refcount in THMapInfo atomic. (#7135)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-01 18:14:46 -04:00
a5e1d4a049 Delete dead header (#7153)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-01 18:14:06 -04:00
08a853b02c Add rsqrt op in caffe2 (#7154) 2018-05-01 15:06:53 -07:00
a8b059edcc [auto] Update onnx to 69894f2 - Use op schema.all tensor types in random like definitions (#865)
69894f207d
2018-05-01 21:30:49 +00:00
762eb3ddc8 [Caffe2] Add moments op in caffe2 (#7114)
* Add moments op in caffe2

* Use rsqrtf in float for group_norm

* Add docs for default behavior when axes is not provided.

* Update group_norm_op by using Eigen::sqrt on CPU
2018-05-01 12:19:08 -07:00
323e3aca47 A small fix for aten cmake (#7141) 2018-05-01 12:12:29 -07:00
dfe1bae3cd [caffe2][nomnigraph] Move tests to proper gtest suite (#7046) 2018-05-01 12:00:43 -07:00
bcadf92ad5 Move codegen from setup.py to CMake for C++ libraries (#7121)
* Generate code without setup.py for C++ build

* Move code generation to CMake

* Set DEPENDS files correctly

* Fix some errors in codegen

* Fix blank line lint
2018-05-01 11:30:13 -07:00
5d3c3c53aa Add raw IR serialization/deserialization (#6392) 2018-05-01 20:21:29 +02:00
ca8ee4c1e1 [auto] Update onnx to b9d6b90 - Clarify random like operators (#846)
b9d6b90a64
2018-05-01 17:54:27 +00:00
2a18e7c45b Have python dispatch respect 'auto_gpu' and 'with_gil'. (#7137) 2018-05-01 13:51:02 -04:00
8031da5479 Implement torch.as_tensor, similar to numpy.asarray. (#7109)
* Implement torch.as_tensor, similar to numpy.asarray.
torch.as_tensor behaves like torch.tensor except that it avoids copies if possible; it is also somewhat like tensor.new, but without the size overloads.
I didn't add a requires_grad field, because we haven't decided on the semantics such as as_param.

* Remove requires_grad for doc.
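
A quick illustration of the copy-avoiding semantics (assuming a NumPy array whose dtype and device already match, so no copy is needed):

```
import numpy as np
import torch

a = np.array([1, 2, 3])
t = torch.as_tensor(a)   # avoids the copy: shares memory with `a`
c = torch.tensor(a)      # always copies
a[0] = 99
print(t[0].item())       # 99 -- change is visible through the shared storage
print(c[0].item())       # 1  -- the copy is unaffected
```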
2018-05-01 12:54:43 -04:00
1f5b392da0 [auto] Update onnx to fc6b5fb - Refactor shape inference implementation (#855)
fc6b5fbb6d
2018-05-01 15:04:47 +00:00
15b12e6f8a Add support for MKLDNN on Windows (#7130) 2018-05-01 10:57:16 -04:00
7968ee0f59 Removing references to CUDA_SDK_ROOT_DIR to see if it breaks anything (#7125) 2018-05-01 07:52:16 -07:00
87e6362393 Add more warnings to C++ API build (#7123)
Enables more warnings in the C++ API build.

Fixed a bunch of things in torch/csrc/.

Mostly taken from c10

* Enable -pedantic for C++ build

* Enable more warnings

* Include CUDA and library headers with -isystem

* Fix sign-promo warning
2018-05-01 10:40:22 -04:00
0427afadd1 Make AT_ASSERT/AT_ERROR non-printf based, other tweaks (#7104)
* Make AT_ASSERT/AT_ERROR non-printf based, other tweaks

- AT_ASSERT/AT_ERROR don't take printf strings anymore; instead,
  they take a comma-separated list of things you want to print
  (bringing it in line with Caffe2's conventions).

  Instead of AT_ASSERT(x == 0, "%d is not zero", x)
  you write AT_ASSERT(x == 0, x, " is not zero")

  This is done by way of a new variadic template at::str(), which
  takes a list of arguments and cats their string reps (as per
  operator<<) together.

- A bunch of the demangling logic that was in Error.h is now
  moved to Error.cpp (better header hygiene). Also, demangle
  has been moved out to its own helper function, and a
  new helper demangle_type (from Caffe2) has been added.

- A bunch of AT_ASSERT converted into AT_CHECK, to more properly
  convey which checks can be caused by user error, and which are
  due to logic error in ATen.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* CR

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Fix test failure.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* buildfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* More fixes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* One more fix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Try harder

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-01 10:28:31 -04:00
24461a756a Separate "-Xcompiler <...>" into 2 elements because ${nvcc_flags} (when using CUDA_SEPARABLE_COMPILATION) doesn't recognize it. (#7118)
This solves the "nvcc fatal : Unknown option 'Xcompiler -MD'" issue where nvcc gets -'Xcompiler -MD'.
2018-05-01 09:31:43 -04:00
dccfdf317b Fix example of torch.clamp (#7131) 2018-05-01 14:52:32 +02:00
ba046331e8 add spectral normalization [pytorch] (#6929)
* initial commit for spectral norm

* fix comment

* edit rst

* fix doc

* remove redundant empty line

* fix nit mistakes in doc

* replace l2normalize with F.normalize

* fix chained `by`

* fix docs

fix typos
add comments related to power iteration and epsilon
update link to the paper
make some comments specific

* fix typo
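
For reference, a minimal sketch of the power iteration the comments above refer to; the helper name and signature are illustrative, not the module's actual API:

```
import torch
import torch.nn.functional as F

def estimate_sigma(weight, u, n_power_iterations=1, eps=1e-12):
    # one (or more) power-iteration steps approximating the largest
    # singular value sigma of a 2-D weight matrix
    for _ in range(n_power_iterations):
        v = F.normalize(torch.mv(weight.t(), u), dim=0, eps=eps)
        u = F.normalize(torch.mv(weight, v), dim=0, eps=eps)
    sigma = torch.dot(u, torch.mv(weight, v))
    return sigma, u  # spectral normalization then uses weight / sigma
```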
2018-05-01 17:00:30 +08:00
23a5ddd3c8 [auto] Update onnx to b7d8dc8 - fix cmake warning message (#863)
b7d8dc8fa6
2018-05-01 08:21:41 +00:00
e8916f510b [auto] Update onnx to f585c5d - add pytorch-operator test for tile (#831)
f585c5d066
2018-05-01 07:22:44 +00:00
c72e5da7eb [auto] Update onnx to 993fe70 - add install step (#832)
993fe70805
2018-05-01 07:21:36 +00:00
5acc62ffa5 Skip Tile onnx backend to keep CI green (#7120) 2018-04-30 22:37:34 -07:00
892bef9aa3 [ONNX] Delay external value resolution as long as possible in ONNX backend (#7111) 2018-04-30 21:30:31 -07:00
0b0279981d Fix example for new_zeros in documentation (#7128)
Fix for Issue #7088
2018-05-01 00:29:13 -04:00
531944275c [Caffe2] Guard CUDA API calls in caffe2/operators using macro CUDA_CHECK (#6810) 2018-04-30 21:27:37 -07:00
150af6ac1e Move ideep ops from caffe2/contrib/ideep to caffe2/ideep (#7112) 2018-04-30 21:10:46 -07:00
b2cdd08252 Introducing onnx-tensorrt to third_party (#7119) 2018-04-30 21:09:51 -07:00
4add3a4df7 Add dependency from caffe2_gpu to ATen in CMake (#7117) 2018-04-30 19:30:34 -07:00
cdc6d104e2 [auto] Update onnx to 68bc26c - add type inference for traditional ml ops except classifier ops. (#857)
68bc26cfb2
2018-05-01 02:21:49 +00:00
b3be71f046 [easy] Stop hardcoding "python" executable in bottleneck tests (#7105)
Right now, the bottleneck test_utils.py tests assume that a user's
python executable is 'python'. This may not be the case especially if
the user has multiple versions of python installed. This PR changes it
so that test_utils.py uses `sys.executable` as the python executable.
2018-04-30 22:01:36 -04:00
afe3c2688f Update C++ API tests to use Catch2 (#7108)
* Update C++ API tests to use Catch2

* Update download_mnist.py to be less verbose
2018-04-30 21:36:35 -04:00
25e7d5c612 Make @ebetica and @goldsborough owners for test/cpp/api (#7113) 2018-04-30 21:35:13 -04:00
6e72ba9798 [Caffe2] Fail fast for C++ unit tests too (#7106)
* Fail fast for C++ unittests too

* Fix based on comments
2018-04-30 17:30:03 -07:00
7efd6f0506 [auto] Update onnx to 9cc0cda - fix string representation of scalar types (#858)
9cc0cdabd3
2018-05-01 00:07:32 +00:00
ab44002ac8 Open-source extractMetaNetDef & runGlobalInitialization, add new Predictor constructor from db file, and add run_map_outputs (#7063)
* Refactor extractMetaNetDef and runGlobalInitialization into open...

* Fix test by making get output blobs optional

* Update test instead of making output blobs optional
2018-04-30 17:01:27 -07:00
71f6cca992 Make @ebetica and @goldsborough owners for torch/csrc/api (#7110) 2018-04-30 15:48:12 -07:00
bd69d2fd23 [auto] Update onnx to 1078925 - fix y in pow test case to scalar (#852)
1078925c2d
2018-04-30 22:42:37 +00:00
f87462c65f [Caffe2] Fix the wrong argument name in collect_and_distribute_op (#7091)
* Fix the wrong argument name, FPN works!

* Fix collect_and_distribute test
2018-04-30 15:01:11 -07:00
50218a25e7 [EASY] Document load_inline (#7101)
* Document load_inline

* Link to tests for examples

* Links in RestructuredText are weird
2018-04-30 14:36:41 -07:00
1ea3f79569 Location of pip package changed (#7100)
* Location of pip package changed

* They moved setuptools two days ago too
2018-04-30 14:35:17 -07:00
95681257d6 Revising cudnn version check (#7062) 2018-04-30 14:34:41 -07:00
af71fb882f Merge autogradpp into PyTorch (#7074)
* Dump autogradpp into PyTorch

* Fixed up CMake for autogradpp/C++ API

* Made cereal a submodule

* Change search location of autogradpp's mnist directory

* Add test_api to CI

* Download MNIST from the internet instead of storing in repo

* Fix warnings
2018-04-30 12:53:46 -07:00
3407708b81 Remove unused variable (#7103) 2018-04-30 12:53:28 -07:00
bf9fab3cf3 [auto] Update onnx to c66fb6f - Add some math function shape inference (#845)
c66fb6f077
2018-04-30 19:45:21 +00:00
20c965f7d6 fix max/min on cuda in presence of NaN (fixes #6996) (#7052)
Thank you ngimel and zou3519!
2018-04-30 21:02:47 +02:00
90026f59a3 Switching to conda's --no-test flag (#7099)
* Switching to conda's --no-test flag

* Also updating callsite in .jenkins/build.sh
2018-04-30 11:22:25 -07:00
9a3c723644 Add missing PrintOp arguments doc (#7084) 2018-04-30 11:17:56 -07:00
caa6a8ce30 Switch to the official git mirror for Eigen. (#7090) 2018-04-30 14:09:18 -04:00
39c0b0b850 Delete unnecessary header includes. (#7094)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-30 14:04:28 -04:00
2a56666196 Removing leveldb to make special gcc builds unnecessary (#7098) 2018-04-30 10:55:37 -07:00
b70b7a80d4 Inline JIT C++ Extensions (#7059)
Adds ability to JIT compile C++ extensions from strings

>>> from torch.utils.cpp_extension import load_inline
>>> source = '''
... at::Tensor sin_add(at::Tensor x, at::Tensor y) {
...   return x.sin() + y.sin();
... }
... '''
>>> module = load_inline(name='inline_extension', cpp_sources=source, functions='sin_add')
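The returned module then exposes the bound function directly, presumably usable as:
>>> module.sin_add(torch.ones(2), torch.ones(2))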
Fixes #7012

* Inline JIT C++ Extensions

* jit_compile_sources -> jit_compile

* Split up test into CUDA and non-CUDA parts

* Documentation fixes

* Implement prologue and epilogue generation

* Remove extra newline

* Only create the CUDA source file when cuda_sources is passed
2018-04-30 11:48:44 -04:00
c5978db094 [auto] Update onnx to ff667d1 - Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853)
ff667d1dfb
2018-04-30 15:37:06 +00:00
d9aeb7e71b clamp now has subgradient 1 at min and max (#7049)
* subgradient 1 at min and max for clamp

* clamp max and clamp min too

* add comment
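
The new behavior, illustrated (values chosen to sit exactly on the boundaries):

```
import torch

x = torch.tensor([0.0, 2.0, 5.0], requires_grad=True)
x.clamp(min=0.0, max=5.0).sum().backward()
print(x.grad)   # tensor([1., 1., 1.]) -- subgradient 1 at min and at max
```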
2018-04-30 21:21:56 +08:00
8fbab83c2a only Tensors of floating point dtype can require gradients (see #7021) (#7034) 2018-04-30 10:20:00 +02:00
6a55d86234 GroupNorm docs (#7086) 2018-04-30 09:40:34 +02:00
881af544fd [auto] Update onnx to 11c6876 - clear initializer names when clear initializer (#849)
11c6876f1d
2018-04-30 07:00:36 +00:00
bc62645e4c [jit] Fix handling of IntList[k] parameters (#6965)
*  squash commits

* emit additional declarations and handle positional arg. case
* apply minor tweaks
* py-2 fix
* Address Tom's comments
* move logic to gen_jit_dispatch, start adding tests
* add test

* address review comments

* address review comment

* fix build issue. change argument indices to argument names. Get rid of deepcopy

* py-2 flake8 fix
2018-04-29 23:09:04 -04:00
96c6ae67bb Remove incorrect/irrelevant test code. (#7050)
Followup to #6873.
2018-04-29 23:03:44 -04:00
ee00a8049a Add max pooling support to EmbeddingBag (#5725)
* Add max mode support to EmbeddingBag

* Lint fix

* Fix compilation issue on other platforms

* Rebase + don't waste memory when not in max mode

* Oops, missed a spot

* Fix whitespace from merge

* less precision

* Lower precision to avoid spurious failures

* Minor typo

* Switch to size()
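
Usage sketch of the new mode (shapes illustrative):

```
import torch

bag = torch.nn.EmbeddingBag(10, 3, mode='max')  # max-pool within each bag
inp = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
offsets = torch.tensor([0, 4])                  # bags: inp[0:4] and inp[4:8]
print(bag(inp, offsets).shape)                  # torch.Size([2, 3])
```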
2018-04-29 16:48:11 -04:00
49f87320ba [Caffe2] Add full impl of GroupNorm (#7058)
* Add full impl of GroupNorm

* Fix comments in math.h

* Remove unused buffers

* Add #include <array> in gpu version

* Remove unused moments_buffer_

* Make inverse std a template.

* Add detailed comments
2018-04-29 11:26:40 -07:00
0703357723 Don't build THD/master_worker if not explicitly requested (#7081) 2018-04-29 13:17:09 -04:00
b240cc9b87 Add support for dotted names in CPP Extensions (#6986)
* Add support for dotted names in CPP Extensions

* Modify tests for cpp extensions

Test that dotted names work

* Py2 fixes

* Make run_test cpp_extensions Win-compatible
2018-04-29 18:10:03 +02:00
e6ce1afe47 [Caffe2] Follow-up of onnx-trt API change (#7076)
* Follow-up of onnx-trt API change

* indent

* comments
2018-04-28 23:07:15 -07:00
7450e9152b [auto] Update onnx to 73c34ae - Clarify FeatureVectorizer description. (#843)
73c34ae62f
2018-04-28 22:43:39 +00:00
281f095972 Add autograd API to at::Tensor (#6582)
* Add autograd API to at::Tensor

* Trying to fix linker errors on Windows

* Add AT_API to set_data
2018-04-28 12:54:05 -07:00
802e718e1c [auto] Update onnx to 1befb9b - Remove useless text in docs (#850)
1befb9b12d
2018-04-28 17:30:40 +00:00
4caea64d72 Make all of TH and THC C++. (#6913)
Changelist:

- Move *.c to *.cpp
- Change includes of ".c" to ".cpp"
- A bunch of cmake configuration modifying CMAKE_C_FLAGS changed
to CMAKE_CXX_FLAGS or add_compile_options, because if you do CMAKE_C_FLAGS it only applies when you compile C code
- Explicitly cast void* to T* in a number of places
- Delete extern "C" { ... } blocks; instead, properly apply TH_API to everything that should have it (TH_API handles extern "C")
- Stop using stdatomic.h; instead, use <atomic>. This resulted in a bunch of placement-new/delete changes to be "totally properly correct"
- Refactor of THLongStorageView to not have static constructor methods (since it no longer has a copy/move constructor)
- Documentation about how the TH C interface (and extern C business) works
- Note that THD master_worker mode is dead
- C++ headers in TH libraries are given .hpp suffix, to make it less likely that you'll confuse them with the C-compatible headers (now suffixed .h)
- New function THCStream_stream and THCStream_device to project out fields of THCStream instead of accessing fields directly
- New function THStorage_(retainIfLive), which is equivalent to a retain but only if the refcount is greater than zero.
- In general, I tried to avoid using hpp headers outside of ATen/TH. However, there were a few places where I gave up and depended on the headers for my own sanity. See Note [TH abstraction violation] for all the sites where this occurred. All other sites were refactored to use functions
- Some extra Werror fixes (char* versus const char*)
2018-04-28 07:45:02 -04:00
4667983f0f Fixes for interpreter and ONNX export for translation (#7044)
Fixes for interpreter and ONNX export for translation

Address comments
2018-04-27 22:23:57 -07:00
fc6a846cc5 [Caffe2] Fixing bug in conda builds (#7061)
* Fixing bug in conda builds

* Update to other PR
2018-04-27 21:52:40 -07:00
1048d0dd67 [Caffe2] Moving all conda package information into package name rather than build string (#7041)
* Lowercasing script internal variables

* Removing nccl from name
2018-04-27 21:42:49 -07:00
065cd32ed0 Fix ".pb.h" dependency issue about DLL build. (#7027)
* Add missing header "caffe2/core/common.h" before "caffe/proto/caffe.pb.h" to provide CAFFE2_API macro.
This only affects the Windows build since CAFFE2_API is only defined for DLL.

* Fix ".pb.h" dependency issue about DLL build.

CAFFE2_API defined in "caffe2/core/common.h" is required by ".pb.h" generated on Windows for DLL build.
We always need to have "#include <caffe2/core/common.h>" before using any proto header.

In this case "caffe2.pb.h" is already included by "context_gpu.h" -> "common_cudnn.h" in the correct order, hence we simply remove a line.
2018-04-27 21:21:46 -07:00
bb9c859253 [auto] Update onnx to e84788f - Fix SELU attributes' default values (#839)
e84788fb48
2018-04-28 04:18:49 +00:00
20cd27da42 [caffe2][ONNX] Implement CPU NumpyTileOp and corresponding ONNX backend (#7053)
* Implement CPU NumpyTileOp

* Address comments
2018-04-27 19:58:15 -07:00
2e023a29e4 Add optional support to C++ extensions (#7055) 2018-04-28 01:59:50 +01:00
7b09bc72a5 [WIP] Enable WERROR in tests (#6539)
* Enable WERROR in tests

* Also set WERROR=1 for cpp_build in CI

* Enable Werror after the compiler checks

* Remove -DWERROR because its picked up from the env var

* Had to fix some errors in aten/contrib/data

* Allow an uninitialized variable in ReduceOpsKernel.cpp

* Use CUDNN_DATA_UINT8 in cuDNN type string conversion

* Fixes and use target_compile_options

* Fix uninitialized variables in THNN

* Include Python.h earlier in tensor_types.cpp

* Use CUDNN_VERSION 7100 instead of 7000?

* More Python.h includes

* Make switch case in common_subexpression_elimination.cpp exhaustive

* Build with WERROR=0 just to see all the warnings

* Remove some Python includes

* Enable WERROR=1 again

* Bring back switch case default
2018-04-28 01:51:16 +01:00
733e2967b1 Allow __constant__ values in a ScriptModule to be used as attributes for builtin functions (#7017)
* Allow `__constant__` values in a ScriptModule to be used as attributes for builtin functions
* Fix bugs in @script loops

1. while loops run shape propagation multiple times until the shapes have converged.
There were two bugs here. (a) First, the 'changed' condition was not checking if it actually
changed the output, and instead would mark changed = true if the two inputs were different.
This is incorrect because the output of the block and the input of the block may always have different shapes.
Now it actually checks if it is about to change the output entry that it is writing to.
(b) expand nodes were being inserted into the graph even inside the while loop body. However, if
we iteratively discover that the input shape to one of these expands is actually dynamic, then
it was incorrect to insert the expand in the first place. This changes it so that we only insert expands
after we have converged on the shapes.

2. the way deleteExtraInputs removed loop-carried dependencies was unsafe because it would look up
Value* elements in the loop body's environment that were previously invalidated when deleteExtraInputs
removed another input to the loop. This changes the way deleteExtraInputs works so that it never has to
read a value out of the loop body's environment, to avoid using the invalidated pointers.
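
A sketch of the pattern the first fix enables, assuming the `__constants__` class-attribute spelling and `@torch.jit.script_method` decorator used by the script frontend (exact spellings at this commit may differ):

```
import torch

class Scale(torch.jit.ScriptModule):
    __constants__ = ['factor']      # mark `factor` as a compile-time constant

    def __init__(self, factor):
        super(Scale, self).__init__()
        self.factor = factor

    @torch.jit.script_method
    def forward(self, x):
        return x * self.factor      # constant attribute feeding a builtin
```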
2018-04-27 17:44:17 -07:00
02a764f82d Update the video input op in caffe2 (#7054)
There have been multiple fixes to the video input op recently. This updates
the caffe2 version so that it is up to date.
2018-04-27 17:17:42 -07:00
980960d036 Fix Visual Studio error C2398 about ill-formed narrowing conversion. (#7024) 2018-04-27 17:07:56 -07:00
59f5f9ac36 [caffe2] Fix build of depthwise_3x3 for CUDA compute capability < 3.5 (#7048)
PR #6601 broke build on older CUDA targets due to __ldg intrinsics. This patch adds a work-around.
2018-04-27 18:53:24 -04:00
361648a4a7 Fix torch.tensor(...) device-type calculation when used with numpy an… (#6995)
* Fix torch.tensor(...) device-type calculation when used with numpy and type inference.

* Fix tensor device type inference as well.

* Better variable type inference: infer cuda-ness only if device is not specified.
2018-04-27 18:12:33 -04:00
0c737dff63 fix lbfgs variable names (#7037)
Switches the step/direction variable names (steps and directions are flipped
in the current implementation of the two-loop recursion). This change does
not change the numerical output of the program, but should make it easier
to follow.
2018-04-27 17:47:37 -04:00
6ce376fee3 [auto] Update onnx to ebac046 - Add tile test case (#823)
ebac0463a0
2018-04-27 21:01:58 +00:00
f630de8f33 [caffe2][nomnigraph] Lint run (#7045) 2018-04-27 12:58:58 -07:00
932c4c2364 Prevent stack overflow on deletion of deep graph (#6873)
* Prevent stack overflow on deletion of deep graph

Fixes #5534.

Sometimes one can end up with a very big computation graph of Functions
and Edges. Each std::shared_ptr<Function> contains a list of Edge, and
each Edge contains a std::shared_ptr<Function>. Deleting a
std::shared_ptr<Function> can trigger the recursive deletion of other
std::shared_ptr<Function>'s: this can stack overflow if the graph
is deep enough. Here is an example of such a graph:

    shared_ptr<Function> -> Edge -> shared_ptr<Function> -> Edge -> ... -> shared_ptr<Function>

The solution here is to use a custom deleter with each
std::shared_ptr<Function>. The custom deleter keeps track of how many
nested deleters it is in. When this number exceeds the maximum allowed
depth, the Function* to be deleted are accumulated in a per-thread
delete queue and handled by one of the deleters.

Example code that could trigger the overflow (set ``depth`` to something >
100000) is below. I also benchmarked the below code before/after the
changes to see if there are any significant performance differences.

```
import torch
def scope():
    depth = 80000
    x = torch.randn(9, requires_grad=True)
    y = x.clone()

    # build deeply nested computation graph
    for i in range(depth):
        y = y + y * 0.000001

%timeit -n 100 scope()

376 ms ± 3.94 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

Without changes:
352 ms ± 6.58 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

With the change, the above code is 6.8% slower.

UPDATE: I did some more benchmarking. It looks like it takes 25% more time to free the computation graph in the case of the straight chain graph: https://gist.github.com/zou3519/93cf84d96ae431356ae7f7c1923ef51a

* WIP

* Add custom deleter to PyFunctions created by THPFunction

* Address some comments; pick new value

* Address some more comments

* Add more complicated test; special case the windows depth constant
2018-04-27 15:49:58 -04:00
c730792d51 Add big warning about averaging to KLDivLoss documentation #6622 (#7006)
* Add big warning about averaging to KLDivLoss documentation #6622

Also: An (independent) change in diagonal docstring tensor
formatting.

* Improve note with example

Thank you Richard Zou!

* use log_softmax
2018-04-27 15:45:26 -04:00
ae35e0e924 Support non-contiguous tensors for unary ops (#6119) 2018-04-27 21:31:34 +02:00
a6bfa16c17 torch.arange: add numpy-style type inference. (#7016)
* torch.arange: add numpy-style type inference.

This is a backwards-compatibility breaking change.

* Fix flake8.

* Use at::optional.

* Remove unneeded header files.

* Use reference wrapper.

* Update arange for test.

* Address review comments.
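
The resulting behavior, sketched (output dtypes assume default settings):

```
import torch

print(torch.arange(5).dtype)            # torch.int64  -- all-integer arguments
print(torch.arange(0., 1., 0.1).dtype)  # torch.float32 -- float arguments
```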
2018-04-27 15:11:45 -04:00
bdd27ea956 [auto] Update onnx to 8b7a925 - a few more shape inference functions (#772)
8b7a9252c9
2018-04-27 19:06:58 +00:00
f6083b343b [auto] Update onnx to 9718f42 - Make the coefficient non optional for LinearClassifier (#836)
9718f42976
2018-04-27 18:06:39 +00:00
39c6101ab4 [auto] Update onnx to ef083d0 - Add save_tensor and load_tensor functions for Protos (#770)
ef083d0338
2018-04-27 17:13:59 +00:00
1b0ad8678b import *Sampler to utils.data (Better fix than #6982) (#7007) 2018-04-27 10:18:29 +02:00
3d4d39ce30 Also check compiler ABI compatibility when JIT compiling (#7015) 2018-04-27 08:19:17 +01:00
9db779f331 [auto] Update onnx to 45ceb55 - Check if CMAKE_BUILD_TYPE set before project(). (#812)
45ceb5523a
2018-04-27 04:51:00 +00:00
76d3c30783 Enable resetting of batchnorm running moments and cumulative ("simple") moving average (#6445) 2018-04-26 19:27:24 -07:00
eaab6ce459 [caffe2][nomnigraph] Move nomnigraph<->caffe2 converter logic to caffe2/opt (#7018) 2018-04-26 18:28:13 -07:00
18ed2160b0 Use Index rather than Long for IntList parsing (#6674)
* Use Index rather than Long for IntList, so floating-point types convertible to ints fail the parsing.

Basically, our unpackLong code works with floating-point types that are convertible to ints, but this isn't often what you want (because of truncation).
What you actually want is to convert to an index, which will usually find such issues.

I made this the minimal change I could because:
1) I didn't want to change unpackLong because the existing code calls checkLong before unpackLong, so this should be a non-issue most of the time.  And fixing this properly requires calling checkLong again, which will slow everything down.
2) An exception above is with IntList, which only checks that 1) it is a tuple or 2) it is a varargs tuple (i.e. torch.ones(1, 2, 3)).

* Fix bug.

* Don't conflict tensor and IntList bindings.

* Change function to be consistent between python 2 and 3.

* Check Index.

* Move IntList overloads in legacy new functions to below Tensor overloads.
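
Illustration of the stricter parsing (the exception type is assumed from the usual binding errors):

```
import torch

torch.ones(2, 3)         # fine: genuine integers
try:
    torch.ones(2.5, 3)   # float in an IntList slot
except TypeError:
    print("rejected instead of silently truncating 2.5 to 2")
```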
2018-04-26 19:13:23 -04:00
902579602b [wip] [Caffe2] Changes to integrated binaries (#6997)
* Changes to integrated binaries

* Changes for cpu version of integrated binary

* Disabling static linking of CUDA for pytorch for integrated builds
2018-04-26 15:43:24 -07:00
19cb5a0436 [auto] Update onnx to 4b3d2b0 - [WIP] reenable shape inference tests (#834)
4b3d2b02e8
2018-04-26 22:17:52 +00:00
d67ec68dbe [auto] Update onnx to 22d17ee - RNN tests: LSTM, GRU, SimpleRNN (#739)
22d17eee2e
2018-04-26 20:57:42 +00:00
a08091a42d Implement matmul_out and dot_out. (#6961)
* Implement matmul_out and dot_out.

* Fix autograd by only calling _out variants if we have an out ourselves.

* Disallow mismatched types in dot_out.

* Make sure out variant doesn't have a method.

* Do proper type conversion.
2018-04-26 16:52:58 -04:00
49493948a8 Fixes some build warnings. (#7004) 2018-04-26 16:44:23 -04:00
9a6c033004 Skip unsupported ONNX backend test cases (#7005) 2018-04-26 13:10:55 -07:00
242f6c3470 Don't print dots after nonfinite numbers in integral float tensors (#6835)
* Don't print dots after nonfinite numbers in integral float tensors

* get around lint

* support python 2

* refactor

* better refactor
2018-04-26 11:18:12 -07:00
2b44c420c8 Enhance diagonal (fixes #6479) (#6718)
* Enhance diagonal

This patch
- adds Tensor.diagonal to complement torch.diagonal
- implements diagonal natively in ATen
- makes diagonal a view
- implements taking arbitrary diagonals
- implements diagonal backward instead of referring
  to the (more limited) diag

* add tests, copy diagonal code to backward for double differentiability

* improve tests and doc comment. Thank you, Adam!

* Mark diagonal as view function in gen_autograd.py, use simple backward.
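
A short demonstration of the view semantics and arbitrary diagonals:

```
import torch

x = torch.arange(12.).view(3, 4)
d = x.diagonal()                       # a view now, so writes propagate
d.zero_()
print(x[0, 0].item(), x[1, 1].item())  # 0.0 0.0

print(torch.diagonal(x, offset=1, dim1=0, dim2=1))  # the first super-diagonal
```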
2018-04-26 11:11:20 -04:00
8109b3065e Slight changes to anaconda script (#6994) 2018-04-26 10:04:58 -05:00
b2581c0289 Workaround in onnx to get transposes into init_nets (#6924)
* Workaround in onnx to get transposes into init_nets

This adds a pass to ONNX so that it can speculate Transpose
operators so that ONNX's split pass can put them into an init_net

Also fixes a potential bug in onnx peephole where an optimization
across blocks might move a Value and violate scoping.

* Perform shape propagation when embedding a program into a trace.

This ensures the trace still has type information specific to that trace, which will help onnx export succeed in more cases.
2018-04-26 11:04:17 -04:00
a64b2987b4 [ONNX] export tile op (#6954)
* onnx export aten::repeat to Tile

* move repeats to input

* turn repeats into a long tensor constant

* deal with the case where the length of repeats is bigger than the number of dims in the input
2018-04-26 11:03:41 -04:00
5dc5a71d74 Improve error message (Sampler location) Fixes #6917 (#6982)
Thank you @ruotianluo for reporting!
2018-04-26 10:58:27 -04:00
984516bdc4 typo corrected: is -> if (#6980) 2018-04-26 09:57:11 -04:00
3964253f94 Allowing for vectorized counts in Binomial Distribution (#6720) 2018-04-26 15:53:01 +02:00
f98b778086 Fix forward and backward for norm/renorm with infty norm (fixes #6817) (#6969) 2018-04-26 12:54:53 +02:00
24d05662ea [caffe2] Open-source DEPTHWISE_3x3 engine (#6601)
DEPTHWISE_3x3 engine provides an optimized implementation of depthwise 3x3 convolution, e.g. for ShuffleNet, MobileNets
Implementations exist for CPU (generic), ARM CPU, and CUDA GPU.

Originally developed by @ajtulloch
2018-04-26 02:30:51 -04:00
eb4154a007 [auto] Update onnx to 485b787 - function proto for composite op. (#802)
485b7875fa
2018-04-26 03:01:03 +00:00
3d907ef78e Consistently check 'out' variants against specified dtype/layout/device parameters. (#6973)
We were previously doing this in the most common cases, but not consistently.
2018-04-25 22:46:42 -04:00
c10da636b5 implement gamma cuda (#6855)
* Refactor standard_gamma and implement CUDA gamma sampling

* Attempt fixes for AT_CUDA_ENABLED changes

* Gamma cuda and cpu forward as ATen native

* implement standard_gamma_grad_cuda

* update native_test.cpp, try to fix windows and various cuda version compiles

* searching a windows fix via CI... use std:: for math

* casting some constants in the calculation, compute at float for half precision

* whitespace fixes

* add acctype to do half->float computation, include HALF in generation, cast locally rather than tensors

* fix cuda8 half compilation

* always use scalar_cast with CUDACC, lock CPU generator, CPU acctype = double

Thank you for your review comments!
2018-04-25 22:22:09 -04:00
7cbef70372 Fix the onnx symbolic for selu and maxpool3d (#6816) 2018-04-25 22:20:45 -04:00
645ad7ad0c Fixing LP-Pooling stability issues (#6766)
* Added ReLU unit to LP pooling, so the gradient does not become NaN if all inputs are zero.

* Added workaround for odd p. Added a bit of doc.

* Make the linter happy.
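
A rough sketch of the guarded formulation (for 1-D pooling and even p; the odd-p workaround additionally carries the sign):

```
import torch
import torch.nn.functional as F

def lp_pool1d_sketch(x, p, kernel_size):
    # sum of x_i^p over each window = avg_pool * kernel_size
    out = F.avg_pool1d(x.pow(p), kernel_size).mul(kernel_size)
    # the ReLU clamps tiny negative rounding noise to zero, so pow(1/p)
    # and its gradient stay finite when a whole window is zero
    return F.relu(out).pow(1.0 / p)
```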
2018-04-25 22:13:15 -04:00
bd14d8e8f8 add additional caffe/caffe2 paths to exclude list in pytorch setup.py (#6891) 2018-04-25 22:10:38 -04:00
ab016a2b30 Code Cleanup: removes unused getTextureObject (#6974) 2018-04-25 21:07:48 -04:00
2d6d6a4d10 Removes unused _long functions in THCTensorIndex (#6971) 2018-04-25 21:07:28 -04:00
31c9b4f0d2 Changes incorrect "overlappingIndices" call to correct "maybeOverlappingIndices" (#6953)
* Changes incorrect "overlappingIndices" call to correct "maybeOverlappingIndices"

THE PROBLEM

The current overlappingIndices() is meant to detect if a tensor defines multiple valid indices for the same data element. There are two significant issues with this function:

(1) The algorithm it attempts to implement cannot do this.

(2) That algorithm is not implemented correctly.

This call is used by pointwiseApply() and scatter(). If a tensor is readable/writable and detected as overlapped, these algorithms will create a non-overlapped copy of it to work on. When tensors are improperly identified as overlapped this causes extra work. If tensors are improperly identified as non-overlapped then this would cause the operations to exhibit unexpected behavior.

For example,

ref = torch.arange(0, 32 * 5).view(4, 8, 5).cuda().double()
p = ref[:,:,::2]
p += 1

Results in a call to pointwiseApply1, which detects p as an overlapped tensor (it is not), causing a call to pointwiseApply2 that copies it into a non-overlapped temporary, and then another call to pointwiseApply2 later that copies it back to the original tensor. If, however, the original tensor is given dimensions of (4, 8, 4) instead, it is correctly detected as non-overlapped and only a single pointwiseApply1 call is made.

DISCUSSION + FIX

The algorithm that overlappingIndices() attempts to implement tests for a sufficient but not necessary condition of a tensor to be non-overlapping. That is, if its algorithm were implemented properly then it would be a conservative check that would ensure all overlapped tensors were copied (as desired), but also that some non-overlapped tensors were copied too.

The algorithm can be thought of as trying to test whether the dimensions can be ordered like "nesting dolls," with each dimension fitting within the next one larger than it. If this is true then the tensor is non-overlapping, but if it's false the tensor may or may not be overlapped. For example, a tensor with dims (2, 3) and strides (4, 3) cannot be "nested," but is non-overlapping. (The tensor looks like [[0, 3, 6], [4, 7, 10]].)

The algorithm is currently implemented improperly, as can be seen in the example above. The tensor p has dimensions [4, 8, 3] and strides [40, 5, 2]. This confuses the current implementation, which thinks the innermost dimension needs a stride of 6, which is incorrect. The first row is [0, 2, 4] and the next row begins with 5. The current implementation also improperly implemented its sorting behavior. (qsort comparators require -1, 0, and 1, not true/false return values.)

Fixing the existing algorithm is straightforward (and what this PR does, see below), but it is important to note that the algorithm never performed as intended, so its name and the documentation around it has been updated, too. A natural question is if it's possible to write an efficient overlappingIndices(), and I believe the answer is "no." Disambiguating overlapping from non-overlapping tensors is equivalent to finding a nonzero solution to a linear diophantine equation with restricted coefficients, that is, an equation of the form x_0s_0 + x_1s_1 ... = 0 where s_X is the stride in dimension X and x_X is an integer from [-size_X + 1, size_X - 1].

Another note is that the CPU does not perform this check. For example, if we run:

a = torch.FloatTensor([[0,1], [10, 11]])
b = torch.FloatTensor([[0,0],[0,0]])
b = b.set_(a.storage(), storage_offset=0, size=a.size(), stride=(1,1))
b += 1

Then b is [[1, 3], [3, 11]] because the operation is applied twice to the second element of the original tensor. This causes no warning.

Since the CPU does not perform a similar check, another question is whether the GPU code should remove its check. While it may seem that writing to overlapping tensors is an error state, running test_cuda.py reveals 171 instances of possibly overlapped tensors being copied by pointwiseApply(). (The prior incorrect version has 176 copies.) Allowing writing to overlapped tensors on the GPU may violate assumptions about memory accesses, too. In fairness, these assumptions may be violated on the CPU already.

Leaving the CPU vs GPU behavior question for the future, this fix corrects the current intended GPU behavior. This means that there will be fewer unnecessary copies and no chance of an overlapped tensor sneaking through on the GPU. The CPU behavior remains unchanged. The fix also adds a test to test_cuda.py to ensure that overlapped tensors on the GPU are written to as expected.
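
For concreteness, a Python sketch of the corrected "nesting dolls" test (illustrative only; the real code is C++ in THC):

```
def maybe_overlapping_indices(sizes, strides):
    # sort the size>1 dims by stride; if any dim's extent reaches the next
    # larger stride, the dims cannot be nested and the tensor *may* overlap
    dims = sorted((st, sz) for st, sz in zip(strides, sizes) if sz > 1)
    for (st, sz), (next_st, _) in zip(dims, dims[1:]):
        if (sz - 1) * st >= next_st:
            return True   # conservative: maybe overlapping
    return False          # provably non-overlapping

# the (2, 3)-size, (4, 3)-stride example above cannot be nested, so it is
# conservatively flagged even though it is actually non-overlapping
print(maybe_overlapping_indices((2, 3), (4, 3)))  # True
```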

* cleanup

* Fixes Python formatting
2018-04-25 21:07:13 -04:00
d48d3ef6bc Make cuda 9 behave as cuda 8 wrt half conversions (#6958)
* Make cuda 9 behave as cuda 8 wrt half conversions

Cuda 9 is too smart about implicit half conversions; this disables them so that cuda 8 and cuda 9 behave in the same way wrt half.

* try fixing windows build

* one more broken conversion
2018-04-25 17:59:49 -07:00
5209213fa7 [auto] Update onnx to cd58928 - specify defaults for attributes of Affine op (#820)
cd589283a0
2018-04-26 00:26:42 +00:00
f21c5c5cd8 Fix the symbolic of batchnorm to handle special case (#6967) 2018-04-25 17:04:25 -07:00
b038b3d7be Always dumping final meta.yaml for debugging (#6977) 2018-04-25 19:00:24 -05:00
3573f64bb1 [auto] Update onnx to 7ee2cf9 - merge the dummy backend back into the main one (#743)
7ee2cf9854
2018-04-25 23:44:01 +00:00
8028162103 Update the script to avoid the protobuf lib issue and add ZFNet (#6966) 2018-04-25 16:38:43 -07:00
94d2afbe50 Clarify _unsafe_view comment. (#6952)
It was unclear to me whether the "viewed" tensor was the input or the output.
2018-04-25 19:29:49 -04:00
2e32e8df75 Statically linking CUDA for Anaconda builds (#6680)
* Statically linking CUDA for Anaconda builds

* typo

* Adding a summary line

* Comments

* Typo fix

* Fix faulty parameter passing

* Removing problem CUDA modules for now

* Fixing unused debugging function

* Turning off static cuda linking until script changes are in

* Disabling mkl
2018-04-25 18:22:54 -05:00
7599d0c3fe [caffe2] ONNX backend support for control nodes (#6914) 2018-04-25 15:44:00 -07:00
3b009dffe1 Delete unused legacy indexed based streams (#6964)
PyTorch uses THC's THCStream API.
2018-04-25 18:38:47 -04:00
1e134b11ec [caffe2][cmake][opencl] Wrong directories were being included, which might break systems without opencl in the system headers (#6972) 2018-04-25 14:58:16 -07:00
5aed120bc3 [auto] Update onnx to 1c03a5a - [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551)
1c03a5a42e
2018-04-25 21:28:39 +00:00
a7b274bb2a Remove scratch space from THCState (#6956)
THC had a concept of per-device per-stream scratch space that was
persistent in THCState. This was useful before the caching allocator
because it avoided synchronizations in kernels that needed temporary
scratch space. However, it's not thread-safe since multiple threads can
operate on the same stream: In a two-pass reduction the scratch space
may get clobbered in between the two kernels.

This removes the scratch space and just uses THCudaMalloc and THCudaFree
within the reductions.

I've kept THCState_getCurrentDeviceScratchSpaceSize for now since it's
useful to have the temporary buffer be sized based on the number of SMs.
2018-04-25 16:02:17 -04:00
075ca76c26 [auto] Update onnx to 3769a98 - Rename real model test case from VGG-16 to ZFNet (#821)
3769a98362
2018-04-25 19:57:13 +00:00
333e8c9b22 any/all returns LongTensor, make test expect that (#6957) 2018-04-25 14:05:29 -04:00
6ebcb4606f fix typo in the LSTMCell math definition (#6951) 2018-04-25 19:20:46 +02:00
138d69c688 [auto] Update onnx to 403ccfb - Change the return type for the zipmap operator to match the description in the spec. (#818)
403ccfbd01
2018-04-25 15:48:39 +00:00
e767b186ee add missing UNCERTAIN_TH_OMP_OVERHEAD_THRESHOLD to TH_TENSOR_APPLY_REDUCTION_OMP (#6946) 2018-04-25 10:34:31 -04:00
e7babb1890 [aten] only lookup CuDNN if compiling with CUDA (#6905)
ATen can be configured to compile without CUDA support by passing
-DNO_CUDA=1 to cmake.  However, cmake will look for CuDNN independently
of that flag and may eventually find it.  In cases where compilation
without CUDA support was requested on system with CUDA installed, this
will result in linking errors while building some tests that rely only
on CuDNN being found.

Do not look for CuDNN if -DNO_CUDA=1 was provided in the cmake call
since it does not make sense to compile with CuDNN if CUDA support was
disabled.
2018-04-25 09:13:23 -04:00
2dc177ac50 Update checkpoint.py (#6943) 2018-04-25 08:43:58 -04:00
39d4814933 Make any and all on ByteTensor behave like sum/prod. (#4627) 2018-04-25 10:25:38 +02:00
241a1e0f52 [auto] Update onnx to 15289e3 - Tile - align with numpy (#757)
15289e3d77
2018-04-25 08:16:31 +00:00
c820fda180 [auto] Update onnx to 42207c6 - Pass to lift captured values as inputs to control nodes (#804)
42207c60d8
2018-04-25 08:15:37 +00:00
c92b5422f7 Fix typo in set_grad_enabled description (#6931)
After setting set_grad_enabled(False), y.requires_grad returns False. But in the example it is described as True.
2018-04-25 09:23:15 +02:00
e27d66a454 Remove Eigen from math CUDA and update algorithm in ReduceTensor and Moments (#6922) 2018-04-24 23:07:35 -07:00
40301c3be7 [auto] Update onnx to 15289e3 - Tile - align with numpy (#757)
15289e3d77
2018-04-25 06:05:47 +00:00
2f311be90b add default value to ConstantFill doc (#6923) 2018-04-24 20:57:09 -07:00
09f40ae06f silence compiler warnings (#6915) 2018-04-24 23:49:12 -04:00
d9bde84b84 Add threshold for ops using openmp macro (#5584)
* add threshold for ops using omp macro

* modify interface for ops using omp macro

* modify some thresholds

* implement C macros with optional parameters to avoid duplicating definitions for all pointwise operations

* add a parameter of LAB_IMPLEMENT_BASIC_FUNCTION for vectorizing

* modify the comment

* Revert "add a parameter of LAB_IMPLEMENT_BASIC_FUNCTION for vectorizing"
Modify macro LAB_IMPLEMENT_VECTORIZED_FUNCTION to enable optional parameters

This reverts commit 8ef783a0cc67b653c435e64a3beb6866a6b4216d.

Conflicts:
	aten/src/TH/generic/THTensorMath.c

* fix build error on windows

* retrigger the test
2018-04-24 23:41:55 -04:00
aa88ca8ae0 remove quotes from caffe2/contrib/aten/CMakeLists.txt (#6928) 2018-04-24 20:37:14 -07:00
dec5e99e99 [aten] Move submodules to third_party (#6866)
* [aten] Move submodules to third_party

* [aten] Update aten_mirror.sh script for third_party

* [aten] Move ATen submodules def to root and rename

* [aten] Update cpuinfo cmake build

* [aten] Fix cpuinfo cmake build

* Update third_party/cpuinfo to d03d5d296063063c66877fb559cf34469734e3e1

* [aten] Fix JIT test reference to catch
2018-04-24 23:33:46 -04:00
c33d7f565b updated the environment collection script URL to the raw version on Github to download the script instead of the webpage (#6927) 2018-04-24 23:30:32 -04:00
8b70f7d248 [Caffe2] Clean up ideep integration (#6881)
* Clean up ideep integration

* .

* Remove redundant code in convnet benchmark

* MKL ON

* Do not add -mavx2 everywhere

* .

* Comments

* rename

* .
2018-04-24 18:32:35 -07:00
b7487d42a0 Workaround to make PythonOps traced with torch.jit.trace work correctly. (#6738)
The long-term fix is to remove the handle-creating pathways and
remove all the modes from PythonOp, making it into an op that simply
calls a PyObject. Right now ONNX expects PythonOp to hold a
nn.Function, not a generic callable, so completely removing the legacy
pathway will also require changes to how ONNX symbolics are found.
2018-04-24 17:21:00 -07:00
e28508afa5 [auto] Update onnx to 42207c6 - Pass to lift captured values as inputs to control nodes (#804)
42207c60d8
2018-04-24 23:53:27 +00:00
3c80a2b85c [caffe2] Add flag to ONNXWhile to skip scoping (#6910)
* [caffe2] Fix logic error in tensor filling ops in C++ ONNX backend

* [caffe2] Add flag to ONNXWhile to skip scoping
2018-04-24 16:53:22 -07:00
53a8158d6d [auto] Update onnx to 0eaf45f - Add dtype for input in Gather node test case (#815)
0eaf45ff89
2018-04-24 22:23:24 +00:00
0b5910f77e [jit][script] Fix a bug combining sizes/unsized tensors (#6882)
* [jit][script] Fix a bug combining sizes/unsized tensors

This adds an isSubtypeOf method to reflect that sized tensors are a subtype
of Dynamic (unsized) tensors. It updates the typechecking code to reflect this
relationship.

* Add index_select to shape prop
2018-04-24 14:04:18 -07:00
6e60edb799 [caffe2] Fix logic error in tensor filling ops in C++ ONNX backend (#6909) 2018-04-24 13:53:27 -07:00
146e8c8a10 Fix the legacy padding handling on global pool case (#6473) 2018-04-24 13:34:51 -07:00
cfb626b638 [caffe2][tiny][fix] Make the build work with profile observers (#6908) 2018-04-24 12:46:48 -07:00
9dd73aa7eb Fix stable link to always be /stable/ (#6907) 2018-04-24 15:42:46 -04:00
d985cf46f1 Add workaround to fix include warnings in Python 2 builds. (#6716) 2018-04-24 12:30:19 -07:00
90e75c6528 Speed up printing of large tensors. (#6876)
* Speed up printing of large tensors.

Instead of deciding on the format based on all of the elements of the tensor, decide based on the elements that will actually be printed.

* Fix flake8.

* Add else case.
2018-04-24 14:04:29 -04:00
0430bfe40b [docs] Update broadcasting and cuda semantics notes (#6904)
* [docs] Update broadcasting and cuda semantics notes

* Update multiprocessing.rst

* address comments

* Address comments
2018-04-24 13:41:24 -04:00
6418c49ee9 Make ArrayRef read-only by default. (#6444)
Sebastian Messmer noticed that these iterators were writeable by
default, which seemed dangerous.  Replaced with const iterators.
This doesn't seem to affect any ATen code; seems reasonable enough.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-24 13:30:43 -04:00
26c53c58a2 Fix ATen .travis.yml setup (#6860)
- ATen repo now has a new top-level, so Travis script has
  to be adjusted to (1) be moved to the top-level and (2)
  cd into the aten directory before doing anything.

- Unfortunately, this makes the import script even slower,
  because I'm banging on the entire index every commit.  If
  anyone has better suggestions for how to twiddle the index,
  let me know.  One possibility is to fold the ATen build into the base
  .travis.yml but only activate it when a file is missing
  (and then filter out that file.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-24 10:07:33 -04:00
21e0fc8fec [auto] Update onnx to adbfb4a - Fix the ConstantFill spec (#808)
adbfb4ad19
2018-04-24 03:55:51 +00:00
7d32f6fdc3 Adding runtime warning for checkpointing inputs to have requires_grad=True (#6883)
* Adding the warning for the checkpointing inputs to have requires_grad=True

* fix bug
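
Context for the warning, as a usage sketch:

```
import torch
from torch.utils.checkpoint import checkpoint

x = torch.randn(4, 4, requires_grad=True)  # inputs should require grad;
out = checkpoint(torch.sigmoid, x)         # otherwise no gradients flow back
out.sum().backward()                       # through the checkpointed segment
```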
2018-04-23 22:43:35 -04:00
9765bb5f1e Revert "Fix performance regression of simple indexing cases (#6793)" (#6886)
This reverts commit 8a016693c0808ec8353370fd4c48f4049a372b74.
2018-04-23 22:22:12 -04:00
b6ed729cdc fix memory leak in median (#6889) 2018-04-23 22:20:03 -04:00
df2817d3b1 Bump benchmark to master (#6878)
* Bump benchmark to master

* add semicolon to BENCHMARK_MAIN
2018-04-23 16:28:08 -07:00
82a33c32aa Update device docs (#6887)
Tell users that one can substitute torch.device with a string
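
For example:

```
import torch

a = torch.randn(2, device=torch.device('cpu'))
b = torch.randn(2, device='cpu')   # the string form is equivalent
```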
2018-04-23 19:04:20 -04:00
b5d2d285a8 fix SVD backward on non-square matrices when some=False (#6870) 2018-04-23 19:01:51 -04:00
1ee009599c Add torch.get_default_dtype doc (#6872)
* add torch.get_default_dtype doc

* address comments
2018-04-23 18:58:01 -04:00
750a323ca1 Work around protobuf issues by importing onnx first (#6833) 2018-04-23 15:44:04 -07:00
aa56a1211d Update from facebook (#6871)
* Track checkpoint performance in scuba

As title.

* [C2/CUDA]: fix cross entropy sigmoid with logits

when adding log_d_trick, I forgot to add it to the cuda impl; this diff fixes
it.

* Back out "[caffe2] Unregister MKL fallbacks for NCHW conversions"

Original commit changeset: 8918dd40205a
Will land after @jongsoo's diff https://phabricator.intern.facebook.com/D7596315 lands

* [Easy][C2] Don't add blob to external outputs from output_record if it's already external output

As desc.

* On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization

FACEBOOK:

The QPL logger needs the initialization code. In the past, the initialization code was put in the pipeline calling Caffe2. However, those places become obsolete quickly, as the product teams change where they call Caffe2 from time to time. We also need to track which teams use Caffe2 so that we can put the initialization code there.

With this diff, the initialization code is put in the predictor constructor, only enabled for mobile phones. This way, we can always enable QPL logging.

Once we do this, we can check how many times Caffe2 inference is called in production, and which models are more popular in production. This way, we can prioritize our effort supporting those models.

Will clean up the old code calling the init in the product in a separate diff.

* add padding op for sparse length tensor

to pad length-based sparse tensor with padding_value

* Add conv_op with cudaconvnet engine

Add conv_op with cudaconvnet engine

* [numa] Fix simple NUMA copy benchmark

Move XavierFill into init_net and also compute BW

* call roundf (device function) instead of round (host function)

* [caffe2_benchmark][observer] Make caffe2_benchmark use its own observer

1. Add ClearGlobalNetObservers()
2. Make caffe2_benchmark use its own observer and observer_reporter

* [detectron] Use roundf instead of round in the detectron module ops

* allow K larger than number of elements in top k op

one use case is to use this op together with PackSegments for sparse tensors, where the number of elements in each slice is not statically defined.

* add ChannelShuffle DNNLOWP op

* fixup math_cpu.cc break
2018-04-23 15:01:56 -07:00
aeb91587e5 [caffe2] Fix observer logic in RNN executor. Remove dynamic casts (#6202)
* Fix observer logic in RNN executor. Remove dynamic casts

* Revert to original design
2018-04-23 15:01:00 -07:00
548f6e34ab [caffe2][nomnigraph][fixup][tiny] Remove accidentally included logging (#6880) 2018-04-23 13:59:55 -07:00
9ed46c615c [Caffe2] Provide option to initialize the TensorRT engine at Operator constructor time (#6809)
* Try to have a lazy conversion of onnx-trt

* .

* Make it work

* comments
2018-04-23 13:09:35 -07:00
a2f2d6b43f Add special case for printing dtype for empty int64 tensor (#6869)
* add special case for printing dtype for empty int64 tensor

* add comment
2018-04-23 12:07:59 -07:00
a02b7c9776 Move main slice logic for easier reuse (#6822)
Want to reuse this logic for Int8 Slice.
2018-04-23 12:00:56 -07:00
b8ada7380a Tuple literal and cat support (#6691)
* Support list and tuple literals: Adds support for [a, b], (a, b) and "a," (a trailing-comma tuple)

* Allow non-tensors to reach emitBuiltinCall; each SugaredValue::call
is now responsible for checking the types of its inputs.

Add support for calling cat with a tuple to emitBuiltinOp
2018-04-23 10:58:07 -07:00
90586d925f [DT] [38/n] Rename add_stop_signal to add_stop_condition (#6825)
att
2018-04-23 10:39:37 -07:00
a986b85afd [auto] Update onnx to 3cb4d61 - Extend optimizer passes to recursively descend on GraphProto attributes (#803)
3cb4d61387
2018-04-23 17:05:41 +00:00
46b1737255 [ONNX] Switch ONNX peephole optimizers to recursively descend on sub-blocks (#6828) 2018-04-23 10:01:03 -07:00
3b63be063e quick fix for collect_env (#6861) 2018-04-23 10:33:06 -04:00
4040164097 Relax collect_env.py tests (#6859)
This PR makes it so that the collect_env.py tests ignore the most minor
component of most version strings. It also bumps the version up to 0.5.0a
to fix the CI.
2018-04-23 10:28:41 -04:00
a4dbd37403 [doc] Minor fixes for Windows docs (#6853) 2018-04-23 13:15:33 +02:00
26ddefbda1 [feature request] [Caffe2] Enable MKLDNN support for inference (#6699)
* Add operators based-on IDEEP interfaces

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Enable IDEEP as a caffe2 device

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add test cases for IDEEP ops

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add IDEEP as a caffe2 submodule

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Skip test cases if no IDEEP support

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Correct cmake options for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add dependences on ideep libraries

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix issues in IDEEP conv ops, etc.

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Move ideep from caffe2/ideep to caffe2/contrib/ideep

Signed-off-by: Gu Jinghui <jinghui.gu@intel.com>

* Update IDEEP to fix cmake issue

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix cmake issue caused by USE_MKL option

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Correct comments in MKL cmake file

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-04-22 21:58:14 -07:00
a16b85facd [Caffe2] Fix cuda.cmake (#6821)
* Fix cmake

* .
2018-04-22 21:32:18 -07:00
e966f22656 fix typo (#6824) 2018-04-22 21:32:00 -07:00
e8bdbdaa27 Terminate dataloader workers properly when parent process is SIGKILL'ed (#6779)
Reopening #6606 with fix for TEST_CUDA import issue on Windows and improvement to how we wait for manager exit in test_manager_unclean_exit. Loop tested on the Windows CI multiple times to make sure this actually fixes the CUDA OOM issue.

* Terminate dataloader workers properly when parent process is SIGKILL'ed

* Wait for worker processes to finish before shutting down manager process

* Add test for checking proper worker exit

* cosmetic change

* Test only if CUDA exists

* Don't call multiprocessing.set_start_method() in Python 2

* import TEST_CUDA only when we are in __main__

* Tune JOIN_TIMEOUT

* handle os.getppid() == 0 case

* Reset to original JOIN_TIMEOUT

* Use WaitForSingleObject() to check parent process status on Windows

* Fix TEST_CUDA import

* clean up

* Check main process only when index_queue.get() times out

* Change index_queues to multiprocessing.Queue

* Move manager checking logic to watchdog class

* Fix bugs in dataloader

* Fix TEST_CUDA import issue

* Don't import TEST_CUDA from common_nn

* Use event to signal manager exit in test

* fix lint

* Add comments
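
The watchdog idea mentioned above, sketched for the Unix case (the real implementation also covers Windows via WaitForSingleObject):

```
import os

class ManagerWatchdog(object):
    def __init__(self):
        # remember the pid of the manager that forked this worker
        self.manager_pid = os.getppid()

    def is_alive(self):
        # once the manager dies, the worker is re-parented and getppid()
        # changes, so the worker can notice and exit cleanly
        return os.getppid() == self.manager_pid
```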
2018-04-22 23:03:54 -04:00
7a3c38ab59 Add environment collection script (#6635)
* Add environment collection script

Fixes #6111. This should make it easier for users to report bugs by giving
them a script to collect system environment information.

Changes include:
- Refactor out the environment collecting code from utils.bottleneck
- Add script (collect_env.py)
- Cleaned up the issues template so that it suggests using the script
  and is more readable.

Testing: added expect tests to go with 4 CI configurations. Whenever one
of these configurations gets updated, the test will fail until the test
also gets updated.

* Expect tests

* Update issue template

* Fix random space

* Minor improvement to issue template; fix expect test

* Skip expect test if BUILD_ENVIRONMENT not found; test fix; split off smoke/expect test
2018-04-22 15:18:14 -04:00
56567fe47d Add documents for Windows (#6653)
* Add Windows doc

* some minor fixes

* Fix typo

* more minor fixes

* Fixes on dataloader
2018-04-22 15:18:02 -04:00
7d5c9bff58 Removes (unused) LinearIndexCalcData. (#6791)
This class as well as several functions using it appear to not be used. This is simply code cleanup.

Testing:

All tests in test_cuda.py pass.
2018-04-22 13:58:22 -04:00
1c7b0c1020 Update version string to 0.5. (#6795) 2018-04-22 13:57:48 -04:00
50e92a3085 Static linkage for CUDA (#6807)
* add static linkage option for CUDA libs

* add CuFFT linking via fakelink

* remove warning for 5.0 cuda architecture
2018-04-22 13:57:17 -04:00
a8bdb561b7 Fix reductions on some contiguous tensors where size(dim) == 1 (#6815) 2018-04-22 13:55:55 -04:00
814f791f2b [JIT][script] Improve error reporting for tuple type mismatch (#6819)
Previously we would see errors like:

variable 'states' previously has type (Tensor, Tensor, Tensor, Tensor, Tensor, Tensor) but is now being assigned to a value of type (Tensor, Tensor, Tensor, Tensor, Tensor, Tensor),
since the default case in the diagnostic printout was "Tensor". This adds a virtual member function to each Type class that returns a human-readable string for better error reporting

* Improve error reporting for tuple type mismatch

* Add better Tensor printout
2018-04-22 13:54:52 -04:00
95d0e9aaa2 [docs] Update set_default_(tensor_|d)type docs (#6843)
* update set_default_(tensor_|d)type docs

* make ndarray display nicer
2018-04-22 13:44:20 -04:00
0d0dcde5a8 Fix caffe2 eigen + cuda9 windows build (#6746) 2018-04-22 09:36:09 -07:00
4e8e13d90c [auto] Update onnx to bf00ae6 - Kezhan/update ml op spec (#799)
bf00ae6118
2018-04-21 22:34:34 +00:00
d564ecb4a5 Update docs with new tensor repr (#6454)
* Update docs with new tensor repr

* remove cuda in dtype

* remove changes to gloo submodule

* [docs] document tensor.new_* ctor

* [docs] Add docs for tensor.to(), tensor.float(), etc

* [docs] Moar examples for docs.

* [docs] Warning for tensor ctor copy behavior

* Quick fix

* [docs] Document requires_grad_()

* [docs] Add example for requires_grad_()

* update slogdet and *fft

* update tensor rst

* small fixes

* update some docs

* additional doc changes

* update torch and tensor docs

* finish changing tensor docs

* fix flake8

* slogdet with negative det

* Update functional.py tensor ctors

* Fix nll_loss docs

* reorder to move device up

* torch.LongTensor -> torch.tensor or torch.empty in docs

* update tensor constructors in docs

* change tensor constructors

* change constructors

* change more Tensor() to tensor()

* Show requires_grads_ docs

* Fix set_default_dtype docs

* Update docs with new tensor repr

* remove cuda in dtype

* remove changes to gloo submodule

* [docs] document tensor.new_* ctor

* [docs] Add docs for tensor.to(), tensor.float(), etc

* [docs] Moar examples for docs.

* [docs] Warning for tensor ctor copy behavior

* Quick fix

* [docs] Document requires_grad_()

* [docs] Add example for requires_grad_()

* update slogdet and *fft

* update tensor rst

* small fixes

* update some docs

* additional doc changes

* update torch and tensor docs

* finish changing tensor docs

* fix flake8

* slogdet with negative det

* Update functional.py tensor ctors

* Fix nll_loss docs

* reorder to move device up

* torch.LongTensor -> torch.tensor or torch.empty in docs

* update tensor constructors in docs

* change tensor constructors

* change constructors

* change more Tensor() to tensor()

* Show requires_grads_ docs

* Fix set_default_dtype docs

* Link to torch.no_grad, etc, from torch doc

* Add dtype aliases to table

* regen docs again

* Tensor attributes stub page

* link to inplace sampling

* Link torch.dtype, device, and layout

* fix dots after nonfinite floats

* better layout docs
2018-04-21 07:35:37 -04:00
34fa355f27 [caffe2] Add Moments to math (#6798)
* Add gpu check for reduce_max

* Add Moments in math

* Update cpu version to avoid int type to be 0

* Update Moments on CPU to same as GPU
2018-04-21 01:03:44 -07:00
5945f3a7b4 [auto] Update onnx to e3da0f9 - Fix some checks not ideal to onnx-ml (#781)
e3da0f9bab
2018-04-21 03:28:57 +00:00
7b6b7d4575 Mark schema registration helper variables as unused (#6799) 2018-04-20 19:57:42 -07:00
8b28ab4858 Add option cache to speed up cmake build (#6737)
* Add option cache to speed up cmake build

* Also only run autogen_init_py_files once
2018-04-20 19:55:39 -07:00
34edd6f12e fix sparse tensor print (#6829) 2018-04-20 19:39:52 -07:00
8a434d9554 Print integral floating point numbers as X. instead of X.0000. (#6812) 2018-04-20 21:26:21 -04:00
8fc11748fe Fix debug build for Windows (#6758)
* Fix debug build for Windows

* Fix for wrong placement

* Fix variable name
2018-04-20 21:02:18 -04:00
a568b91a5d [docs] Add missing device parameters to factories, refer to dtypes as data types rather than types. (#6803) 2018-04-20 21:01:16 -04:00
516f067641 InputBuffers should AutoGPU for accumulation. (#6826) 2018-04-20 20:15:51 -04:00
6c8f0ef33b fixed error message (#6820) 2018-04-20 20:14:10 -04:00
9b37a4d027 [auto] Update onnx to 4890619 - Remove debug string (#798)
48906190e6
2018-04-20 23:39:11 +00:00
356af0c195 [auto] Update onnx to 2f7c284 - Use ONNX_NAMESPACE::to_string instead of std::to_string (#797)
2f7c284e57
2018-04-20 23:28:21 +00:00
afea133113 [auto] Update onnx to b20fae0 - Add newline at the end (#795)
b20fae0287
2018-04-20 23:24:08 +00:00
db540c9e7b Fix the bug in fb devgpu setup script (#6823)
* Update onnx_c2_setup.sh

* More fix
2018-04-20 15:15:41 -07:00
41bb1d56a7 [auto] Update onnx to f5496b2 - Update the remainig cases (#794)
f5496b2c74
2018-04-20 21:06:51 +00:00
02544f4472 [auto] Update onnx to 7d1e102 - change the inference context api to use TypeProto (#779)
7d1e102e73
2018-04-20 20:05:40 +00:00
1d51dd8665 [distributions] Fix Independent.rsample() and add more tests (#6806) 2018-04-20 21:55:39 +02:00
12e07ca731 [caffe2][nomnigraph] Add binary split algorithm to Algorithms.h (#6689) 2018-04-20 11:49:17 -07:00
a73b3fd1f0 [caffe2][opencl] Add OpenCL context (#6777) 2018-04-20 11:31:21 -07:00
8a15bc4c9c Fix the ONNX exporter API (#6788) 2018-04-20 09:10:38 -07:00
188b6e9346 [auto] Update onnx to 6953eff - some cleanups to shape inference impls (#771)
6953eff49a
2018-04-20 16:05:40 +00:00
c286efb442 Quick patch for the CI (#6802) 2018-04-20 08:58:38 -07:00
378f742792 [auto] Update onnx to 8dafe88 - Remove incorrect cases (#791)
8dafe88901
2018-04-20 15:36:16 +00:00
3e2891b27a Let Gloo close socket, destroy() not needed for non-NCCL backend (#6787) 2018-04-19 23:52:12 -07:00
ef76e24f60 [JIT][script][ONNX] ScriptModule ONNX export + ONNX export for control flow nodes (#6608)
* ScriptModule ONNX export

* ScriptModule ONNX export

* Export for control flow nodes

* Add pretty-print capability for ONNX export testing

* Update tests and handling of multiple GraphProto names

* Maybe bugfix?

* factor out code from export and pretty print
2018-04-19 23:45:03 -07:00
945cb0fabc [auto] Update onnx to 45be0fe - Fix shadow-compatible-local compiler warning (#789)
45be0fe736
2018-04-20 05:02:50 +00:00
d695624efe More trt tests (#6782) 2018-04-19 21:53:49 -07:00
503be98d61 [auto] Update onnx to d01e4af - update the test cases (#788)
d01e4afc4e
2018-04-20 04:35:38 +00:00
c420297545 [jit][script] Constant python ints now turn into Long (#6728)
This matches the behavior of literals.
2018-04-19 21:33:29 -07:00
7e1c5ca6d5 Add missing #include for CAFFE2_MODULE macro. (#6790) 2018-04-19 20:46:09 -07:00
8a016693c0 Fix performance regression of simple indexing cases (#6793)
* Fix performance regression on simple cases of indexing

Dispatches to the old kernels

* Adapt JIT test

The test was expected to fail, but due to the change in the previous diff, it would now dispatch to index_select, which succeeds. I modified the function to go through the advanced indexing codepath

* Only do checks once, properly AutoNoGil, AutoGPU.
2018-04-19 23:41:44 -04:00
c3bc927920 [auto] Update onnx to 7e1bed5 - Make proto_utils compatible for old version of protobuf (#787)
7e1bed51cc
2018-04-20 03:32:13 +00:00
a4ab83045d Fix cross device indexing for more than 1 cuda device. (#6781)
* Fix cross device indexing for more than 1 cuda device.

Cross device indexing is attempted from ATen, which doesn't work well because ATen doesn't have AutoGPU, etc.
Instead, before dispatching to ATen we do type conversion on the indices; it would probably be better if we
pushed all this down to ATen, but that will take some work.

* Small cleanup.
2018-04-19 22:03:25 -04:00
1a53e45558 [auto] Update onnx to abe285e - Fix unused parameter warnings (#786)
abe285e987
2018-04-19 23:49:32 +00:00
d1a992a85e Disallow chunks that are <= 0 in torch.chunk (#6761)
Fixes #6759.

Before, `tensor.chunk(0)` would cause a divide by 0.
`tensor.chunk(-1)` would throw an error complaining that "split_size
needs to be positive".

This PR changes it so that the error message makes it clear that
`chunks` has to be greater than 0.
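For illustration (error wording approximate):

```
import torch

t = torch.arange(6)
t.chunk(3)    # OK: three chunks of two elements each
t.chunk(0)    # now raises a clear error: chunks must be greater than 0
t.chunk(-1)   # same clear error, instead of the old "split_size needs to be positive"
```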
2018-04-19 18:31:14 -04:00
264ffd143c [auto] Update onnx to 5ef9c6e - Parallel windows build (#784)
5ef9c6ee28
2018-04-19 22:27:53 +00:00
6a41e2dc47 Add BC mechanism to Module.load_state_dict (#6639)
* Add version counter to module, change load_state_dict to use load_local_state_dict which does class specific loading

* Clarifies version number in docs

* fix jit tests

* fix state_dict tests

* typo

* fix ddp

* exclude version numbers from state dict entries

* Fix jit test and empty modules

* address comments

* test for "."

* revert the private version change in state_dict

* make IN case a hard error

* fix not reporting error when unexpected submodule

* address comments

* disallow empty string in name and remove trailing dot
2018-04-19 15:36:30 -04:00
6fed2341e9 [auto] Update onnx to 3b27cc8 - Try using pep518 to install the protobuf build dependency (#782)
3b27cc8faa
2018-04-19 19:25:37 +00:00
370acdf3bf Change to use CAFFE2_HOME for specifiying caffe2 models path (#6775) 2018-04-19 11:34:52 -07:00
a3f3817fbd [jit][script] Allow variables to be defined in if statements (#6675)
We allow variables defined inside if statements to be used after
the if statement, as long as they are defined unconditionally in every
branch. This supports a larger subset of python programs than we supported before.
2018-04-19 11:32:31 -07:00
4c5b95a433 Revert "Terminate dataloader workers properly when parent process is SIGKILL'ed (#6606)" (#6772)
This reverts commit 8d6a50aaeba2166ce870016da7488f879395ebb1.
2018-04-19 14:28:48 -04:00
6dfaa1071a Check in ATen mirror script. (#6762)
Fixes #6556.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-19 11:12:16 -07:00
e169465672 rnn: A note on zero defaults for recurrent cells (#6719)
* Add a note on zero default for recurrent cells

* Fixes #434

Signed-off-by: mr.Shu <mr@shu.io>
2018-04-19 13:54:07 -04:00
d0b0edf27a Add a requires_grad_() function to tensors. (#6771) 2018-04-19 13:47:24 -04:00
f6da2fd944 Make the variable closer to usage (#6752)
`chain` is used for the loop below.
2018-04-19 10:43:05 -07:00
2acc247517 [docs] Update autograd notes (#6769) 2018-04-19 13:34:14 -04:00
de9bdf1d31 Module.to doc update and example format update (#6774) 2018-04-19 13:30:40 -04:00
47bd4be4d3 [docs] More factory functions (#6709)
* More factory functions

Changes:
- Added the remaining factory and factory-like functions
- Better argument reuse via string templates
- Link under torch.rst's Creation Ops to the randomized creation ops

* Add double tick around False

* fix flake8

* Fix False

* Clarify comment: hopefully it is clearer now
2018-04-19 13:16:07 -04:00
cc3284cad3 [docs] Clarify more CUDA profiling gotchas in bottleneck docs (#6763) 2018-04-19 13:15:27 -04:00
7f587de4bc [Caffe2] Let TensorRT flow use the generic graph transformer (#6696)
* Refine the transform API

* Let TensorRT flow use the generic graph transformer

* Rebase
2018-04-19 10:07:01 -07:00
9c682f02b7 [docs] Fix some sphinx warnings (#6764)
These aren't important but too many of them can obscure real warnings
with the docs.
2018-04-19 12:37:42 -04:00
e44f901b55 added functionality for state_dict/load_state_dict for lr_scheduler ( Fixes: #3026 ) (#6342)
* added functionality for state_dict/load_state_dict for lr_scheduler

* fixed linting issues/removed unused import

* refactor lr_scheduler state_dicts/state_dict holds everything __dict__ but optimizer

* changed documentation in lr_scheduler

* Update lr_scheduler.py
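A usage sketch (assuming a StepLR scheduler; any scheduler should work the same way):

```
import torch
from torch.optim.lr_scheduler import StepLR

model = torch.nn.Linear(2, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=30)

state = scheduler.state_dict()    # everything in __dict__ except the optimizer
scheduler.load_state_dict(state)  # restore, e.g. when resuming from a checkpoint
```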
2018-04-19 07:09:03 -04:00
072d49f787 Fix import error sometimes happening in dataloader when exiting Python (#6671)
* Fix import error sometimes happening in dataloader when exiting Python

* address comments
2018-04-19 06:56:39 -04:00
533beab5bb Fix doc for torch.nn.functional.relu (fixes #6742) (#6749)
Thank you Shengyi Qian (JasonQSY) for spotting and reporting.
2018-04-19 11:25:43 +02:00
71c644b005 [caffe2] Add ReduceMinOp and ReduceMaxOp (#6744)
* Add gpu check for reduce_max

* Add ReduceMinOp and ReduceMaxOp

* Merge util functions in reduce_ops and math

* Expose math internal functions
2018-04-19 00:22:23 -07:00
fff80c2c1f [auto] Update onnx to 1439eab - Fix Protobuf error message in CI (#776)
1439eab554
2018-04-19 04:53:09 +00:00
e1f5d80d5c Eliminate handle_zero_dim when broadcasting is applied earlier. (#6683)
* Eliminate handle_zero_dim when broadcasting is applied earlier.

This ends up not actually doing anything unless all the broadcasted tensors are scalars,
which ends up with inconsistent behavior in that case only, because the type promotion rules are different.

This is better solved with real type promotion logic.

* Change type of script comparison to long.

* Fix jit tests.

* Fix cpp jit test by being consistent about long-vs-float.

* Consistent float and long.

* Use int64_t rather than long.
2018-04-18 23:37:54 -04:00
9c47eb5548 Fixes test_torch.py so that all tests pass on Volta hardware. (#6736)
Issue: "python3 test_cuda.py" currently results in a failure when using Volta hardware.

The failure is in test_advancedindex, and is caused by two "sub-tests." At line 4651 a series of indices are used to compare PyTorch's and Numpy's indexing behavior. At least two of these indices index the same element of the reference tensor multiple times. These are:

[slice(None), [[2]], [[0, 3], [4, 4]]]
[slice(None), [[0, 1], [1, 0]], [[2, 3], [3, 0]]]

The first index selects the 5th element of the third row twice, and the
second index selects the 4th element of the second row twice.

This causes the test to attempt to update the same index with two distinct values simultaneously. On my machine the Numpy created tensor will always take the "latter" of these two values, while the Volta tensor will always take the "former." (Not to say this behavior is guaranteed by either framework.)

The fix is to remove these two indices from test_torch.py. This causes all tests to pass.

While updating test_torch.py I also noticed that assert_get_eq(tensor, indexer) had a bug where it was referring to "reference" instead of "tensor." This bug had no impact on behavior. The fix is to have this function refer to its input tensor, "tensor," instead. All tests still pass after this fix.
2018-04-18 22:44:14 -04:00
11c1af8dbc [docs] add docs for tensor.view_as (#6730) 2018-04-18 22:43:45 -04:00
bacda6df8d Better error message for gels on CUDA (#6726) 2018-04-18 22:43:30 -04:00
75ccfb321b Fix cpp_extensins.py (#6722) 2018-04-18 22:43:12 -04:00
4d2a0b889f [Caffe2] Use mapped workspace instead of renaming when working on renamed nets (#6717)
* Use mapped workspace instead of renaming when working on renamed nets

* Comments
2018-04-18 19:14:11 -07:00
c40eefeef9 ChannelShuffle with NHWC layout (#6667)
* ChannelShuffle with NHWC layout

* ChannelShuffle with NHWC layout
2018-04-18 19:13:45 -07:00
d26ab68485 Sort declarations when generating Python bindings (#6701)
* Sort declarations when generating Python bindings

This helps resolve ambiguities in argument parsing according to
any rules we will need.

For now, this allows us to make scalar operations more conservative
wrt. argument types, but makes them commutative again.

* Fix inconsistencies between mod with tensor and scalar

* Fix a stupid mistake
2018-04-18 21:51:35 -04:00
e47b3018b7 [caffe2] Update EigenTensorMap to use ColMajor (#6735)
* Add gpu check for reduce_max

* Update EigenTensorMap to use ColMajor

* Revert incorrect change on cpu
2018-04-18 18:28:38 -07:00
d1bb75e273 Redo tensor repr to make it less verbose (#6370)
* Redo tensor repr to make it less verbose

* fix empty tensor

* fix scaled scalars

* update for device-dtype split

* address comments

* removed repeated lines

* address comments

* add cuda to device string
2018-04-18 18:25:07 -07:00
8d6a50aaeb Terminate dataloader workers properly when parent process is SIGKILL'ed (#6606)
* Terminate dataloader workers properly when parent process is SIGKILL'ed

* Wait for worker processes to finish before shutting down manager process

* Add test for checking proper worker exit

* cosmetic change

* Test only if CUDA exists

* Don't call multiprocessing.set_start_method() in Python 2

* import TEST_CUDA only when we are in __main__

* Tune JOIN_TIMEOUT

* handle os.getppid() == 0 case

* Reset to original JOIN_TIMEOUT

* Use WaitForSingleObject() to check parent process status on Windows

* Fix TEST_CUDA import

* clean up

* Check main process only when index_queue.get() times out

* Change index_queues to multiprocessing.Queue

* Move manager checking logic to watchdog class

* Fix bugs in dataloader

* Fix TEST_CUDA import issue
2018-04-18 20:41:33 -04:00
354dac9769 updates module.to doc for the new tensor.to(requires_grad) (#6733) 2018-04-18 18:42:15 -04:00
198be34de6 [docs] Add back deleted tensor.cuda() method (#6732) 2018-04-18 18:20:09 -04:00
6ae060b2b1 Roll forward Eigen to 5a0ab9f to solve the compilation problem with CUDA 9.1 (#6710) 2018-04-18 15:17:06 -07:00
c14b62fca2 Create FileBaton to synchronize distributed JIT C++ extension builds (#6684)
* Create FileBaton to synchronize distributed JIT C++ extension builds

* Move FileBaton to its own file

* Autoformat code

* Respect verbose flag in cpp_extension._prepare_ldflags
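A minimal sketch of the file-baton idea (names and details are illustrative, not the exact implementation):

```
import os
import time

class FileBaton(object):
    def __init__(self, lock_path, wait_seconds=0.1):
        self.lock_path = lock_path
        self.wait_seconds = wait_seconds
        self.fd = None

    def try_acquire(self):
        # O_CREAT | O_EXCL makes creation atomic: exactly one process wins
        try:
            self.fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL)
            return True
        except OSError:
            return False

    def wait(self):
        # losers poll until the winner releases (deletes) the lock file
        while os.path.exists(self.lock_path):
            time.sleep(self.wait_seconds)

    def release(self):
        os.close(self.fd)
        os.remove(self.lock_path)
```

The process that acquires the baton builds the extension and then releases; every other process waits and loads the already-built extension.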
2018-04-18 18:07:03 -04:00
789e0e066a [auto] Update onnx to 1756f61 - Align schema function signatures in python and c++ (#775)
1756f6183d
2018-04-18 22:01:16 +00:00
38614c4670 Add gpu check for reduce_max (#6729) 2018-04-18 14:51:52 -07:00
61a69c2492 [caffe2] Use both __ARM_NEON__ and __ARM_NEON macros (#6697)
ARM64 clang from Android NDK doesn't define __ARM_NEON__, which results in a perf regression on some models. It turns out that some compilers define __ARM_NEON__ while others define __ARM_NEON. This patch changes all NEON-specific parts in Caffe2 to check both macros.
2018-04-18 17:45:47 -04:00
ad8bfb7359 Adding package name parameter for conda builds (#6727) 2018-04-18 14:02:09 -07:00
2d09799950 [docs] Document CUDA profiling gotchas in bottleneck docs (#6715) 2018-04-18 16:55:13 -04:00
969251962c [Caffe2] Enhance test for CollectAndDistributeOp (#6693)
* Caffe2: Enhance test for CollectAndDistributeOp

This also changes the operator and the test to use stable sort
otherwise the test will fail due to differences between the op
and the test when facing ROIs of the same score.

* Caffe2: Adjust comparator to make std::nth_element and std::sort stable

Revert the removal of std::nth_element and std::sort and adding of
std::stable_sort.
2018-04-18 13:19:05 -07:00
e089849b4a Add mutex to THC random number generator (#6527)
* Add mutex to THC random number generator

* Add test for CUDA RNG multithread

* fix lint

* Rename gen_state to state and remove unnecessary mutex lock

* Remove RNG test from cpp_extensions

* Add CUDA RNG test to libtorch

* Build test_rng only if CUDA exists

* Move test to aten/src/ATen/test/

* Separate ATen build and test, and run ATen test in CI test phase

* Don't test ATen in ASAN build

* Fix bug in ATen scalar_test

* Fix bug in ATen native_test

* Add FIXME to some CUDA tests in scalar_tensor_test

* Valgrind doesn't work well with CUDA, seed the CPU and CUDA RNG separately instead
2018-04-18 15:54:13 -04:00
c25f097225 [wip] Fixing ci conda tests (#6686)
* Disabling ttsvd test conda

* Just skipping all tt_svd tests
2018-04-18 12:47:13 -07:00
96e2140ffb Check for g++ also in check_compiler_ABI (#6711)
Otherwise a spurious warning is generated. @goldsborough
2018-04-18 20:30:52 +01:00
530e1e89f0 [auto] Update onnx to 5509f70 - more shape inference implementations (#758)
5509f70b80
2018-04-18 18:20:13 +00:00
be0b7f8c81 Add reduce min and reduce max (#6685) 2018-04-18 10:58:05 -07:00
6c3e5af393 [auto] Update onnx to ac948de - add eliminate identity optimizer (#755)
ac948de61d
2018-04-18 17:54:07 +00:00
63d42408d0 [Caffe2] Detectron fpn support (#6645)
* [Caffe2] Update collect_and_distribute op to fit arbitrary size

* [Caffe2] batch_permutation CPU implementation

* Make requested changes
2018-04-18 10:00:49 -07:00
a1cc8dde80 Fix LSTM and GRU parameters description (#6665)
* Fix LSTM and GRU parameters description

* Fix previous layer time to t-1 as reviewed

* Replace 'the first layer' to 'at time 0' per review suggestion
2018-04-18 12:05:25 -04:00
2a628ba32f Update README.md (#6703) 2018-04-18 09:44:08 -04:00
bd0cc7d364 Implement torch.einsum (fixes #1889) (#6307)
* start at generic trilinear

* Implement einsum (fixes #1889)

This provides a simple implementation of einsum. It is built on
top of the work for computing bilinear (#6110).
It uses a naive left-to-right resolution at the moment.
Autograd is able to differentiate by itself.
The obvious unsupported feature is taking diagonals (einsum('ii->i', (a,))); a usage sketch follows at the end of this entry.

* add tests and docs

* fix flake8

* clean diff

* rebase on current master to resolve conflicting String wrapping

* clean up after rebase

* better commentary in einsum and sumproduct_pair

* don't say fixme if it's fixed and rename num_outputs to num_output_dims

* adapt python wrapper to use std::string instead of String to avoid typedef at::String

* typos and some vector to array conversion

* fix accidental python<->python3 change

* really fix bad rebase
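A usage sketch (at this point the API took the operands as a single tuple/list argument):

```
import torch

a = torch.randn(3, 4)
b = torch.randn(4, 5)
c = torch.einsum('ij,jk->ik', (a, b))   # matrix multiplication

x = torch.randn(10, 3)
y = torch.randn(10, 4)
z = torch.einsum('bi,bj->bij', (x, y))  # batched outer product
```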
2018-04-18 13:41:27 +02:00
187955b959 [distributions] Skip validation of lazy properties (#6666) 2018-04-18 10:12:08 +02:00
fb7bd8e4ae Better dispatch (#6687) 2018-04-18 08:38:47 +02:00
6223bfdb1d Update from Facebook (#6692)
* [GanH][Easy]: Add assertion to adaptive weighting layer

0 weight causes numeric instability and exploding NE

* [Easy] Add cast op before computing norm in diagnose options

As LpNorm only takes floats, we add a manual cast here.

* Introduce a new caching device allocator

`cudaMalloc` and `cudaFree` calls are slow, and become slower the
more GPUs there are. Essentially, they grab a host-wide (not device-wide) lock
because GPU memory is transparently shared across all GPUs. Normally, this
isn't much of a concern since workloads allocate memory upfront, and reuse it
during later computation.

However, under some computation models (specifically, memory conserving
approaches like checkpoint-and-recompute, see
https://medium.com/@yaroslavvb/fitting-larger-networks-into-memory-583e3c758ff9)
this assumption is no longer true. In these situations, `cudaMalloc` and
`cudaFree` are common and frequent. Furthermore, in data parallel contexts,
these calls happen at nearly the same time from all GPUs worsening lock
contention.

A common solution to this problem is to add a custom allocator. In fact,
nVIDIA provides one out of the box: CUB, which Caffe2 already supports.
Unfortunately, the CUB allocator suffers from very high fragmentation. This is
primarily because it is a "buddy" allocator which neither splits nor merges
free cached blocks. Study
https://github.com/NVlabs/cub/blob/1.8.0/cub/util_allocator.cuh#L357 if you
want to convince yourself.

This diff adapts a caching allocator from the Torch codebase
https://github.com/torch/cutorch/blob/master/lib/THC/THCCachingAllocator.cpp
which does splitting and merging and ends up working really well, at least for
workloads like the checkpoint-and-recompute computation models noted above.

I simplified the implementation a little bit, made it a bit more C++-like. I
also removed a bunch of stream synchronization primitives for this diff. I
plan to add them back in subsequent diffs.

* Report reader progress in fblearner workflows

Integrate with fblearner progress reporting API and add support to report training progress from reader nodes.
If the reader is constructed with batch limits, report based on finished batches vs total batches. The finished batch count may exceed the total because we evaluate whether we should stop processing every time we dequeue a split.
If there is no limit for the reader, report based on finished splits (Hive files) vs total splits. This is fairly accurate.

* [GanH][Diagnose]: fix plotting

1. ganh diagnose needs to set plot options
2. the modifier's blob name is used for the metric field and needs to be fixed before
generating the net

* Automatic update of fbcode/onnx to 985af3f5a0f7e7d29bc0ee6b13047e7ead9c90c8

* Make CompositeReader stops as soon as one reader finishes

Previously, CompositeReader called all readers before stopping. This resulted in a flaky test, since the last batch may be read by different threads, resulting in dropped data.

* [dper] make sure loss is not nan

as desc.

* [rosetta2] [mobile-vision] Option to export NHWC order for RoIWarp/RoIAlign

Thanks for finding this @stzpz and @wangyanghan. Looks like NHWC is more
optimized. For OCR though it doesn't yet help since NHWC uses more mem b/w but
will soon become important.

* Intra-op parallel FC operator

Intra-op parallel FC operator

* [C2 Proto] extra info in device option

passing extra information in device option

design doc: https://fb.quip.com/yAiuAXkRXZGx

* Unregister MKL fallbacks for NCHW conversions

* Tracing for more executors

Modified Tracer to work with other executors and add more tracing

* Remove ShiftActivationDevices()

* Check for blob entry iff it is present

When processing the placeholder ops, ignore the entry if the blob is not present in blob_to_device.

* Internalize use of eigen tensor

Move use of eigen tensor out of the header file so we don't get template partial specialization errors when building other libraries.

* feature importance for transformed features.

* - Fix unused parameter warnings

The changes in this diff comment out unused parameters.
This will allow us to enable -Wunused-parameter as error.

#accept2ship

* add opencv dependencies to caffe2

The video input op requires additional opencv packages. This adds them to
cmake so that it can build.

* Add clip_by_value option in gradient clipping

Add clip_by_value option in gradient clipping

when the value is bigger than max or smaller than min, do the clip

* std::round compat
2018-04-17 23:36:40 -07:00
eca0ef5e42 __STDC_FORMAT_MACROS was conflicting with some thirdparty include from google perf tools. Looks like a harmless fix (#6676) 2018-04-17 22:33:37 -07:00
6252706feb [Caffe2] Workspace centric API for TensorRT transformation (#6678)
* Workspace centric API for trt transformation

* Merge SSA rewrite code
2018-04-17 21:23:27 -07:00
dc94182db0 Check for --noprefix option for mpiexec in run_test.py (#6690)
* Check for --noprefix option for mpiexec

--noprefix option to mpiexec is not part of the MPI standard.
It is needed in certain configurations when using OpenMPI but not
supported with other MPI implementations such as MPICH and maybe
others. This commit adds a check if the option is supported by
the current mpiexec. Also this commit fixes Issue #4965 and MPI
tests can be enabled in the CI.

Fixes: #4965

* Update run_test.py
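A sketch of such a probe (hypothetical helper; the actual check in run_test.py may differ):

```
import subprocess

def mpiexec_supports_noprefix():
    # run a trivial command with --noprefix; a nonzero exit status or a
    # missing mpiexec binary means the option cannot be used
    try:
        rc = subprocess.call(['mpiexec', '--noprefix', '-n', '1', 'true'],
                             stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL)
    except OSError:
        return False
    return rc == 0
```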
2018-04-17 23:34:33 -04:00
1c01eabd3c Codemod to update our codebase to 0.4 standard (#6641)
* Codemod to update our codebase to 0.4 standard

* Update some of the test scripts

* remove Variable in test_clip_grad_value

* fix _symbolic_override_wrapper_maker
2018-04-17 22:06:54 -04:00
c43c911662 Export onnx protobuf bindings to python (#6651)
* Export onnx protobuf bindings to python

* rename native onnx module to _onnx
2018-04-17 16:38:57 -07:00
f50f1769ec [auto] Update onnx to 844bbc2 - Update Python schema API to take domain (#764)
844bbc2142
2018-04-17 23:33:10 +00:00
711343f981 Gltensor fix (#6647)
Fix getGLTensor
2018-04-17 16:25:38 -07:00
4dd29ac89f fix broken code from rebasing (#6681) 2018-04-17 15:44:56 -07:00
1191627008 Make torch.backends.mkl.is_available() work without importing (#6677) 2018-04-17 18:10:32 -04:00
f15f3ca1af Scope variables inside the dataloader (#6673)
* Scope variables inside the dataloader

This clears up the memory consumed by batches inside the dataloader. It's pretty useful for long-lived data loaders.

* Update dataloader.py
2018-04-17 17:48:12 -04:00
a86f53fbf1 Fix padding and output_padding in ConvTranspose docs (#6679) 2018-04-17 17:36:19 -04:00
8cf41b40e6 Update gitignore so that third_party/build and aten/src/ATen/Config.h are cleaned properly. (#6672) 2018-04-17 17:27:35 -04:00
459dfdc304 [Caffe2] C++ SSA Rewrite of Caffe2 nets (#6531)
* Netdef SSA rewrite

* unit test
2018-04-17 14:24:13 -07:00
7de61c3b8c Update tensors.rst Tensor introduction (#6670)
Changes:
- Deleted docs for old constructor. Add link to new `torch.tensor` ctor
- Add docs for `torch.tensor`
- Add some info on dtypes to the top of `tensors.rst`.
2018-04-17 16:52:22 -04:00
4be34ca0f3 Add broadcast and reduce gradient (#6668)
Add broadcast and reduce gradient
2018-04-17 13:31:13 -07:00
e51e792cef enable exporting bidirectional rnn with fixes seq len from onnx to caffe2 (#6566) 2018-04-17 12:27:16 -07:00
f656301526 Allow traces to call @script functions (#6642)
This adds the ability to trace script functions while preserving their
control flow. When the trace encounters a script function it inlines
the graph of the function into the trace rather than tracing the
function itself.
2018-04-17 15:19:16 -04:00
1f2829dd2a Update tensor factory method docs (#6640)
* Update tensor factory method docs

Also add new docs for `torch.empty`.

* Add full; some refactoring to make docs nicer
2018-04-17 14:30:46 -04:00
d193f82c1d Adding dispatch to Tensors (#6664)
It solves the problem of chaining externally defined functions.
2018-04-17 14:13:29 -04:00
2aaa9ae60f [auto] Update onnx to 54ca9cb - The content of a string is doubled if it's a string tensor (#765)
54ca9cb503
2018-04-17 17:28:30 +00:00
feb8522f99 randperm supports n=0 (#6656)
This makes it compatible with arange and numpy.random.permutation
2018-04-17 19:03:57 +02:00
7fcaf3b49e Update torch.nn.init and torch.nn.utils.clip_grad (#6173)
Introducing two updates.

1. Add param to He initialization scheme in torch.nn.init
Problem solved:
The function calculate_gain can take an argument to specify the type of non-linearity used. However, it wasn't possible to pass this argument directly to the He / Kaiming weight initialization function.

2. Add util to clip gradient value in torch.nn.utils.clip_grad
Problem solved:
DL libraries typically provide users with easy access to functions for clipping the gradients both using the norm and a fixed value. However, the utils clip_grad.py only had a function to clip the gradient norm.

* add param to He initialization scheme in torch.nn.init

* add util to clip gradient value in torch/nn/utils/clip_grad.py

* update doc in torch.nn.utils.clip_grad

* update and add test for torch.nn.utils.clip_grad

* update function signature in torch.nn.utils.clip_grad to match suffix_ convention

* ensure backward compatibility in torch.nn.utils.clip_grad

* remove DeprecationWarning in torch.nn.utils.clip_grad

* extend test and implementation of torch.nn.utils.clip_grad

* update test and implementation torch.nn.utils.clip_grad
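A usage sketch of both additions (function names follow the suffix_ convention mentioned above; exact signatures may differ):

```
import torch
import torch.nn as nn
from torch.nn import init
from torch.nn.utils import clip_grad_value_

layer = nn.Linear(10, 10)
# 1. pass the nonlinearity straight through to the He/Kaiming initializer
init.kaiming_normal_(layer.weight, nonlinearity='relu')

loss = layer(torch.randn(4, 10)).sum()
loss.backward()
# 2. clip gradients by a fixed value rather than by norm
clip_grad_value_(layer.parameters(), clip_value=1.0)
```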
2018-04-17 11:32:32 -04:00
1e34493825 Fix some loss output sizes (#6659) 2018-04-17 10:59:12 -04:00
d5f041aa8b Updated documentation for cross entropy loss to include multi-dimensional input shapes (#6638) 2018-04-17 09:56:43 -04:00
c77fca570c Add device docs; match constructor parameter names with attribute names. (#6633)
* Add device docs; match constructor parameter names with attribute names.

* Use double quotes for strings.

* Update printing.

* Separate device ordinal-only construction into a separate note.

* Use current device.
2018-04-17 09:55:44 -04:00
30849eb668 Bind 0-dim variables without requires grad to int64/double similar to how we do with Scalar. (#6637)
Note:
- Only integral scalar types bind to int64
- Both integral and floating point scalar types bind to double (same rules as python numbers).
2018-04-17 09:54:49 -04:00
639dd0e324 Fix an error in the tensor docs. (#6658)
The docs incorrectly stated that there were seven CPU tensor types and
eight GPU tensor types, before listing eight types for both CPU and GPU.
2018-04-17 09:54:19 -04:00
f2c9975378 Add DistributedDataParallelCPU (#5919) 2018-04-17 15:36:47 +02:00
c345212c86 Support gpu triangle solve (#6648)
* add cuda trtrs

* remove queue

* add test trtrs
2018-04-17 14:33:39 +02:00
b34ae77be8 always compute gradients for the gradcheck inputs (#6654) 2018-04-17 14:23:59 +02:00
bc6243cb4a Explicitly define all caffe2 reducer ops by name (#6513)
* Explicitly define all caffe2 reducer ops by name instead of string concatenating them

Explicitly define all caffe2 reducer ops by name instead of string concatenating them.

* Use recursion to make the equal() function compatible with C++11.

* Trivial change.

* Trivial change.

* Trivial change to force the flaky build system to rebuild.

* Trivial change to force the flaky build system to rebuild.

* Trivial change to force the flaky build system to rebuild.

* Trivial change to force the flaky build system to rebuild.

* Trivial change to force the flaky build system to rebuild.

* Addressed @dzhulgakov's comments.

* Addressed @dzhulgakov's comments.

* Trivial change to force the flaky build system to rebuild.

* Trivial change to force the flaky build system to rebuild.
2018-04-17 00:40:58 -07:00
e46043ab0c Fixed NCCL build in fbcode (#6643) 2018-04-16 23:53:56 -04:00
5ed3f3347a Add dtypes (with reasonable defaults) to sum, prod, cumsum, cumprod. (#6573)
* Add dtypes (with reasonable defaults) to sum, prod, cumsum, cumprod.

This adds optional dtypes to torch.sum, torch.prod, torch.cumsum, torch.cumprod.
By default, the dtype is torch.float64 for integral types, and the dtype of the input for floating point types.

* Don't use optional<ScalarType>, because the jit can't handle it yet.

Instead, we manually build the overloads.  This is fairly painful because of default arguments, but should be easy to pull out once the jit can handle optional<ScalarType>.

* Fix keepdim with out parameters.

* Fix _cudnn_rnn_flatten_weight.

* If dtype is provided to an out function, make sure it matches the dtype of the result.

* Fix typo.
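For illustration (defaults as described above):

```
import torch

x = torch.ones(4, dtype=torch.int32)
x.sum()                      # integral input: accumulates in torch.float64 by default (per this change)
x.sum(dtype=torch.float64)   # or request the accumulation dtype explicitly

y = torch.ones(4)
y.cumsum(0, dtype=torch.float64)
```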
2018-04-16 23:52:59 -04:00
dd91d57c3f Update docs for torch.zeros factory method (#6594)
* Update docs for torch.zeros factory method

If this looks good, I'll submit another PR rewriting the other factory
methods in this fashion.

* Address comments

* Better explanation for device default

* Add variable argument back

* s/set/sequence/g

* Remove class from torch.strided
2018-04-16 18:28:12 -04:00
ee240aa00c Allow script_methods to be defined out of order (#6341)
This modifies the registration process so that all script methods
in a ScriptModule are defined at once.

Method gains a `method_creator` callback that gets invoked when the
method is first called to define it if it has not already been defined.
Recursive cycles in this `method_creator` are checked.

This approach was chosen over first creating all the graphs and then
inlining the call sites because it will combine better with type
propagation for non-tensor types like tuples. e.g.

```
a = foo(b)
return bar(*a)
```
2018-04-16 15:19:05 -07:00
0e93a2c334 Add Module.to (#6629) 2018-04-16 17:46:52 -04:00
3e83e3abfe Adding initial_accumulator_value parameter to Adagrad (#6616) 2018-04-16 22:12:36 +02:00
53d2612b55 Fix a typo in the setup.py script (#6632) 2018-04-16 15:29:45 -04:00
582d47e986 [Caffe2] Scoped dummy name generator (#6458)
* Scoped dummy name generator

* Fix

* Fix

* Use class variable

* Fix build

* comment
2018-04-16 11:58:02 -07:00
ce2854c875 Create safe and unsafe versions of sparse_coo_tensor (#6058)
Fixes #5748.

Added an unsafe version so embedding isn't slowed.

* Create safe and unsafe versions of sparse_coo_tensor

* rename sparse_coo_tensor_unsafe to _sparse_coo_tensor_unsafe

* refactor

* make helper static inline

* add sparse size check test

* fix lint
2018-04-16 14:42:57 -04:00
40592f91b5 Fix bilinear performance regression (#6110)
The current implementation of bilinear uses a matrix multiplication approach. This creates a large intermediate matrix (batch * output dimension * input dimension). Relative to the previous pure python approach, this caused a severe performance regression (600ms vs. 18ms for 300x100x200 weights and a batch of 50 on CPU, and also quadratic memory).
The attached change restores the performance using the previous strategy of looping over output features. It implements forward, backward, and double backward as native ATen code.

Credits:

Martin Tutek reported the regression and pinpointed the problem
Adam Paszke patiently answered my questions about ATen
I would not have been able to prepare this without you, thank you!

I referenced the old python implementation, used a python version of the naive implementation, and coded manual functions etc.

The tests have gradgradcheck etc.

* fix memory use of native bilinear

* bilinear double backward

* Move bilinear_double_backward to Functions.cpp

Addresses review comment by Tongzhou Wang. Thank you!

* add WrapDimUtilsMulti.h

* start at generic trilinear

* move to generic trilinear

* catch up on dim_list_to_bitset

* switch bilinear to use _trilinear implement _trilinear_backward

* add comments to Linear.cpp, move _trilinear in yaml
2018-04-16 14:41:47 -04:00
24b4931462 Improve run_test.py to support running individual test classes and methods (#6344)
* Improve run_test.py to support running individual test classes and methods

Added support in run_test.py for running individual test classes and methods.
The -i/--include option can specify a list of test modules, classes or methods
like this:

python run_test.py -i autograd torch.TestTorch.test_abs \
  torch.TestTorch.test_add utils.TestBottleneck

-f, -l and -x behaviour stays the same as before

* Fixed some code formatting

* Multiple fixes according to the reviews in #6344
2018-04-16 14:33:50 -04:00
4d0097fab8 Note that the Docker Hub image is not up-to-date. (#6434)
Fixes #6397.
2018-04-16 14:31:34 -04:00
7ef14bf04c Follow the change of ONNX Cast operator "to" attribute (#6574)
* Follow the change of ONNX Cast operator "to" attribute

* Update Cast conversion in frontend and backend

* update pytorch onnx frontend
2018-04-16 14:24:42 -04:00
30157971f0 Update dist test to use multi gpus (#6337)
* update dist test to use multi gpus

* add nccl to jenkins

* address comment

* make lint happy

* convert range object to list
2018-04-16 14:10:27 -04:00
892be8b779 Make dtype in .to positional rather than kwarg only (#6628) 2018-04-16 14:03:40 -04:00
04fae73323 [auto] Update onnx to bf42662 - Change the "to" attribute of Cast operator to of type int (#727)
bf42662637
2018-04-16 17:54:50 +00:00
d7cb78478f Split set_default_tensor_type(dtype) into set_default_dtype(dtype). (#6599)
* Split set_default_tensor_type(dtype) into set_default_dtype(dtype).

* Fix flake8.

The difference between this one and set_default_tensor_type is that it only sets the scalar type. What determines the type + device of a tensor returned from a factory function with defaults is the default tensor type + the current device (if the default tensor type is cuda); this just changes the scalar type of the default tensor type.

We do eventually want to deprecate set_default_tensor_type; it is not clear how to do that in a sensible and backwards compatible way.
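For example (a minimal sketch):

```
import torch

torch.set_default_dtype(torch.float64)
torch.tensor([1.5]).dtype                # torch.float64
torch.set_default_dtype(torch.float32)   # restore the usual default
```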
2018-04-16 13:49:00 -04:00
76ca037069 [distributions] Implement Independent distribution (#6615)
* Implement Independent distribution

* Add docs for Independent distribution
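A usage sketch (assuming the constructor takes (base_distribution, reinterpreted_batch_ndims)):

```
import torch
from torch.distributions import Independent, Normal

base = Normal(torch.zeros(3), torch.ones(3))   # batch_shape=(3,), event_shape=()
dist = Independent(base, 1)                    # batch_shape=(),  event_shape=(3,)

x = dist.rsample()
dist.log_prob(x)   # a scalar: log-probs are summed over the reinterpreted dim
```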
2018-04-16 11:42:12 -04:00
fd6d11ae66 Fixed text of error message in case of unexpected target size (#6617) 2018-04-16 11:27:02 -04:00
46374ad5c8 Add tensor.to(device) method. (#6588)
* Add tensor.on(device) and tensor.on_device_as(tensor) methods.

* Rename {'on', 'on_device_as'} -> 'to'.

* Fix test ordinal.

* Fix device ordinal again.
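A quick sketch of the resulting API (see also #6628 above, which made dtype positional):

```
import torch

t = torch.randn(2, 2)
t.to(torch.float64)          # dtype conversion
t.to('cpu')                  # device move (a no-op here)
if torch.cuda.is_available():
    t = t.to('cuda:0')       # copy to GPU 0
```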
2018-04-16 10:50:34 -04:00
084e3a755b fix incorrect path (#6605) 2018-04-15 21:22:11 -07:00
2ef23b6241 [caffe2] Update transpose with compile time dimension (#6614)
* Update transpose with compile time dimension

* Change return to break
2018-04-15 19:20:39 -07:00
f5beff334b Added distributed docs on NCCL2 backend/functions and launch module (#6579) 2018-04-15 21:53:10 -04:00
5463a4a319 Fix typo. (#6609) 2018-04-15 11:43:10 +02:00
8aff844f2d [JIT] torch::jit::Type needs a virtual destructor (#6611) 2018-04-15 11:41:34 +02:00
cd2112717c [caffe2] Update math functions with params on host. (#6602)
* Update ReduceMean

Add reduce mean to math

Add reduce mean to math

* sync reduce_ops_test

* Update math_gpu.cu
2018-04-14 21:41:41 -07:00
caadc9301f [auto] Update onnx to ff7b3b4 - enable warning check and fix warnings. (#760)
ff7b3b4c85
2018-04-14 22:27:55 +00:00
0e246305ab [auto] Update onnx to 97d3ae6 - Kezhan/update size op output type (#759)
97d3ae6ddd
2018-04-14 01:40:58 +00:00
eaf1e4b6ab Docs for torch.*_like(...) factory functions (#6589)
* Docs for torch.*_like(...) factory functions

In the same spirit as `torch.randn_like`.

* Address comments

* Recommend ones/zeros with out keyword
2018-04-13 20:49:35 -04:00
e8d2f05931 [JIT] Switch JIT passes to take a graph rather than TracingState (#6598)
* Switch JIT passes to take a graph rather than TracingState

* Add pybind11 binding for ONNX pass from graph

* Fix canonicalize pass

* address comment

* Switch ToONNX to explicitly return new graph

* optimize_graph instead of optimize_trace
2018-04-13 17:38:22 -07:00
825ce7f196 [jit][script] Allow tuples to be re-assigned (#6538)
* Allow tuples to be re-assigned

This commit improves our support of tuples by making them more first-class.
In particular, it allows tuples to be re-assigned across loops and ifs.
It does this by making them first-class values in the Graph IR, and then
removing the tuples in a LowerTuples pass.

An alternative approach would have added more support for desugaring tuples
in the Environment object as they were emitted. Instead,
the current approach was chosen anticipating a future when tuples are
fully supported (including the interpreter). In that future, the current
code can be completely reused, with the LowerTuples pass just becoming
an optimization that removes unneeded tuple allocations.
2018-04-13 17:34:50 -07:00
0042851e04 Fixing some typos (#6595) 2018-04-13 16:30:31 -07:00
11b9180563 [auto] Update onnx to 5355440 - add fuse_conv_add_into_bias optimizer (#707)
5355440f5a
2018-04-13 23:03:37 +00:00
9dfc01b659 [auto] Update onnx to b7d66d8 - Add some more type/shape inference implementations (#725)
b7d66d8838
2018-04-13 20:41:22 +00:00
84707be156 WorkersPool uses atomic writes to task_ (#6577) 2018-04-13 13:26:41 -07:00
e10d5cdc68 Change to ldd parsing regex (#6592) 2018-04-13 13:10:31 -04:00
3140fe0ed1 [auto] Update onnx to 7b33c37 - fix docs of pool op (#751)
7b33c37ae5
2018-04-13 16:52:17 +00:00
6c0f74089f More precise digamma (#6517)
* More precise digamma

Fixes #6190.

This is a rebase of #3955 with some tweaks for better performance around
poles. The code is ported over from cephes with permission.

By itself, the cephes code returns inf for the poles.

For better performance around the poles with float32, one intermediate
step is always computed with double precision, regardless of dtype.
This step does `PI / tan(PI * input)`. This is necessary because small (1e-6)
rounding errors for the inputs to tan have strong effects on the output
(ie, the derivative of tan is very large at some points).

* Replace usages of finite-differences digamma with newly implemented digamma

* Better behavior near and at poles

* ScalarConvert -> scalar_cast for readability
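A sketch of the double-precision intermediate described above (reflection near the poles; illustrative, not the actual kernel code):

```
import math

def reflection_term(x):
    # digamma(x) = digamma(1 - x) - pi / tan(pi * x); the pi / tan(pi * x)
    # term is evaluated in float64 regardless of the input dtype, because
    # tiny rounding errors in the argument of tan are strongly amplified
    return math.pi / math.tan(math.pi * float(x))
```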
2018-04-13 11:49:09 -04:00
99cfb56698 Add docs for torch.randn_like (#6565)
* Add docs for torch.randn_like

* Address comments

* Address commetns

* Address comments
2018-04-13 11:33:56 -04:00
62ac7f9812 [auto] Update onnx to fa04841 - specify default value for thresholdedrelu's alpha attribute . (#753)
fa048410fa
2018-04-13 03:12:38 +00:00
f3a9be0ed5 Fix RNN parameters description (#6575) 2018-04-12 23:08:44 -04:00
56563a0a79 Use THC allocation for CUFFT workspace (#6568)
* use THC allocation for CUFFT

* use auto& instead
2018-04-12 21:11:44 -04:00
be86500244 Conda binary changes (#6534)
* Adding integrated pytorch-caffe2 package

* Updates

* Fixing more substitution

* Fix to pytorch build location

* Bugfixes, progress towards including CUDA libs in package

* Fix to sed call

* Putting off packaing CUDA libs for Caffe2

* Progress towards packaging CUDA libs

* Progress towards packaging CUDA libs

* Changes to CUDA copying

* Turning on CUDA lib packaging

* Correction to env variables passed into meta.yaml

* typo

* Adding more needed variables in build.sh

* Adding some debugging info

* Changing versioning to have dates and be in build string

* Removing version from build string

* Removing packaging CUDA logic for static linking (later)

* Changing version to mirror pytorch

* Removing env variable req in build.sh

* Change to sed to port to mac
2018-04-12 16:51:06 -07:00
3b0204d43c [JIT] Hacky: Staged symbolics for RNN nodes (#6297)
* Staged symbolic for RNN modules

* Move function to symbolic.py

* Add comments, improve tests, fixup logic
2018-04-12 16:29:25 -07:00
8af0f69a23 lowercase tools/cpp_build/libtorch/CMakeLists.txt (#6567) 2018-04-12 16:21:46 -07:00
d725cd5966 Fix ATen build in Caffe2 (#6496) 2018-04-12 16:09:50 -07:00
30a37a2111 [auto] Update onnx to 5c9c778 - [Typing 2/3] Add python type hints for C++ code (#610)
5c9c778270
2018-04-12 22:47:41 +00:00
16704249cb Add docs for tensor.index_put_ (#6563) 2018-04-12 17:00:02 -04:00
c2187790e3 Improve utils.checkpoint docs (#6526)
* improve util.checkpoint docs

* change volatile to no_grad, and add more explanation

* address comments
2018-04-12 16:59:06 -04:00
e01569afd7 Restore allow_unused functionality (#6553) 2018-04-12 21:30:42 +02:00
60b67eb604 Create CODEOWNERS entry for torch/onnx (#6560) 2018-04-12 15:27:29 -04:00
6ce6c0ed65 [caffe2] Fix bug in NNPACK bindings for convolution in precomputed transform (#6555)
Caffe2-NNPACK integration created blobs for precomputed kernel transforms based on the name of the Conv operator.
When Conv operators have the same name (e.g. the empty string), the blobs for precomputed transforms get the same name and overwrite each other.
This patch ensures that blobs for all precomputed transforms in the network get a unique name.
2018-04-12 15:20:26 -04:00
8aa0ae3836 Support arbitrary number of batch dimensions in *FFT (#6528) 2018-04-12 15:03:22 -04:00
749d51414a Separate cuda-ness from dtype. (#6470)
* Separate cuda-ness from dtype.

There are no longer torch.cuda.int64, etc; only torch.int64 that correspond to at::ScalarType.
At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device).

There is also currently unused code in here for support ScalarType in native_functions; this will be used for specifying aggregate types
on reduction functions.

* Fix test_autograd.

* Add defaults to randint_like.

* Track is_cuda in py tensor types.

* Fix test_sparse.

* Fix multiprocessing.

* Fix rnn.

* Fix test_nn.

* Fix flake8.
2018-04-12 14:05:44 -04:00
8995ddda05 [jit][script] Check that each builtin returns the right number of values. (#6492)
* Fixes to the way script handles multiple values, and other minor fixes.

This commit improves our handling of operators that return multiple values.
Builtins are now checked so that they return the right number of values,
and support for TupleValue is extended to all things that can return
multiple values.

This resolves issues where the compiler accepted things like:

  a, b = c + c

This would cause the interpreter to crash. Now each operator knows
how many results it will produce and can check it against the number
of requested inputs.

Notes:
* Allow True/False literals in constant expressions
* make handling of keyword constants more consistent to support True/False
* make parsing constants match the way we construct constants from python
* improve the error messages when accessing bad graph attributes.
* switch findTensorOp to return an optional.
* check that attribute types are correct in findTensorOp
* Check the correct number of outputs for builtins

This also changes emitExpr to return a single SugaredValue

Rather than possibly returning multiple values, emitExpr now
always returns a single value, which _might_ be a tuple. This approach
more closely follows python making the code easier to follow.

Checks for returning the right number of values are now located in
the assignment operator, and occur when unpacking the tuple.

We still pass `n_binders` to function calls so that calls into python
know how many values they should return.
2018-04-12 10:32:49 -07:00
f6e8b86315 STFT is differentiable out of the box. Fix the regression that marked it as backward-not-implemented (#6541) 2018-04-12 08:59:00 -04:00
d45f3d0d5c Skip cpp_extensions test when possible on Windows (#6423) 2018-04-12 12:12:39 +02:00
8849bea120 [caffe2] Update ReduceOps (#6497)
* Update ReduceMean

* Add reduce mean to math

* Update cuda flag

* Update Eigen::Tensor ctor

* Remove unused variables

* Skip ReduceTensorGPUTest if no gpus

* Add NOMINMAX for windows

* Fix lpnorm_op in windows
2018-04-11 23:36:05 -07:00
0a6331792d Fix #6398, Add MKL threading support for Windows (#6416)
* Add openmp support for Windows

* Remove pthread from dependency list

* Revert "Add openmp support for Windows"

This reverts commit f234c124ba2b47746e197bc185c083737fee6e65.

* Don't link with msvc openmp libs
2018-04-11 23:10:06 -04:00
f54eac7eba Add flag and warning for Python 2.7 users on Windows (#6499) 2018-04-11 23:06:51 -04:00
fc56e8fea5 Quote arguments only when possible (#6405)
* Quote arguments only when possible

* Minor fix

* Add no quote conditions
2018-04-11 23:03:02 -04:00
1943e9763f [ONNX][easy] Don't set uniqueName if it's already set (#6533) 2018-04-11 18:41:38 -07:00
63b5cc47eb [caffe2] Minor changes in NNPACK CMake scripts (#6532)
- Tell NNPACK to not link pthreadpool, but only its headers
- Remove FindNNPACK.cmake as it is no longer used
2018-04-11 20:56:38 -04:00
434f710f3f [Caffe2] Add support to TensorRT (#6150)
* Add support to TensorRT

* Removed License header

* Bind input/output by position

* Comments

* More comments

* Add benchmark

* Add warning for performance degradation on large batch

* Address comments

* comments
2018-04-11 17:03:54 -07:00
1f0b07cddc fix typos in sampler.py (#6525) 2018-04-11 17:27:25 -04:00
6b7ec95abb Link relevant FAQ section in DataLoader docs (#6476)
* Link FAQ section on workers returning same random numbers in DataLoader docs

* explicitly mention section names
2018-04-11 13:41:46 -04:00
5ce6b97aee Use symbolizer in ASAN (#6506) 2018-04-11 13:41:23 -04:00
494aaab00e Add docs for item() (#6508) 2018-04-11 12:40:01 -04:00
1e5611014d Adding autofunction entry for torch.randint (#6507)
* added randint function in ATEN yaml as well as Tensorfactories.cpp

* corrected randint

* randint with overloading complete, getting tuple-of-ints behaviour though

* done randintlike and randint_out

Left: adding docs and tests, and removing the bug on size = (5)

* Removed my error messages, ThRandomTensor will handle all exceptions

* added docs and tests, corrected a mistake

Tested with manual seeds in some test cases as well. Seems fine to me (check documentation though)

* corrected indentation to spaces, and improved sizes argument description

* made documentation argument description shorter

* added whitespace after ',' in torch docs

* added spaces in documentation

* added more tests (including bounds and overloading features)

* added whitespaces in test_torch

* removed trailing whitespaces

* removed whitespace from a blank line

* removed positive requirement from docs. Added dtype argument and gave eg

* made randint over randn in all files

* changed to data type for dtype in docs for randint

* added autofunction entry for randint in torch.rst
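A usage sketch (high is exclusive; low defaults to 0):

```
import torch

torch.randint(10, (2, 3))          # values in [0, 10), shape (2, 3)
torch.randint(-5, 5, (4,))         # values in [-5, 5)
torch.randint(10, (2,), dtype=torch.int64)
```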
2018-04-11 12:34:25 -04:00
ca09e4a3c5 Fix THTensor_(take) negative index check (#6482)
* fix THTensor_(take) negative index check

* add tests

* rename to invalidIdxPos
2018-04-11 12:12:35 -04:00
e07952dbc9 Add SmallVector from llvm (#6485)
Adds at::SmallVector and supporting AlignOf class to ATen from LLVM.

http://llvm.org/doxygen/SmallVector_8h_source.html
2018-04-11 12:01:12 -04:00
d3f11310fa [auto] Update onnx to 00fa587 - Enhancements to shape inference (#655)
00fa58791e
2018-04-11 15:09:16 +00:00
d9345aa60f add checkpoint to index.rst (#6498) 2018-04-11 02:50:01 -04:00
e4f1d3b538 Better warnings (#6428)
* Better warnings

* Remove -Wc++14-extensions because gcc does not know it

* Warning fix in input_buffer.cpp

* Remove pedantic for torch/csrc/

* Also use Wextra and Wall for ATen

* Use check_env_flag

* Undo changes in shape_analysis.cpp

* Remove C linkage flag
2018-04-10 23:34:25 -07:00
ef8f556212 [Caffe2] Changes done inside Facebook (#6378)
* fix unit test for sqrt op

From the error logging:

[idx, grad, grad_estimate] are:
[[ 146.            0.5           0.45776367]
 [ 147.            0.5           0.45776367]

The gradient == 0.5 is correct, which means the SqrtOp and its gradient are doing the right job. (Because y = sqrt(x), loss = y^2/2 = x/2, and then d(loss)/dx = 1/2 = 0.5.)

The test failed because of a numerical problem with grad_estimate (in the unit test). It can happen because the step_size is small and float precision is not high (when there are multiple elements in the tensor, we do sum(y^2) to compute the loss).

This diff
- increase the step size, and also move the test cases to be further away from 0 (where sqrt(x) is not well defined) to be safe :)
- also clean up, and merge the test case for inplace Vs. non-inplace

Tested with:

`CAFFE2_HYPOTHESIS_PROFILE=debug ai_bt caffe2/caffe2/python/operator_test:elementwise_ops_test -- "test_sqrt"`

* CompositeReader & CompositeReaderBuilder

A new type of reader gluing multiple readers together.

* Back out "Revert D7394363: [GanH]: Log D Trick for Cross Entropy with Sigmoid"

Original commit changeset: 9325a4356dbe

* [dai][WIP] convert params to int8 on ps before sending to trainer

Add float->uint8 conversion in addition to float->fp16 conversion in model_saver.

* [easy] improve unit test for sparse length sum ops

as desc.

#accept2ship

* Update GitHub upstream to 771fcb3455cbfe69c2abcc4cb3bd7ef92d59af24

* move sparse hash unique ops to OSS and add unit tests

- move the SparseHash version to OSS, since 'sparsehash' is already a dep of caffe2 OSS: https://fburl.com/arssw4n1
- The 'SparseHash' engine is also being used in OSS, so the SparseHash version shall be in OSS to reduce confusion: https://fburl.com/o5ea7ah2

- fix the CUDA UniqueOp for the case when batch is empty.
- add unit test

* group_norm_op for caffe2

This is the cuda op for Group Normalization (GN): https://arxiv.org/abs/1803.08494

This code implements GN in one op that computes Y = gamma * (X - mu) / sigma + beta and also its gradients. It is expected to have minimal memory consumption (similar to the BN op), avoiding the new blobs that would be created if GN were implemented as several ops (e.g., reshape, norm_mean/std, affine_channel); a reference sketch follows at the end of this entry.
* Resubmit D7405233: disappeared in D7464958

OSS publish caused the op to go missing -- however, the test was still there

* [c2] add sparse hash engine for cuda unique op

The SparseHash version of UniqueOp copies the input tensor to CPU, makes use of a sparse hash map to get the unique output, and then copies back to GPU.

* [dper][gpu] enable unit testing gpu trainer for sparse nn

to debug the GPU trainer using mock data in a unit test.

make it easier to develop the GPU trainer for new models.

* Reuse Gloo context for Synchronize() calls

Previously we were creating (and leaking) the Gloo context on each call to Synchronize(). Now only run the common world op and create the barrier net once, then run the barrier net on each Synchronize() call. Since timeout is associated with the Gloo context, assert that the timeout is fixed instead of trying to handle the complexity of multiple timeouts (and associated contexts).

* [GanH/WGAN][1/n]: add FC param clipping

as titled

* [mobile] minimizing changes between caffe2_benchmark and speed_benchmark

* [GanH]: enable diagnose within model

avoid finding blob names; instead, directly enable it inside the model

* Add `net_transformer_fun` option to DPM

This callback allows for various transformations to be made to the
model after gradient operators have been added. The immediate motivation for
this is to allow transformations such has "checkpoint-and-recompute" which
allow trading off memory for additional compute.

Adding several callbacks like this has made DPM's API less than ideal at this
stage. However, I could not find any reasonable alternative.

* [DT] [33/n] Compile flow task groups

task groups need to be compiled in order to pickle the object in fblearner. However, I also changed the Job's compile function, as creating a new object is not necessary.

* Initial commit for sparse_normalize vectorization and benchmark

* [GanH]: LB Calibration for JSD

as titled

* Tracing event in async executor

Adding event tracing through TRACE_EVENT macro in async executor

* [Resubmit] D7409751 Resetting book-keeping blobs when the reservoir is reset

D7409751 got lost in D7464958

* Visualizing realtime weights values

we want to visualize the weight values as the optimizer iterates. This diff supports visualizing the weights at an assigned index.
Currently, we assume the blob to be 2-dimensional.

* [GanH][Easy]: Fix Homotopy Weighting

apparently, there was a bug in the homotopy weight (alpha, beta) update

* [c2] move sparse hash unique op out of OSS

so that OSS does not need to depend on google hash map.

* Get rid of std::round as it's not supported on Android

* Revert changes on setup.py

* Skip shaky test on Dataio

* fix
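Referenced from the group_norm_op item above: a minimal sketch of the GN computation in PyTorch notation (the actual diff is a fused CUDA Caffe2 op):

```
import torch

def group_norm_ref(x, gamma, beta, num_groups, eps=1e-5):
    # x: (N, C, H, W); computes Y = gamma * (X - mu) / sigma + beta with
    # mu/sigma taken per sample over each group of C // num_groups channels
    n, c, h, w = x.shape
    g = x.view(n, num_groups, -1)
    mu = g.mean(dim=2, keepdim=True)
    var = g.var(dim=2, unbiased=False, keepdim=True)
    g = (g - mu) / (var + eps).sqrt()
    return g.view(n, c, h, w) * gamma.view(1, c, 1, 1) + beta.view(1, c, 1, 1)
```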
2018-04-10 21:11:43 -07:00
0dff2b5e35 [fft] [3 of 3] Implements backward of fft ifft rfft irfft (#5537)
* change irfft signal_sizes arg to be the last

* add docs for fft, ifft, rfft, irfft; update doc for stft

* fix typo in window function docs

* improve gradcheck error message

* implement backward of fft, ifft, rfft, irfft

* add grad tests for fft, ifft, rfft, irfft

* fix nits and typos from #6118

* address comments
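A sketch of what now differentiates end to end (0.4-era spectral API; signal_sizes recovers the original length from the one-sided transform):

```
import torch

x = torch.randn(8, requires_grad=True)
f = torch.rfft(x, 1)                          # real-to-complex FFT, 1 signal dim
y = torch.irfft(f, 1, signal_sizes=x.shape)   # back to the real signal
y.sum().backward()                            # backward of rfft/irfft now works
```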
2018-04-10 22:09:36 -04:00
63472bcf29 Sync current changes in ACL backend (#6484)
* Sync changes in ACL backend
2018-04-10 17:32:22 -07:00
37d5c58f4b Skip all TestTorch tests in test_cuda.py (#6489) 2018-04-10 20:31:05 -04:00
7bd398b3db Add fuseNNPACKConvRelu (#6439) 2018-04-10 16:51:16 -07:00
5f311da758 Make python setup.py clean delete aten/build. (#6487)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-10 18:53:40 -04:00
d4e13a4ec8 [auto] Update onnx to 50fe321 - Fix fix (#744)
50fe321d05
2018-04-10 22:06:04 +00:00
5c8290c20d Update MKL version to 2018.2.185 (#6483) 2018-04-10 18:05:10 -04:00
6f10978e7b Skip C++ extensions test when ninja is not available (#6480) 2018-04-10 14:50:24 -07:00
432425c76b [auto] Update onnx to 1963285 - Guard type checking numpy imports (#741)
1963285656
2018-04-10 20:44:44 +00:00
ae592b4999 Louder warning for C++ extensions (#6435) 2018-04-10 12:47:39 -07:00
8e1d920695 Fixed Clang Compilation Warnings for THD by removing outdated C linking (#6448) 2018-04-10 12:41:40 -07:00
e3196e0ea8 [Re-checkpointing] Autograd container for trading compute for memory (#6467)
* Autograd container for trading compute for memory

* add a unit test for checkpoint

* address comments

* address review comments

* adding some docs for the checkpoint api

* more comments

* more comments

* repro bug

* Fix a subtle bug/apply some review comments

* Update checkpoint.py

* Run everything in grad mode

* fix flake and chunk=1

* use imperative backward as per discussion

* remove Variable and also add models and test for models

* Add a simple thread local variable to check for autograd grad mode

* remove models and models test after debugging

* address review comments

* address more comments

* address more comments
2018-04-10 15:26:24 -04:00
04c215b445 Add link in docs menu to stable docs (#6475)
Part of #5738. Warns users that they're not viewing the latest stable
release docs.

We should remember to delete this when cutting out 0.4.0 release docs. (we'd just delete the div in pytorch.github.io)
2018-04-10 14:53:04 -04:00
c3f7e5ff55 Install signal handler for SIGCHLD in run_test.py (#6436)
Handle exit signal in run_test.py
2018-04-10 11:31:23 -07:00
ad5d421554 [JIT] Implement staged symbolics for pack_padded_sequence/pad_packed_sequence (#6256)
* Unit test for pack_padded tracing

* Move monkeypatching stuff

* Switch symbolic

* Fix stack traces and update test

* Fixup and confirm e2e working

* lint

* Move monkeypatch back to onnx

* Address comments

* remove extraneous import

* Add gradient checking

* lint

* Address comments

* improve test case
2018-04-10 11:30:50 -07:00
8d6c5c7898 [auto] Update onnx to 0ee95e3 - Split operator tests (#557)
0ee95e36e6
2018-04-10 18:04:04 +00:00
64e94814da Clean-up test_indexing.py after Tensor/Variable merge (#6433) 2018-04-10 14:03:14 -04:00
aea31131e5 [auto] Update onnx to 7fcdf41 - Setup mypy type checker (#676)
7fcdf41557
2018-04-10 17:51:02 +00:00
038b66ee07 [caffe2] use dictionary in Printer (#6443) 2018-04-10 10:37:07 -07:00
930f181255 Fix fft when any of the input dimensions is not aligned (#6118)
* fix fft when any of the input dimensions is not like complex type; add test for ifft+fft

* clarify the comments

* Address comments: add note; add helper function

* use at::nullopt

* add notes on conjugate symmetry; fix complex-to-real cloning condition (should be advanced data layout rather than base_istride)

* add at::sum_intlist and at::prod_intlist

* revert optional<vector> helper due to windows compiler error
2018-04-10 13:11:05 -04:00
bb097e2a50 [pytorch] Fix signed random_ (#6463)
* Fix cpu signed random

* fix gpu signed tensor

* add test for signed random_

* cleaner tests

* fix lint
2018-04-10 13:07:04 -04:00
f41044fa8a [caffe2][nomnigraph] Generic subgraph replacement (#6368)
* Nomnigraph cutting

* Comments

* Use caffe2::to_string

* More to_string
2018-04-10 09:26:31 -07:00
acb7df11a2 Add torch.randint and torch.randint_like functions (#6136)
Adds randint and randint_like to TensorFactories.cpp
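
Quick usage sketch (low is inclusive, high is exclusive, matching the usual randint convention):

```
import torch

x = torch.randint(0, 10, (2, 3))   # uniform integers from [0, 10)
y = torch.randint_like(x, 5)       # same shape as x, values in [0, 5)
```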
2018-04-10 12:08:21 -04:00
aa99aa1cb8 Slice (instead of copy) when indexing by a zero-dim tensor (#6426)
Slice (instead of copy) when indexing by a zero-dim tensor

Fixes #6217
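
A small illustration of the new behavior (assuming the post-fix semantics described above):

```
import torch

x = torch.arange(6).view(2, 3)
i = torch.tensor(0)    # zero-dim index tensor
row = x[i]             # now a view, like x[0], instead of a copy
row.fill_(-1)
print(x)               # first row is all -1: the slice aliases x's storage
```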
2018-04-10 11:47:22 -04:00
59bda9a8c4 Fix reflection padding boundary checks (#6438)
* Fix Reflection padding boundary checks

* Improve padding docs

* fix lint
2018-04-10 10:37:01 -04:00
65a8ac0b8e Add method to calculate perplexity of distribution (#6427) 2018-04-10 12:18:26 +02:00
79c3ebc040 adds correct precision to test_noncontig_conv_grad (#6440) 2018-04-10 12:18:01 +02:00
1110dd1f8f Add mock to conda (#6460) 2018-04-09 23:29:22 -07:00
66791f54d5 Update the compile function of Job (#6323) 2018-04-09 22:44:23 -07:00
df2e1d2962 Disallow using the OOP api workspace as context managers (#6456) 2018-04-09 22:13:54 -07:00
5e12ba92dc Guard couple shape inference functions for unkown input shapes (#6379) 2018-04-09 22:03:56 -07:00
ce37cf7914 [auto] Update onnx to 985af3f - Update PythonAPIOverview.md (#738)
985af3f5a0
2018-04-10 03:06:09 +00:00
c05acd3840 Clarify Embedding padding_idx arg (#6430)
* Clarify Embedding padding_idx arg

* add a sentence about gradient being zero
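
A short usage sketch of the clarified behavior:

```
import torch
import torch.nn as nn

emb = nn.Embedding(10, 3, padding_idx=0)
out = emb(torch.tensor([[0, 2, 0, 5]]))
# out[0][0] and out[0][2] are the all-zero padding vector, and the gradient
# for the embedding row at padding_idx stays zero during training.
```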
2018-04-09 23:06:00 -04:00
1533155c4e [JIT][script] Implement compile-time tuples & starred unpacking (#6214)
* Something that works

* Tuple sugared value

* Works with commenting out input size check

* support string frontend

* Initial starred assignment

* Fix parser

* Fixup tests

* clang-format

* fix rebase error

* lint

* move star assign test to string frontend to make py2 happy

* Py2 fix: parse starargs from Call node

* Address some comments

* Fixup merge

* Remove overloaded unary operators

* Bugfix and test case

* Address a few more comments

* asValues -> asTuple

* Remove unrolledFor stuff

* Fixup getValues

* Pass CallsiteDescriptor struct and have different behavior for different call types

* Address comments and lint

* some type checks

* Address comments

* lint

* Fix mistake
2018-04-09 19:34:51 -07:00
afaa72716b [auto] Update onnx to b69be33 - Add backend test for upsample (#729)
b69be334e5
2018-04-10 02:03:16 +00:00
4900118a68 [auto] Update onnx to 0d9496e - Input test data of concat op should be float (#711)
0d9496e79b
2018-04-10 01:49:54 +00:00
265e1a97ec Add different logo for master docs (#6446) 2018-04-09 18:48:53 -04:00
26eb08abfa [auto] Update onnx to 20bcb8b - Fix the spec for batchnorm and instancenorm (#733)
20bcb8bab8
2018-04-09 21:58:48 +00:00
e83dd716ec [caffe2] Support fused Conv+Relu with NNPACK (#6375)
Enable the use of fused Convolution+ReLU functionality from NNPACK
2018-04-09 15:39:31 -04:00
f9d3c3f4fd fix typo in link to sigmoid activation image (#6429) 2018-04-09 14:48:26 -04:00
1b3a5a4e7d bottleneck supports better user-provided arguments (#6425)
Fixes #6312.

Changed bottleneck's arg parser to user argparse.REMAINDER. This lets
the user specify args as `python -m torch.utils.bottleneck script.py
[args]` (previously, a -- was needed after `bottleneck` and before
`script.py`).
2018-04-09 13:57:26 -04:00
5651695a99 Fixes #6386, Use copies instead of symbolic files (#6396)
* Use copies instead of symbolic files

* bug fix

* Remove useless item
2018-04-09 13:54:10 -04:00
d0f395f744 [pytorch] Fix clamp is missing kwarg out (#6028) (#6418)
torch.clamp is out from template code, add it manually, same with auto
generated code.
2018-04-09 13:39:31 -04:00
57ee202022 Use string comparison in OS check (#6420) 2018-04-09 09:23:22 -07:00
a91c88a348 Check mappings ONNX -> Caffe2 bear the same argument names (#6317)
* Check mappings ONNX -> Caffe2 bear the same argument names

When adding an extra arg to an input ONNX op, if it's not supported in Caffe2, the exporter would just silently pass it through to the NetDef and ignore it in the implementation. That is pretty error-prone. Caffe2 also has an OpSchema description, so we can enforce that all arguments either appear explicitly in the schema or are listed explicitly in Caffe2.

See also https://github.com/caffe2/caffe2/pull/2478

Add test for C2 argument checking

* Some operators do not log arguments, which prevents argument checks.
Invite users to file an issue to fix the schema.
2018-04-09 09:15:42 -07:00
73a23b492c Add mock python module for testing (#6387) 2018-04-09 09:12:10 -07:00
0cabab02bb Another CUDA 8 fix for Windows (#6383)
* Another CUDA 8 fix for Windows

* Skip ATen tests when compiler is not sufficient

* Fix wrong syntax
2018-04-09 10:20:37 -04:00
18fc4fd447 Using a function registry for THD init_methods for easy extension (#6334) 2018-04-09 12:54:12 +02:00
108f5c197f [pytorch] add static linkage support for CuDNN and NCCL (#6410)
* when linking static CUDA libs, additional dep on culibos.a

* add USE_STATIC_NCCL option

* add USE_STATIC_CUDNN option

* remove libATen soversion

* add caffe, caffe2 folders to setup.py exclude list
2018-04-08 22:54:18 -04:00
4d15442ebc Add total_length option to pad_packed_sequence (#6327)
* add total_length to pad_packed_sequence; add example of how to use pack->rnn->unpack with DP (see the sketch after this list)

* address comments

* fix typo
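
A minimal sketch of the pack -> RNN -> unpack pattern with the new total_length argument:

```
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

rnn = nn.LSTM(5, 8, batch_first=True)
x = torch.randn(4, 10, 5)            # padded batch, max length 10
lengths = [10, 7, 5, 2]
packed = pack_padded_sequence(x, lengths, batch_first=True)
out, _ = rnn(packed)
# total_length pads the output back to the full length, so every DataParallel
# replica returns the same time dimension regardless of its local max length.
out, _ = pad_packed_sequence(out, batch_first=True, total_length=10)
```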
2018-04-08 20:25:48 -04:00
88da5a0db4 fix incorrect error message in convolution_expand_param_if_needed (#6409) 2018-04-08 20:25:03 -04:00
99939b6d90 Increase margin for CPU perf test, and change test order (#6363) 2018-04-08 17:00:43 -04:00
119ea39021 add cuda headers (#6401) 2018-04-08 10:50:20 -04:00
67bbf585cd Fix the c2-onnx exporter bug on Gemm (#6331) 2018-04-07 16:48:29 -07:00
3b58b859b2 Fix typos in docs (#6389) 2018-04-07 12:41:15 -04:00
e9adbbba82 refactor reduce arg to _Loss superclass (#6371) 2018-04-07 11:09:31 -04:00
e0f3e5dc77 fix activation images not showing up on official website (#6367) 2018-04-07 11:06:24 -04:00
aecec8b412 [auto] Update onnx to c9f825f - Refine a little bit about op spec. (#666)
c9f825fc68
2018-04-07 14:59:32 +00:00
c053a76182 Several minor fixes for Windows build (#6332)
* Several minor fixes for Windows build

* Use version_info instead of version
2018-04-07 11:39:59 +02:00
32f3bf7946 Simplify and extend cpp build (#6343)
* Modify cpp build

* Use absolute path in .jenkins/pytorch/build.sh
2018-04-06 22:26:16 -07:00
a915e4715c [auto] Update onnx to a484eb2 - Fix an error in Conv doc (#731)
a484eb2cb3
2018-04-07 03:15:20 +00:00
997acfd7fe [Caffe2] Some small changes to InferBlobShapesAndTypes definition and SameAsInput Schema (#6335)
* Change Same as input type deduction to work for ops with multiple outputs

* change InferBlobShapesAndTypes definition to take a vector of pointers instead of unique_ptrs. The function doesn't own the objects, so there is no need to pass smart pointers; requiring them also prevents calling the function with an existing object, since the caller has to create a unique_ptr, i.e. copy an existing object just to create the pointer

* switching order of std::move<unique_ptr> and unique_ptr.get

* adding comma
2018-04-06 19:06:46 -07:00
774601c04c [Caffe2] Consolidating conda build scripts (#6359)
* Consolidating conda build scripts

* Grep bug

* Naming bug

* Correcting quoting of variable passing
2018-04-06 16:54:42 -07:00
f2130ae495 [auto] Update onnx to 7410cc4 - Fix incorrect package output paths (#730)
7410cc4abf
2018-04-06 23:18:10 +00:00
47259cfb6a [nomnigraph] Version bump (#6364)
updating nomnigraph to clean up diff stack
2018-04-06 15:52:12 -07:00
a9a96a4acb Fix the onnx split backend axis handling (#6366) 2018-04-06 15:47:27 -07:00
e45b51148a [caffe2] Always build NNPACK together with Caffe2 (#6365)
Caffe2 started with an option to use NNPACK pre-installed in the system.
Now this option is mostly legacy, as Caffe2 can include NNPACK in its own build on all platforms.
Due to problems when a pre-installed NNPACK is built with different dependencies or compiler options, we decided to remove this option and always build NNPACK with Caffe2.
This change makes Caffe2 always build NNPACK as part of its own build, and updates NNPACK and cpuinfo submodules.
2018-04-06 18:27:59 -04:00
aab0bd3c13 Change onnx_optimizer API (#6290) 2018-04-06 13:46:53 -07:00
6d8a33b5e6 [auto] Update onnx to be546e2 - Improve optimizer's API and docs (#713)
be546e257c
2018-04-06 20:46:09 +00:00
87e369111a Add string-style devices to all tensors. (#6283)
* Add string-style devices to all tensors.

Previously, tensors only had a 'get_device' method, which would throw an exception on a CPU tensor. This made it necessary to write if/else in code that
was meant to be device agnostic.

This PR implements the following:
1) Adds a 'device' property to all tensors that returns a string representation of the device for all tensors.
For cpu tensors this is 'cpu'.  For cuda tensors this is 'cuda:X', where X is the cuda device ordinal.

2) Adds a DeviceSpec class.  This is just a helper class for separating device_type and device_index specification and to allow partial specification.
For example, you can call DeviceSpec('cuda'), DeviceSpec('cuda:0'), DeviceSpec('cuda', 1).
Also has backwards compatibility support for specifying integers, which are treated as cuda devices.

DeviceSpecs have the following properties:
a) device_type: string representation of the device type (i.e. 'cpu' or 'cuda')
b) device_index: integer for the device index (None if not specified)
c) cuda_device_index: for backwards compatibility; behaves roughly like `get_device` did previously.  I.e. if a function previously took integers for cuda devices,
it can now take DeviceSpecs (or strings), and can maintain the old functionality by calling `old_index = DeviceSpec(old).cuda_device_index`.

3) tensor methods and torch. functions that took integer devices can now take integers, strings, or DeviceSpecs.  For example:
torch.randn((2,3), dtype=torch.cuda.float32, device='cuda:1')

TODO in future PRs:
A) Split out cuda from dtype so you don't need to overspecify cuda-ness
B) We currently only support strings/DeviceSpecs in tensor methods and torch. functions.  We should have equivalents torch.cuda.device(...), torch.cuda.device_of, etc.
at the torch. level that work on strings/DeviceSpecs
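
A device-agnostic sketch using the API introduced above:

```
import torch

d = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
t = torch.randn(2, 3, device=d)   # factories accept strings or device objects
print(t.device)                   # works for CPU and CUDA tensors alike,
                                  # unlike get_device(), which threw on CPU
```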

* Add deviceInt64 to python arg parser.

* device_str.

* Remove device_str.

* remove device prefix from attributes.

* Use const char * instead of string.

* Move autogpu index out of Device.

* comment on is_default.

* Rename torch.DeviceSpec to torch.device.

* comment.

* Fix tests.

* Fix flake8.

* Fix sparse_coo_tensor parameter name.

* Improve error message.

* Remove device_ prefix from C++ device object.

* Allocate static strings.

* Return not implemented from rich compare.

* Move torch::Device to THPDevice.

* Remove cuda index.

* Py_RETURN_NOTIMPLEMENTED doesn't exist in python2.
2018-04-06 15:12:05 -04:00
fc7aa5c3be Fix torch.dtype getting incorrectly rendered as torch.dpython:type by sphinx (#6358) 2018-04-06 14:59:22 -04:00
b724084335 INCULDE typofix. (#6354)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-06 13:57:07 -04:00
c42f4fa2ee Add missing attributes to the schema GivenTensorFill operators (#6330) 2018-04-06 08:42:22 -07:00
c00ee6da8f Fix typos (#6348)
* Fix typo

* Fix typo

* Update faq.rst
2018-04-06 11:06:42 -04:00
81676d8554 [auto] Update onnx to c61506f - Fix the shape inference python API (#716)
c61506f9f4
2018-04-06 05:17:56 +00:00
876ad110af Skip some unsupported onnx backend tests (#6247) 2018-04-05 21:33:35 -07:00
5198d2b9ab [auto] Update onnx to e9d4134 - Fix cmake on windows when not building python extension (#728)
e9d41346d2
2018-04-06 02:30:58 +00:00
4cde7c0f09 Modify cmake dedent function to make it compatible with Windows. (#6296) 2018-04-05 21:37:12 -04:00
a093ec997f fix typo (#6329) 2018-04-05 21:36:16 -04:00
c1cd6eab9f Handle broadcasting in the JIT (#6084)
* Add size checks to JIT's fuser

* Handle broadcasting in shape propagation pass

* Fix build errors and add tests
2018-04-05 17:07:52 -07:00
2f30fe64fd [auto] Update onnx to 72187aa - Add value_info support in make_graph (#726)
72187aa08d
2018-04-05 22:17:19 +00:00
aba5f129bc fix broadcast export to onnx (#6243) 2018-04-05 14:25:37 -07:00
29c69f049e add test for old tensor serialization (#6275) 2018-04-05 17:00:30 -04:00
15f636bd10 [auto] Update onnx to 67b7d89 - Fix gen_proto in cmake (#719)
67b7d89d24
2018-04-05 20:38:11 +00:00
38b995a13b Fixing conda test builds (#6261)
* Moving conda test package installs into docker image

* Small nits

* Onnx setup.py still needs PROTOBUF_INCDIR passed in
2018-04-05 13:27:43 -07:00
0b3edfd3dd [caffe2] Do not print version and build info unless explicitly requested (#6282) 2018-04-05 16:09:13 -04:00
482e1511ff Revert "Increase # of runs for CPU perf test, and increase margin of error" (#6322)
* Revert "Add __constants__ to Script modules (#6092)"

This reverts commit 5ab30eedf33c670514685838423371f9a5df80f3.

* Revert "[ready] Implement log2 and log10 in PyTorch (#6272)"

This reverts commit 0aa35780bfade6bf9c428f1ae45426caa8a7df93.

* Revert "Use reshape({-1}) (#6281)"

This reverts commit 8ae67a444506a838e648aa60f9eb6a4da22c9b06.

* Revert "Move instruction set specific code to anonymous namespace (#6314)"

This reverts commit 6953c1b77efe2d0764ca9ba7dbf7c9284d68a80c.

* Revert "[auto] Update onnx to 54be8fa - Use cmake3 if it's available (#718) 54be8fad1e"

This reverts commit d33ec12d1e3f4739e10cacf1436764bc54ff89a3.

* Revert "default build with MKL for desktop (#6266)"

This reverts commit 5dcf7078c689f7055ca6837e67ca834cc70d6497.

* Revert "Increase # of runs for CPU perf test, and increase margin of error (#6302)"

This reverts commit 9d1a660670d55590cdab5509bb81c26e8bb3d26a.
2018-04-05 16:06:29 -04:00
d38adfe35d [auto] Update onnx to fcb4ae3 - docs rewording: Important Python Functions -> Python API Overview (#721)
fcb4ae329f
2018-04-05 19:40:52 +00:00
c21ce7e083 [auto] Update onnx to 24275d6 - Ignore .eggs directory when doing lint (#722)
24275d6cea
2018-04-05 19:34:39 +00:00
5ab30eedf3 Add __constants__ to Script modules (#6092)
Like `__slots__`, the `__constants__` property changes the setattr/getattr behavior of a script module for the listed keys so that they behave as constants.
This enables script methods to use them in ways that are otherwise not allowed (see the sketch after the list below).

* Python numbers/bools can be inlined as constants in script code.
* List of numbers can be iterated over using for loops
* nn.ModuleLists can be used in for loops as well, unrolling their content.
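
A minimal sketch built from the features listed above:

```
import torch

class ScaleByConstants(torch.jit.ScriptModule):
    __constants__ = ['coeffs']

    def __init__(self):
        super(ScaleByConstants, self).__init__()
        self.coeffs = [1.0, 2.0, 3.0]   # treated as a constant in script code

    @torch.jit.script_method
    def forward(self, x):
        # iterating a constant list of numbers is unrolled at compile time
        for c in self.coeffs:
            x = x * c
        return x
```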
2018-04-05 11:31:43 -07:00
0aa35780bf [ready] Implement log2 and log10 in PyTorch (#6272)
* Implemented log2 and log10

* Re-add incorrectly removed files

* Fix minor bugs

* Fix log1p docs

* Add a try-except for python2 math module in log2 test

* Revert changes made to aten/doc/*

* Fix docstring errors

* Fix windows build
2018-04-05 14:28:37 -04:00
8ae67a4445 Use reshape({-1}) (#6281) 2018-04-05 14:27:40 -04:00
6953c1b77e Move instruction set specific code to anonymous namespace (#6314)
The vec256 and SIMD kernels are compiled multiple times with different
headers. It's important that these functions have internal linkage so
that kernels for different architectures don't get combined during
linking. It's sufficient to label functions "static", but class methods
must be an unnamed namespace to have internal linkage (since static
means something different in the context of classes).

This fixes a bug in which the implementations of Reduction::reduce_all
for different instruction sets was getting combined during linking.
2018-04-05 14:21:33 -04:00
d33ec12d1e [auto] Update onnx to 54be8fa - Use cmake3 if it's available (#718)
54be8fad1e
2018-04-05 17:57:26 +00:00
5dcf7078c6 default build with MKL for desktop (#6266)
* default build with MKL for desktop

default build with MKL for desktop

* remove SET(INTEL_COMPILER_DIR "/opt/intel")
2018-04-05 09:36:03 -04:00
9d1a660670 Increase # of runs for CPU perf test, and increase margin of error (#6302) 2018-04-05 09:29:48 -04:00
9b111f1a88 Fix worldsize use in test_distributed with MPI backend (#6301)
WORLD_SIZE is not used for MPI tests and the check fails for
the group tests
2018-04-05 09:28:53 -04:00
de54f23de6 Add default args to loss functions in native_functions.yaml (#6289) 2018-04-05 12:33:43 +02:00
f73c044576 Remove eigen impl for arg_max and arg_min (#6293) 2018-04-04 21:51:27 -07:00
7e0227d3e1 [auto] Update onnx to b8c4238 - Add python function docs (#714)
b8c423889b
2018-04-05 03:51:11 +00:00
73ab15d388 Change ATen to use Caffe2/cmake upstream FindCUDA (#6240)
* Remove ATen's copy of FindCUDA

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Minor bugfix for updated FindCUDA.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Use cl.exe as the host compiler even when clcache.exe is set.

Upstream merge request at https://gitlab.kitware.com/cmake/cmake/merge_requests/1933

H/t peterjc123 who contributed the original version of this patch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Include CMakeInitializeConfigs polyfill from ATen.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Tweak the regex so it actually works on Windows.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-04 23:26:57 -04:00
efc91d8c6d Add arg checks in torch.utils.data.Sampler classes (#6249)
Fixes #6168

* add arg checks in torch.utils.data.Sampler

* add check for positive-ness
2018-04-04 23:07:31 -04:00
0016dad841 [pytorch] minor fixes around binary builds (#6291)
* remove patch

* check that cuda dev environment is also present before running cpp_extension cuda tests

* add OSError to list of exceptions when c++filt is not found
2018-04-04 22:37:13 -04:00
12bfa47ddd Onnx RNN export: remove Constant default hidden state (#6199)
when no explicit hidden state is provided, a default is created by
constructing a new Variable filled with zeros. This gets traced as a
Constant operator, which hardcodes in the batch size.

To fix this, we remove such constant operators in an 'optimization'
pass. We could have also fixed it by causing the code to not generate
a Constant in the first place, but this is the least invasive fix from
the perspective of the pure pytorch codebase.
2018-04-04 19:22:38 -07:00
afdaf52c34 Change Python Arg Parser to only read default params if they are assigned (#6254)
* only read default param if it's actually assigned

* address comments
2018-04-04 15:43:32 -07:00
8df2487de9 Properly skip the failing onnx conversion test (#6280) 2018-04-04 14:07:03 -07:00
ed9952dd25 Update FindCUDA to cmake master as of 561238bb6f07a5ab31293928bd98f6f… (#6241)
* Update FindCUDA to cmake master as of 561238bb6f07a5ab31293928bd98f6f8911d8bc1

NB: I DID have to apply one local patch; it's the `include_guard` change. Should
be obvious next time you do an update.

Relevant commits:

    commit 23119366e9d4e56e13c1fdec9dbff5e8f8c55ee5
    Author: Edward Z. Yang <ezyang@fb.com>
    Date:   Wed Mar 28 11:33:56 2018 -0400

        FindCUDA: Make nvcc configurable via CUDA_NVCC_EXECUTABLE env var

        This is useful if, for example, you want ccache to be used
        for nvcc.  With the current behavior, cmake always picks up
        /usr/local/cuda/bin/nvcc, even if there is a ccache nvcc
        stub in the PATH.  Allowing for CUDA_NVCC_EXECUTABLE lets
        us work around the problem.

        Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    commit e743fc8e9137692232f0220ac901f5a15cbd62cf
    Author: Henry Fredrick Schreiner <henry.fredrick.schreiner@cern.ch>
    Date:   Thu Mar 15 15:30:50 2018 +0100

        FindCUDA/select_compute_arch: Add support for CUDA as a language

        Even though this is an internal module, we can still prepare it to
        be used in another public-facing module outside of `FindCUDA`.

        Issue: #16586

    commit 193082a3c803a6418f0f1b5976dc34a91cf30805
    Author: luz.paz <luzpaz@users.noreply.github.com>
    Date:   Thu Feb 8 06:27:21 2018 -0500

        MAINT: Misc. typos

        Found via `codespell -q 3 -I ../cmake-whitelist.txt`.

    commit 9f74aaeb7d6649241c4a478410e87d092c462960
    Author: Brad King <brad.king@kitware.com>
    Date:   Tue Jan 30 08:18:11 2018 -0500

        FindCUDA: Fix regression in per-config flags

        Changes in commit 48f7e2d300 (Unhardcode the CMAKE_CONFIGURATION_TYPES
        values, 2017-11-27) accidentally left `CUDA_configuration_types`
        undefined, but this is used in a few places to handle per-config flags.
        Restore it.

        Fixes: #17671

    commit d91b2d9158cbe5d65bfcc8f7512503d7f226ad91
    Author: luz.paz <luzpaz@users.noreply.github.com>
    Date:   Wed Jan 10 12:34:14 2018 -0500

        MAINT: Misc. typos

        Found via `codespell`

    commit d08f3f551fa94b13a1d43338eaed68bcecb95cff
    Merge: 1be22978e 1f4d7a071
    Author: Brad King <brad.king@kitware.com>
    Date:   Wed Jan 10 15:34:57 2018 +0000

        Merge topic 'unhardcode-configuration-types'

        1f4d7a07 Help: Add references and backticks in LINK_FLAGS prop_tgt
        48f7e2d3 Unhardcode the CMAKE_CONFIGURATION_TYPES values

        Acked-by: Kitware Robot <kwrobot@kitware.com>
        Merge-request: !1345

    commit 5fbfa18fadf945963687cd95627c1bc62b68948a
    Merge: bc88329e5 ff41a4b81
    Author: Brad King <brad.king@kitware.com>
    Date:   Tue Jan 9 14:26:35 2018 +0000

        Merge topic 'FindCUDA-deduplicate-c+std-host-flags'

        ff41a4b8 FindCUDA: de-duplicates C++11 flag when propagating host flags.

        Acked-by: Kitware Robot <kwrobot@kitware.com>
        Merge-request: !1628

    commit bc88329e5ba7b1a14538f23f4fa223ac8d6d5895
    Merge: 89d127463 fab1b432e
    Author: Brad King <brad.king@kitware.com>
    Date:   Tue Jan 9 14:26:16 2018 +0000

        Merge topic 'msvc2017-findcuda'

        fab1b432 FindCUDA: Update to properly find MSVC 2017 compiler tools

        Acked-by: Kitware Robot <kwrobot@kitware.com>
        Acked-by: Robert Maynard <robert.maynard@kitware.com>
        Merge-request: !1631

    commit 48f7e2d30000dc57c31d3e3ab81077950704a587
    Author: Beren Minor <beren.minor+git@gmail.com>
    Date:   Mon Nov 27 19:22:11 2017 +0100

        Unhardcode the CMAKE_CONFIGURATION_TYPES values

        This removes duplicated code for per-config variable initialization by
        providing a `cmake_initialize_per_config_variable(<PREFIX> <DOCSTRING>)`
        function.

        This function initializes a `<PREFIX>` cache variable from `<PREFIX>_INIT`
        and unless the `CMAKE_NOT_USING_CONFIG_FLAGS` variable is defined, does
        the same with `<PREFIX>_<CONFIG>` from `<PREFIX>_<CONFIG>_INIT` for every
        `<CONFIG>` in `CMAKE_CONFIGURATION_TYPES` for multi-config generators or
        `CMAKE_BUILD_TYPE` for single-config generators.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Polyfill CMakeInitializeConfigs

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Tweak condition for when to use bundled FindCUDA support.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Comment out include_guard.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-04 17:04:21 -04:00
9ba70856a1 Add max_values and argmax convenience functions to ATen (#6201)
* Add max_values and argmax convenience functions to ATen

* Add documentation for torch.argmax/argmin and skip max_values

* Add tests for argmax/argmin

* Dont default the dim argument

* Use dim=0 in test_torch.py for argmax tests

* Implement argmin()  and argmax() without dim

* Call .contiguous() before .view(-1)
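
Quick usage sketch of the new convenience functions:

```
import torch

x = torch.randn(3, 4)
torch.argmax(x)         # index of the max in the flattened tensor
torch.argmax(x, dim=1)  # per-row index of the maximum
torch.argmin(x, dim=0)  # per-column index of the minimum
```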
2018-04-04 15:53:26 -04:00
92e7f627cd Add typing dependency to caffe2 CI (#6195)
This is needed to run mypy on CI
2018-04-04 15:48:02 -04:00
004545fe32 [Caffe2] Always build local protobuf library with -fPIC (#6264)
* Always build local protobuf library with -fPIC

* .
2018-04-04 11:08:11 -07:00
8f27c27941 fix legacy tensor __setstate__ (#6251) 2018-04-04 13:36:56 -04:00
0469926ba3 Add a CODEOWNERS file (#6274)
* Add a CODEOWNERS file

* This will let us require review from owners of aten/ and torch/ while giving wider access (for now) to caffe2/
* This will be adjusted as we work on shared components.

* update OWNERS to cover more pytorch bits
2018-04-04 13:18:00 -04:00
fd580ce419 Fix potential UB when input is empty (#6242)
If the source and result tensors are empty, arr_in and arr_out may be
null (and size will be 0). This previously called memcpy(null, null, 0),
which is UB according to
http://en.cppreference.com/w/cpp/string/byte/memcpy.

Note that either one of these changes would be sufficient.

(Detected by UBSan)
2018-04-04 11:59:21 -04:00
2f0bb19d7b Do not use cpuinfo on PowerPC (#6255)
cpuinfo_initialize() prints an error message to the console/log when run
on an unsupported CPU/platform. Even though the code works fine, this is a
confusing error message that shouldn't be shown to users who run PyTorch
on architectures other than those supported by cpuinfo.
2018-04-04 11:45:33 -04:00
3497f0207c [distributions] KL-Divergence for Multivariate Normal (#6172) 2018-04-04 13:19:47 +02:00
1499a604cf fix assertion error when input size smaller than number of module_copies (#6252) 2018-04-04 12:05:34 +02:00
b125033f85 Manually bump onnx submodule to current latest (#6237)
* Manually bump onnx submodule to current latest

* skip _equal_ tests

* Revert "skip _equal_ tests"

This reverts commit 72db49ebc16c9f98ed12add293a8f41e7d509bf3.

* bump to include a fix

* bump
2018-04-03 22:59:03 -07:00
5f268b0668 Fix the processing of extra cmake args passed to caffe2's setup.py (#6263) 2018-04-03 22:48:14 -07:00
7fd56b2c1f Remove unnecessary properties from Layout. (#6250)
This was just copy/pasted/changed from Dtype and shouldn't have gotten through.
2018-04-03 22:48:46 -04:00
a2880531ea fix SGD lr check (#6244) 2018-04-03 21:29:18 -04:00
06a697785c Add dtype to torch.*_window; Add dtype.is_floating_point (#6158) 2018-04-03 21:19:30 -04:00
6b3a4637d6 Make the tensor type torch.Tensor instead of torch.autograd.Variable (#5785)
This changes type(tensor) to return `torch.Tensor` instead of
`torch.autograd.Variable`.

This requires a few implementation changes:

 - torch.Tensor is now a regular Python class instead of a
   pseudo-factory like torch.FloatTensor/torch.DoubleTensor
 - torch.autograd.Variable is just a shell with a __new__ function.
   Since no instances are constructed it doesn't have any methods.
 - Adds torch.get_default_dtype() since torch.Tensor.dtype returns
   <attribute 'dtype' of 'torch._C._TensorBase' objects>
2018-04-03 16:29:25 -04:00
dfcd90783c fix sparse embedding backward when input contains only padding_idx (#6211) 2018-04-03 15:53:43 -04:00
14bf37f22e Fix AvgPool breaking changes (#6221)
Made in 605307f8f3c249d9279030502d2aac98d4170b83
2018-04-03 15:51:21 -04:00
de51764119 Fix memory leak in maxpool3d backwards (#6230)
Fixes #6222

We don't need to make sure gradInput is contiguous because it's always
passed in as an empty tensor (see CUDAFloatType.cpp after it gets
codegen-ed). This was increasing the reference on gradInput and leaking
it.

I'm not sure if there's a good way to test this. I put together a script
that
1) Prints out when a tensor is allocated and deallocated
2) Checks allocations vs deallocations after running a python script
And verified that each allocation matches each deallocation.
2018-04-03 15:47:29 -04:00
29e81e01aa Expunge ATen submodule; use the in-tree copy. (#6235)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-03 15:47:07 -04:00
40096c98ff Support export torch.max(input, dim) and torch.min(input, dim) to ONNX (#6220)
* Support export torch.max(input, dim) and torch.min(input, dim) to ONNX

* .
2018-04-03 15:29:11 -04:00
83926393d3 Detect re-initialization of _C shared library (#6232)
We had a bug in the Buck build of PyTorch due to symbols from _C
being present in two shared libraries that were both loaded at
runtime. This caused global variables to be initialized twice and
destructed twice on exit. The second destruction often caused
segfaults on exit.

This attempts to detect that sort of situation early on. If
Module.cpp is compiled twice, the symbol
pytorch_duplicate_guard()::initialized will be shared. The second
initialization will print an error message and abort.
2018-04-03 15:28:37 -04:00
80ff36c9a4 Print the diff files to aid in debugging when it's wrong. (#6238)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-03 15:12:58 -04:00
581d74f8d0 Remove unused variable in Layout.cpp. (#6236) 2018-04-03 14:26:46 -04:00
4a9e02fc2f Reduce flakiness of math tests in test_torch.py (#6200)
This compares the torch function against the reference math function
against a relative small set of inputs, including integers, extremes
of some common functions, zero, a few numbers from randn and a few
numbers near 1e6.

The idea here is not to be completely exhaustive, but rather quickly
expose the most common bugs. For exhaustive checks, we should evaluate
torch functions against all ~4e9 possible float32 values.

We compare the torch function evaluated against contiguous
and non-contiguous inputs and large vs. small tensors.

Also:

  - Make torch.allclose work with nan and +/-inf
  - Add torch.isclose (like numpy.isclose; sketched below)
  - Add torch.testing.assert_allclose (like
    numpy.testing.assert_allclose)
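
A scalar reference for the comparison rule (assuming numpy-style semantics):

```
import math

def isclose_ref(a, b, rtol=1e-5, atol=1e-8, equal_nan=False):
    if math.isnan(a) or math.isnan(b):
        return equal_nan and math.isnan(a) and math.isnan(b)
    if math.isinf(a) or math.isinf(b):
        return a == b                 # +/-inf only matches the same infinity
    return abs(a - b) <= atol + rtol * abs(b)
```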
2018-04-03 13:51:47 -04:00
1b41d7ac1e avx_mathfun.h is imprecise (#6192)
After discussion with @colesbury it turns out that avx_mathfun.h is imprecise and cannot be trusted blindly.

Turns on /fp:strict in Windows to disable replacement of trig functions with imprecise vectorized implementation.
2018-04-03 12:16:57 -04:00
e831ad6204 Fix sharing of empty tensor in multiprocessing (#6229)
Fixes #5719

Previously, the following would error out with an "Invalid file
descriptor" error:
```
import torch
import torch.multiprocessing as mp

q = mp.Queue()
t = torch.tensor([])
q.put(t)
```
on some OSes. The problem: because one cannot mmap data of size 0, and an
empty tensor has a storage of size 0, the file descriptor for the storage
(referencing shared memory) was never set. The
multiprocessing sharing code then calls DupFD on that uninitialized file
descriptor, leading to an error.

This PR special cases sharing an empty tensor on the CPU. CUDA does not
have this problem.

Unit tests for both cpu and cuda empty tensors
2018-04-03 11:49:40 -04:00
460e8cd376 change print to logger.warning in operator traceback code (#6216) 2018-04-03 08:01:25 -07:00
4375dfd0b2 Changes without protoc conditions (#6142) 2018-04-03 09:50:14 -04:00
80cf134aff Adjust the setup script according to the repo changes (#6218) 2018-04-03 09:44:52 -04:00
4f1eb06989 Delete dead codes (#6226) 2018-04-03 09:38:48 -04:00
2e156f3eab [caffe2] Add default values to speed_benchmark args (#6210) 2018-04-02 22:00:21 -07:00
fd2e7cb487 Change JobRunner's __call__ function to train (#6205) 2018-04-02 21:04:36 -07:00
9f49be51ec Fix argument checking for inlining a module (#6207) 2018-04-02 23:14:04 -04:00
771fcb3455 [caffe2] Fbcode to GitHub sync (#6208)
* [easy] allow empty tensor in cuda relu op

This diff does not enable the unit test for empty tensors, because the MKL version of ReluOp needs extra work to support them

* Make blob norm plotting work with distributed trainer when the old framework is used
2018-04-02 16:35:27 -07:00
fe89e21b02 Add a missed parenthesis to the LogSigmoid documentation (#6209)
Add a missed parenthesis to the LogSigmoid documentation
2018-04-02 18:47:21 -04:00
26c022b183 Documentation for reentrant backwards. (#6191)
* Documentation for reentrant backwards.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* CR
2018-04-02 18:26:15 -04:00
ad34d88959 added word object to function doc string for clarity (#6204) 2018-04-02 18:22:01 -04:00
a409f959e8 Remove ShuffleNet from model zoo. (#6203)
* No longer supported.
2018-04-02 15:00:06 -07:00
4c81282c33 Introduce torch.layout and split layout from dtypes. (#6145)
* Introduce torch.layout and split layout from dtypes.

Tensors (and tensor types) now have a 'layout' attribute that returns either 'torch.strided' or 'torch.sparse_coo'.

Previously, dtypes were 1-to-1 with ATen types/PyTensorTypes; the impetus behind this decision was to make things easy in the common case
(i.e. specifying a type in a factory function).  But this doesn't really follow for sparsity, which isn't a common case.

It also doesn't properly represent the concept of a dtype, which in numpy is a proper scalar type (i.e. roughly the type returned from indexing the
last dimension of an n-d array).  But this should be the same whether or not the tensor is represented via strides, sparsity, etc.

This is accomplished by:
1) having the dtype of tensor return the (device-type, scalar-type) combination, i.e. torch.cuda.float32, so both
   torch.cuda.FloatTensor and torch.cuda.sparse.FloatTensor have the same dtype
2) Adding a layout parameter to python functions, where the combination of (dtype, layout) maps to an ATen type that is used for dispatch.

* Formatting, make init throw python_error.

* Fix cuda not enabled error message.

* Fix test.
2018-04-02 14:07:50 -04:00
28e66705ff Move helper scripts to new repo (#6159) 2018-04-02 14:06:29 -04:00
63af898d46 Fix extension test on Windows (#5548)
* Change cpp_extensions.py to make it work on Windows

* Fix linting

* Show python paths

* Debug

* Debug 1

* set PYTHONPATH

* Add ATen into library

* expose essential libs and functions, and copy _C.lib

* Specify dir in header

* Update check_abi for MSVC

* Activate cl environment to compile cpp extensions

* change version string

* Redirect stderr to stdout

* Add monkey patch for windows

* Remove unnecessary self

* Fix various issues

* Append necessary flags

* add /MD flag to cuda

* Install ninja

* Use THP_API instead of THP_CLASS

* Beautify the paths

* Revert "Use THP_API instead of THP_CLASS"

This reverts commit dd7e74c44db48e4c5f85bb8e3c698ff9de71ba2d.

* Use THP_API instead of THP_CLASS(new)
2018-04-02 13:53:25 -04:00
605307f8f3 Add support for printing extra information in Module and refactor redundant codes (#5936)
This PR enables users to print extra information of their subclassed nn.Module.
Now I simply insert the user-defined string at the ending of module name, which should be discussed in this PR.

Before this PR, users should redefine the __repr__ and copy&paste the source code from Module.

* Add support for extra information on Module

* Rewrite the repr method of Module

* Fix flake8

* Change the __repr__ to get_extra_repr in Linear

* Fix extra new-line for empty line

* Add test for __repr__ method

* Fix bug of block string indent

* Add indent for multi-line repr test.

* Address review comments

* Update tutorial for creating nn.Module

* Fix flake8, add extra_repr of bilinear

* Refactor DropoutNd

* Change to extra_repr in some Modules

* Fix flake8

* Refactor padding modules

* Refactor pooling module

* Fix typo

* Change to extra_repr

* Fix bug for GroupNorm

* Fix bug for LayerNorm
2018-04-02 13:52:33 -04:00
7355f5cd8d Tell source users about TORCH_CUDA_ARCH_LIST (#6185)
Put it into the comments about env vars in setup.py.
Also put in a line in the README about where to find this info.
2018-04-02 13:35:14 -04:00
4748c9b529 Fix logic inside insertInput (#6146)
* Fix logic inside insertInput

* Add comment

* Commentary
2018-04-02 13:20:35 -04:00
92a0f7835e Support returning dictionaries in DataParallel (#6113) 2018-04-02 15:16:44 +02:00
0b17f4b87e [distributions] Support python floats in AffineTransform (#6035)
This avoids promotion from python float to torch.Tensor for AffineTransform. This appears to be needed so that constraint registration works across CPU and all GPUs.

Previous discussion at 3a25db73c8 (r176361909)

Background:

There are three basic types of objects in torch.distributions:

- Distributions are flyweight objects constructed from tensor or float args. They always promote float args to tensors.
- Transforms are longer-lived objects (sometimes cached; some are static globals). They can take float arguments. This PR makes AffineTransform avoid promoting float args to tensors.
- Constraints are long-lived objects. They can take either float or tensor arguments. They do not promote floats to tensors. These are relatively symbolic and are not much more than partially evaluated comparisons, e.g. constraints.positive is basically a symbolic version of lambda x: x > 0 that can be stored in a ConstraintRegistry table.

The Problem:

Sometimes we want to apply transform_to(constraints.positive) to a torch.cuda.FloatTensor. This is fine since

transform_to(constraints.positive)(x)
    = ExpTransform()(x)
    = x.exp()

which works with any tensor type.

Other times we want to apply transform_to(constraints.greater_than(1.5)) to a torch.cuda.FloatTensor. This is problematic before this PR since

transform_to(constraints.greater_than(1.5))(x)
    = ComposeTransform([ExpTransform(), AffineTransform(1.5, 1)])(x)
    = AffineTransform(1.5, 1)(x.exp())
    = t.loc + t.scale * x.exp()  # where t = AffineTransform(1.5, 1)

Before this PR, AffineTransform would promote t.loc and t.scale to tensors. This promotion can happen as early as library load time for some transforms, e.g. transform_to(constraints.unit_interval). Therefore before this PR, the second example would error at t.scale * x.exp() because t.scale is a [default] torch.FloatTensor whereas x.exp() is a torch.cuda.FloatTensor.

Proposed solution:
This PR merely adds support for python floats as the .loc and .scale parameters of AffineTransform. This should suffice for most purposes since only AffineTransform and a handful of parameter-free transforms are ever stored in the global transform_to and biject_to registries.

Alternative solutions include:
- allowing promotion from torch.FloatTensor to all other tensor types, e.g. torch.cuda.FloatTensor.
- adding a handful of specific parameter-free transforms like NegateTransform() in lieu of AffineTransform(0, -1).

Tested: added a regression test
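
A regression-style sketch of the fixed case (mirroring the worked example above):

```
import torch
from torch.distributions import constraints, transform_to

t = transform_to(constraints.greater_than(1.5))
x = torch.randn(3)
if torch.cuda.is_available():
    x = x.cuda()
y = t(x)   # = 1.5 + 1 * x.exp(); loc/scale stay python floats, so no type clash
```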

* Support python floats in AffineTransform

* Update docstrings
2018-04-01 23:56:34 -04:00
8617d3f1eb Refine dirty matching for docs. (#6177)
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-04-01 21:18:33 -04:00
d93d41b2ef Some notes about PyTorch/Caffe2 merge. (#6147)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-31 11:33:01 -07:00
9ce21b0e90 Delete NNPACK (#6151)
Since we added cpuinfo as a vendored dependency, this created a problem
with our NNPACK integration, because NNPACK also depends on cpuinfo,
as per #6068.  This is particularly difficult to resolve because we
depend on a fairly recent version of cpuinfo, which we generally cannot
assume users have installed (it is submoduled.)  So, it would seem that
to fix this properly, NNPACK would have to be vendored and built against
the correct cpuinfo.

However, discussion with Christian Puhrsch and Marat Dukhan suggests
that the benefit of carrying on with NNPACK integration is not all that
great, because mkldnn has since come out with a CPU convolution implementation
that performs better than NNPACK.  NNPACK's x86 implementation is not
really maintained, and its ARM support is not really relevant to PyTorch.

So rather than go through all the rigamarole of vendoring NNPACK, better
to just delete it.  If you need good perf for CPU convolutions, please
make sure you build against mkldnn.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-31 11:28:31 -07:00
da6c3c90d9 Relax constraints on return statements in the script (#6070)
Script functions can now have no return statements, empty
return statements, or return one or more values.

Additionally fix the lexer to always emit TK_NEWLINE before
TK_DEDENT, which simplifies the parser.
2018-03-31 18:35:33 +02:00
32ba2ca203 add documentation for diagflat and diagonal (#6161) 2018-03-31 18:03:21 +02:00
7e1046ce83 Fix SparseMM compiler warning (#6156)
```
[6/179] Building NVCC (Device) object
src/ATen/CMakeFiles/ATen.dir/native/cuda/ATen_generated_SparseMM.cu.o
/home/rzou/pytorch/aten/src/ATen/native/cuda/SparseMM.cu(9): warning:
statement is unreachable

/home/rzou/pytorch/aten/src/ATen/native/cuda/SparseMM.cu(9): warning:
statement is unreachable
```
Warning was caused by unnecessary return statement.
2018-03-31 16:40:01 +02:00
de42542351 Make precision matrix computation in mvn stable (#6128) 2018-03-31 16:39:33 +02:00
0d19b81a65 Give ATen errors backtraces (#6112) 2018-03-31 16:39:12 +02:00
cbe92abd7c Disable failing test_lengths_max_gpu 2018-03-30 21:00:45 -07:00
e0633ef1f1 Fix Windows build of nomnigraph and remove header. 2018-03-30 21:00:45 -07:00
acea18a54a Fix net_test ParseFromString usage. 2018-03-30 21:00:44 -07:00
3d27095eec [easy] fix comments
nit: fix comments
2018-03-30 21:00:44 -07:00
365652229d Back out "Revert D7372460: [DT] [28/n] Lift epoch_limiter"
Original commit changeset: b0a986d16c3b
2018-03-30 21:00:44 -07:00
b9d2ba1dbf Revert D7394363: [GanH]: Log D Trick for Cross Entropy with Sigmoid
This reverts commit d63266ccbc0c1390c58c2a71ae0b562fdec2fbc0

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
2018-03-30 21:00:44 -07:00
363a227d19 extend bucketize op to support duplicated boundaries
upgrade bucketize op to support duplicated boundaries
2018-03-30 21:00:44 -07:00
551d5fbf9a CUDA version of LengthsMax operator
CUDA version of LengthsMax operator

@override-unit-failures
2018-03-30 21:00:44 -07:00
0df662c67f [Caffe2] [Int8] More exhaustive unit tests for int8 ops (+ bug fix in Int8Add in-place case)
As title. This catches one bug in the Int8Add in-place case,
which wasn't tested in int8_test.cc
2018-03-30 21:00:44 -07:00
2b0e39f569 [GanH]: Log D Trick for Cross Entropy with Sigmoid
as titled
2018-03-30 21:00:44 -07:00
f8eb8a66e2 Revert D7372460: [DT] [28/n] Lift epoch_limiter
This reverts commit 05bd9bec10fad5ff9dc40be88836fd7274d50ce9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
2018-03-30 21:00:44 -07:00
58ae29b702 Fix schema check for arg_ops
Fix schema check for arg_ops.
2018-03-30 21:00:44 -07:00
ee64200c64 [nomnigraph] Expose transformations to python
Adding a python interface to the transformations
2018-03-30 21:00:44 -07:00
028a598cb9 Expose thread pool to operators
Adding ExecutorHelper interface between executor and operators
2018-03-30 21:00:44 -07:00
77976d34f4 Respect num_workers parameter in async net executor
Making sure we honor num_workers parameter in async executor
2018-03-30 21:00:44 -07:00
03c5198331 [C2 Int8][C2 Core]fetch int8 blob
Providing a Python API to fetch Int8 tensors.

  data, scale, zero_point = workspace.FetchInt8Blob(blob_name)

now returns a tuple if the blob contains an Int8TensorCPU

     'data' = int8 data array
     'scale' = fake quantization scale
     'zero_point' = fake quantization offset

Although FetchBlob shares its back-end implementation with FetchInt8Blob, we raise an
error to prevent unexpected behavior of the same method
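
A hedged usage sketch (the blob name is a placeholder; recovering real values as scale * (q - zero_point) follows the standard fake-quantization convention):

```
from caffe2.python import workspace

data, scale, zero_point = workspace.FetchInt8Blob('my_int8_blob')
dequantized = scale * (data.astype('float32') - zero_point)
```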
2018-03-30 21:00:44 -07:00
8f3ba30266 Fix a typo
Fix a typo in optimize_onnx_test.py
2018-03-30 21:00:44 -07:00
c9dbfca275 bugfix im2col op
fixes grad bug in im2col op
2018-03-30 21:00:44 -07:00
bb04053e22 Fixing TTSN unit tests
got lost in rebase
2018-03-30 21:00:44 -07:00
91162a74ed [easy] Improving error message
`_EQ` variation prints the values in case of failure; make it easier to debug
2018-03-30 21:00:44 -07:00
4cb79ee8e1 [codemod][caffe2] comment out unused parameters
The changes in this diff comment out unused parameters. All changes are automated using clang-tidy.
This will allow us to enable `-Wunused-parameter` as an error.
#accept2ship
2018-03-30 21:00:44 -07:00
e13c6fee66 [PerfModel] Added analytical counters for FCTransposed, BatchMatMul, BatchOneHot
Same as D7281311 but with DotProduct TensorInference removed
2018-03-30 21:00:44 -07:00
0ac4d19a29 Linter changes. 2018-03-30 21:00:44 -07:00
85c9b89edf Back out "[PerfModel] Added analytical counters for FCTransposed, BatchMatMul, BatchOneHot"
Original commit changeset: 48fccd71a270
2018-03-30 21:00:44 -07:00
02786a3819 Linter changes. 2018-03-30 21:00:44 -07:00
c2703aa141 Renaming .jenkins testing folder to caffe2_test (#6148)
* Renaming .jenkins testing folder to caffe2_test

* Another fix
2018-03-30 19:38:47 -04:00
4563e190c4 Use THC cached CUDA device property when get_device_name and get_device_capability (#6027)
Getting the CUDA device property struct with cudaGetDeviceProperties is expensive. THC caches the device properties; they are exposed via THCState_getDeviceProperties, then via at::globalContext().getDeviceProperties(device), and finally via torch.cuda.get_device_properties. This PR changes the two methods that previously called cudaGetDeviceProperties to use torch.cuda.get_device_properties directly in Python.

Also fixes ATen compile error when it can't find CUDA.

Fixes #4908. Using the script from that issue, we get roughly 18x speed-up.

[ssnl@ ~] python dev.py  # master
0.2826697587966919
0.00034999847412109375
0.0003493785858154297
0.000356292724609375
0.00036025047302246094
0.0003629922866821289
0.00036084651947021484
0.00035686492919921874
0.00036056041717529296
0.0003606319427490234
[ssnl@ ~] python dev.py  # this PR
0.27275662422180175
2.1147727966308594e-05
1.9598007202148438e-05
1.94549560546875e-05
1.9359588623046876e-05
1.938343048095703e-05
2.0074844360351563e-05
1.952648162841797e-05
1.9311904907226562e-05
1.938343048095703e-05
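
What the two methods now do under the hood, roughly:

```
import torch

props = torch.cuda.get_device_properties(0)  # served from THC's cached struct
print(props.name)                            # basis of get_device_name
print(props.major, props.minor)              # basis of get_device_capability
```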
2018-03-30 16:39:22 -04:00
1449c9f754 Update autograd docs (#5907)
* Update autograd docs

* Deprecate 'grad_variables' in backward().

Advise to replace with 'grad_tensors'.

* Resolve saved_variables/saved_tensors

* Tensor section

* Address comments

* Address comments

* Address comments
2018-03-30 15:33:11 -04:00
5fe3c406f2 Experimental support for different ONNX export types (#6016)
Allows you to export an ONNX model as any of the following (usage sketched after the list):

Protobuf file (this is what we have now)
Uncompressed zip archive
Compressed zip archive
Directory
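
A minimal sketch of selecting an export type (the ExportTypes enum names here are assumptions based on the list above):

```
import torch
import torch.nn as nn
from torch.onnx import ExportTypes

model = nn.Linear(4, 2)
dummy = torch.randn(1, 4)
torch.onnx.export(model, dummy, 'model.zip',
                  export_type=ExportTypes.ZIP_ARCHIVE)
```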

* Experimental support for different ONNX export types

* Remove a copy

* Add comment

* Add test cases

* lint

* fix bug

* address comments
2018-03-30 15:30:38 -04:00
d2c0f8bb57 avoid generating torch.*_backward_(input|weight|bias) (#6114) 2018-03-30 15:23:56 -04:00
3d3b62e2d6 Add REL_WITH_DEB_INFO build mode (#6122)
Small PR to allow use of RelWithDebInfo mode in CMake as per request from @ebetica, to make debugging in optimized binaries easier (i.e. don't have to suffer major decrease in performance when using DEBUG mode, but can still debug properly, not like in RELEASE mode).

From what I can see using RelWithDebInfo means -O2 -g -DNDEBUG while Release means -O3.

normal (release):

$ python setup.py build develop
$ grep -e' -fexceptions ' aten/build/build.ninja
FLAGS = -DUSE_AVX2 -msse3 -DUSE_SSE3 --std=c++11 -Wall -Wno-unknown-pragmas -Wno-vla -fexceptions  -fopenmp -O3
This PR allows use of the REL_WITH_DEB_INFO environment variable:

$ REL_WITH_DEB_INFO=1 python setup.py build develop
$ grep -e' -fexceptions ' aten/build/build.ninja
FLAGS = -DUSE_AVX2   -DUSE_SSE3 --std=c++11 -Wall -Wno-unknown-pragmas -Wno-vla -fexceptions  -O2 -g -DNDEBUG

* Add REL_WITH_DEB_INFO mode

* Fix batch file syntax
2018-03-30 15:18:48 -04:00
eb8a43a272 Fix setup_caffe2.py lint error. (#6143) 2018-03-30 14:08:50 -04:00
93efe22d72 PyTorch does not use top level cmake.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-30 10:56:40 -07:00
a2a28c0ef1 tox.ini update.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-30 10:33:08 -07:00
37044d7515 Add 'dirty diff' tests for PyTorch and Caffe2.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-30 10:32:25 -07:00
90afedb6e2 Merge caffe2 with pytorch. 2018-03-30 10:29:50 -07:00
1e9a16c3d1 Fix typo in NLLLoss docs (#6134) 2018-03-30 10:01:02 -07:00
48ad4546d2 Move LayerNorm to ATen; remove tracking_running_stats functionality (#5983)
* move LN to aten; remove tracking_stats functionaility

* Address comments about error message and respect cudnn flag for LayerNorm and GroupNorm
2018-03-30 09:44:11 -07:00
bc1b4c8912 ByteTensor sum test (#6042) 2018-03-30 10:58:38 -04:00
eca84e2532 Rename setup.py to setup_caffe2.py (#2483)
* Rename setup.py to setup_caffe2.py
* Also move VERSION_NUMBER under caffe2/ directory.
* Our setup*.py file needs to be at the root level.
* Add requirements.txt
2018-03-30 07:29:55 -07:00
3aca8f3b40 adding const fxn modifier to Operator::type() (#2484) 2018-03-30 01:05:21 -07:00
60a16e5663 Set dataloader.batch_size = None when batch_sampler is given (#6108) 2018-03-30 10:01:09 +02:00
4da3fa5095 strip some python dependencies (#2486)
* strip some python dependencies

* remove matplotlib as well

* remove pydot which is only used by net_drawer
2018-03-29 21:49:33 -07:00
47a1fd208f Quick and dirty raw value substitution from zip file (#2454) 2018-03-29 19:18:58 -07:00
f8270c0225 Enable MKLDNN convolution forward and backward (#6062)
* Enable MKLDNN convolution forward and backward

* minor change

* fix mkldnn build error when building ATen standalone
2018-03-29 15:25:07 -07:00
e4c0bb1809 Speed up sum over a dimension (#6026)
Perf numbers:
https://gist.github.com/colesbury/9e28dd7b0f27b0b019f68adbd4bd4b88

I've changed the dispatch stub so that it doesn't require every kernel
to be compiled for every instruction set. Kernel implementations are
stored in the stub's table with the REGISTER_DISPATCH macro.

I've also moved vec256 to its own folder and split up the
specializations before they get too unwieldy.

Change UnaryOpsKernel to use the new DispatchStub

 - Prefer signed integers. Mixing signed and unsigned integers is a
   pain and ATen mostly uses signed integers (int64_t).
 - Use inline lambda instead of struct for UnaryOps
 - Rename partial load overload "load_partial"
2018-03-29 18:13:43 -04:00
3dffac91bc Fixed some tests by using the correct optimizer (#6116) 2018-03-29 23:19:00 +02:00
2ed2624c28 Move README.md to caffe2/ in prep for merge. (#2479) 2018-03-29 14:09:32 -07:00
d42fcdbc96 Add source location information to error messages (#6059) 2018-03-29 22:57:18 +02:00
7ffcb20295 small math cleanups in the docs (#6057) 2018-03-29 22:50:08 +02:00
29c389078b RNN num_layers and dropout docs and checks (#6079) 2018-03-29 22:44:27 +02:00
53bca3302d Add CPU perf test for torch.* and torch.Tensor.* (#6054) 2018-03-29 14:51:07 -04:00
df8991b1b7 [auto] Update onnx to 1d7dee4 - Fix Average pool test cases converted from PyTorch (#677)
1d7dee4e21
2018-03-29 17:27:21 +00:00
bb114bc05d Update FFT comments from #5856 (#6089) 2018-03-29 13:27:03 -04:00
4f05cb710e Add underscore to nn.init.* and deprecate the original ones (#6093)
Fixes #5946.

* add underscore to nn.init.* and deprecate the original ones

* add a test for deprecation
2018-03-29 13:26:12 -04:00
21aba57744 Fix a bug in ONNX symbolic of average 3d pooling op (#6101) 2018-03-29 13:25:18 -04:00
f5d0d947c1 Exp, log, sin, cos vectorized (#6078)
Measured perf using the this script:

https://paste.fedoraproject.org/paste/yJiXU3AZGHuyjTVRWlj5OQ
2018-03-29 13:24:44 -04:00
368f96acde Remove tutorials from main repository.
* They now live at https://github.com/caffe2/tutorials
* Updating caffe2.ai website to match in a separate commit.
2018-03-29 09:31:09 -07:00
16b0adb274 Remove top-level cmake directory. (#6085)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-29 11:51:09 -04:00
b21e135ab8 Add class-specific error when key mismatch in load_state_dict (#6086) 2018-03-29 12:22:23 +02:00
bb3bfa09f3 Avoid some string copies when creating operators (#2475) 2018-03-28 22:31:49 -07:00
df039e2998 Unify handling of type_dispatched_args in gen_python_functions. (#6088)
This is just to simplify the handling, there is no generated code difference.
2018-03-28 22:23:20 -04:00
a90aa5d818 Fix small typo in setup.py (#6091)
Fixed small typo in setup.py
2018-03-28 16:51:08 -07:00
ba0f18a9d7 Delete defunct .travis files, and move release-notes.md to caffe2 dir. (#2472)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-28 15:54:32 -07:00
ecd5de0f36 [fft][2 of 3] Forward for fft methods (#5856)
* implement fft ifft rfft irfft

* add tests for fft ifft rfft irfft
2018-03-28 18:44:29 -04:00
6ae0576e1c Remove dtypes from legacy tensor.new(...) (#6081)
This is in preparation for splitting out sparsity (layout) from dtypes; it's complex to maintain these
and tensor.new(...) is a legacy API in any case.
2018-03-28 18:37:21 -04:00
371e14b807 NLLLoss: error message for mismatched input/target batch sizes (#6072)
Fixes #5554

Adds an error message for when NLLLoss is passed an input and target
whose batch sizes don't match. Ideally this check should live in ATen,
but since there is NLLLoss logic in Python, the check lives there for now.
2018-03-28 14:21:38 -07:00
127cdc324d Fetch master commit log before perf test (#6077) 2018-03-28 14:19:13 -07:00
2f602dce1d Correct argument misspelling. (#6076) 2018-03-28 15:20:40 -04:00
1807bacd65 Fix printing of unknown binop operator in torchscript (#6069)
Before, using an unknown binary operator like `@`:
```
import torch
@torch.jit.script
def mm(x, y):
    return x @ y

x = torch.randn(4, 3)
y = torch.randn(3, 2)
mm(x, y)
```
resulted in [this not-so-readable trace](https://gist.github.com/zou3519/052b8998108c4bc0fe0e7c85c6f5758e).

Now, it tells the user that the problem is an unknown binary operator:
```
NotSupportedError: unsupported binary operator: MatMult
@torch.jit.script
def mm(x, y):
    return x @ y
            ~~~ <--- HERE
```
2018-03-28 19:41:45 +02:00
a014a7cd37 Link protobuf public in the standard case 2018-03-28 10:05:20 -07:00
e881efde79 Use local FindCUDA for CMake < 3.7 2018-03-28 10:05:20 -07:00
3a84574c81 Update CAFFE2_LINK_LOCAL_PROTOBUF functionality.
* Continuation of https://github.com/caffe2/caffe2/pull/2306 and based on Yangqing's PR at https://github.com/caffe2/caffe2/pull/2326
* Put caffe2_protos as static library and link it whole to libcaffe2.so
* For protobuf::libprotobuf, only link it to libcaffe2_protos (and hence libcaffe2.so), but not any downstream library. This avoids manipulating protobuf objects across dll boundaries.
* After the above, during linking one will receive complaint that fixed_address_empty_string is not found. This is because we compiled protobuf with hidden visibility, and the fact that the generated caffe2.pb.h has an inline function that invokes the inline function in protobuf GetEmptyStringAlreadyInited()
* Added sed-like commands to replace the generated header to use caffe2::GetEmptyStringAlreadyInited() instead. And, in proto_utils.cc, implement a function that essentially routes the function call to protobuf's internal one. The reason this works is that, caffe2::G... is visible globally, and libcaffe2.so is able to see the real protobuf one. This ensures that we are always calling protobuf functions that are inside libcaffe2.so.
2018-03-28 10:05:20 -07:00
dbac044759 Add protobuf wrapper functions to proto_utils.
* These will be used when we statically link libprotobuf.a inside libcaffe2.so
2018-03-28 10:05:20 -07:00
b752f4cdda Fix instance norm (#6023) 2018-03-28 11:21:16 -04:00
64e2c03bea Enable TensorDataset to get any number of tensors (#6038)
While keeping backward compatibility, enable TensorDataset to accept any number of tensors.

* Enable TensorDataset to get any number of tensors

* Update dataset.py

Fix syntax error on python 2.7

* Add several test for tensordataset

* Fix whitespaces

* Simplify args

* Update dataset.py
2018-03-28 11:20:50 -04:00
bc7fb1d6d8 Update cpuinfo to d0222b47948234cc01983243a2e0ede018f97f3a (#6043) 2018-03-28 11:19:37 -04:00
7f66164a89 Delete defunct .travis.yml and appveyor.yml files (#2429)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-28 07:55:03 -07:00
31c0e2321a Block set from param_group['params'] (#6031)
* Block set from param_group['params']

Passing a set might cause `list(params)` to come out in a random order; in that case, `id_map` in `load_state_dict()` would not be matched correctly.

* Update Error Message

* Add Warning on Optimizer Docs

* Update optimizer.py
2018-03-28 07:45:19 -07:00
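A short sketch of why the check is needed (assuming current torch.optim behavior): iteration order over a set is arbitrary, so parameter ids recorded in a saved state dict cannot be matched back reliably.
```
import torch

p1 = torch.randn(2, requires_grad=True)
p2 = torch.randn(2, requires_grad=True)

torch.optim.SGD([p1, p2], lr=0.1)      # OK: lists have a stable order
try:
    torch.optim.SGD({p1, p2}, lr=0.1)  # blocked: set order can change between runs
except TypeError as err:
    print(err)
```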
063946d2b3 Added parameter range checks for all optimizers (#6000) 2018-03-28 11:22:23 +02:00
ae4362bc6a Fix memory leak when using multiple workers on Windows (#5585) 2018-03-28 10:35:28 +02:00
8964aab260 fix docs error in torch.nn.functional.nll_loss (#6060)
According to the code in _torch/nn/functional.py:1399_
(```if target.size()[1:] != input.size()[2:]:```),
if the size of input is (N, C, d_1, d_2, ..., d_K), the size of target should be (N, d_1, d_2, ..., d_K).
2018-03-28 10:05:14 +02:00
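A concrete sketch of the corrected shapes for K = 2 spatial dimensions (assuming current torch.nn.functional behavior):
```
import torch
import torch.nn.functional as F

logp = F.log_softmax(torch.randn(4, 10, 8, 8), dim=1)  # input: (N, C, d_1, d_2)
target = torch.randint(0, 10, (4, 8, 8))               # target: (N, d_1, d_2)
loss = F.nll_loss(logp, target)                        # scalar by default
```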
e114e84d91 [auto] Update onnx to 36d7fff - Fix Attribute default value pybind11 binding (#671)
36d7fffaf3
2018-03-28 06:58:24 +00:00
0c7c34253b [auto] Update onnx to 0536866 - git ignore .pytest_cache (#674)
053686607d
2018-03-28 06:54:44 +00:00
0f198fa723 Add additional script module functionality (#6033)
* allow calls to non-script methods, allow python non-script attributes in methods
* add test to make sure submodules are not reassigned
* Test that we can change python attributes
2018-03-27 23:37:56 -07:00
d3f92eebee Remove redundant code (#2460) 2018-03-27 22:44:39 -07:00
86e285d0e0 [auto] Update onnx to afc84ac - Update README.md (#672)
afc84aca45
2018-03-28 03:52:00 +00:00
eb18a2f26c Reorganize third-party libraries into top-level third_party directory (#6025)
- gloo, pybind11, nanopb and nccl now live in third_party.
- ATen builds in aten/build rather than torch/lib/build/aten
- A bit of faffing about in the scripts was necessary, because they used to assume that everything lived in the same directory. Now you are expected to cd into the correct directory before calling one of the build functions. The actual builder script lives in tools
- Lint now just unconditionally ignores third_party, rather than enumerating folders explicitly
2018-03-27 22:09:20 -04:00
02d5ae6c9b Removing verbose logging from windows (#2455) 2018-03-27 18:19:55 -07:00
344fa57680 Adjust the test since only the op only has CPU implementation 2018-03-27 18:10:39 -07:00
8b434d1141 Quick fix on the observer test 2018-03-27 18:10:39 -07:00
6412adcef3 Move the stump op to oss 2018-03-27 18:10:39 -07:00
0ac8495165 Fix the CMake issues caused by internal changes 2018-03-27 18:10:39 -07:00
af3dcdf6ae [D2]: Improve loss weight by allowing omitted weights
as titled
2018-03-27 18:10:39 -07:00
d6c30ee6af [GanH]: Unifying two discriminators
to improve flexibility and combine different discriminators in one model.
2018-03-27 18:10:39 -07:00
3300e21d52 Add SparseLengthsPositionalWeightedSum operator that fuses SparseLengthsWeightedSum, LengthsRangeFill, and Gather
add SparseLengthsPositionalWeightedSum operator that fuses SparseLengthsWeightedSum, LengthsRangeFill, and Gather
2018-03-27 18:10:39 -07:00
e6b04ba121 fix lengths sum cuda op for empty batch
CUDA does not allow launching an empty kernel
2018-03-27 18:10:39 -07:00
6ed9a0c3f2 fix cuda elementwise ops for empty batch
CUDA will fail to launch an empty kernel
2018-03-27 18:10:39 -07:00
c6587597d8 Ignore backward step when there is no loss function;
Ignore backward step when there is no loss function;

For some customized models, we can encode the update directly in the forward step, so there is no backward step.
2018-03-27 18:10:39 -07:00
c909abd85f [GanH] Label Smooth: Add Layer and Integrate to SparseNN
as titled
2018-03-27 18:10:39 -07:00
107cb670b1 add typecast and assertion for histogram computing
as title
2018-03-27 18:10:39 -07:00
26fbfa959e Integrate fbgemm fp16 with Caffe2
Added C2 operators and python test
Added transformation from FC to FBPackedFC and unit test
2018-03-27 18:10:39 -07:00
078b6d5ad1 [layer model] remove duplicated init ops
it saves some model init time, and reduce confusion.
2018-03-27 18:10:39 -07:00
d5e38a8aee [PerfModel] Add Profile observer
Adds a profile observer to the system. This outputs the following information:
1) Input tensor sizes
2) Argument list
3) Output tensor sizes
4) Operator run time

Example output:
  I0206 14:00:51.217067 1730559 profile_observer_gpu.cc:53] --------- Starting operator Conv op#0 ---------
  I0206 14:00:51.217073 1730559 profile_observer_gpu.cc:65] Input 0: Tensor gpu_0/data of type float. Dims: (32,3,227,227,):
  I0206 14:00:51.217077 1730559 profile_observer_gpu.cc:65] Input 1: Tensor gpu_0/conv1_w of type float. Dims: (64,3,7,7,):
  I0206 14:00:51.217082 1730559 profile_observer_gpu.cc:71] Argument 0: name: "kernel" i: 7
  I0206 14:00:51.217087 1730559 profile_observer_gpu.cc:71] Argument 1: name: "enable_tensor_core" i: 0
  I0206 14:00:51.217089 1730559 profile_observer_gpu.cc:71] Argument 2: name: "exhaustive_search" i: 1
  I0206 14:00:51.217092 1730559 profile_observer_gpu.cc:71] Argument 3: name: "float16_compute" i: 0
  I0206 14:00:51.217095 1730559 profile_observer_gpu.cc:71] Argument 4: name: "stride" i: 2
  I0206 14:00:51.217099 1730559 profile_observer_gpu.cc:71] Argument 5: name: "pad" i: 3
  I0206 14:00:51.217103 1730559 profile_observer_gpu.cc:71] Argument 6: name: "order" s: "NCHW"
  I0206 14:00:51.217105 1730559 profile_observer_gpu.cc:71] Argument 7: name: "ws_nbytes_limit" i: 67108864
  I0206 14:00:51.217109 1730559 profile_observer_gpu.cc:85] Output 0: Tensor gpu_0/conv1 of type float. Dims: (32,64,114,114,):
  I0206 14:00:51.217111 1730559 profile_observer_gpu.cc:88] --------- Finished operator Conv in 1.12685 ms ---------

Example output for internal RNN op (from seq2seq):
  I0219 18:57:06.779331 2960991 profile_observer_gpu.cc:52] --------- Starting operator LSTMUnit op#3161697160-7 ---------
  I0219 18:57:06.779336 2960991 profile_observer_gpu.cc:59] Input 0: Tensor model0/encoder/layer3/lstm/hidden_t_prev of type float. Dims: (1,1,512,):
  I0219 18:57:06.779340 2960991 profile_observer_gpu.cc:59] Input 1: Tensor model0/encoder/layer3/lstm/cell_t_prev of type float. Dims: (1,1,512,):
  I0219 18:57:06.779343 2960991 profile_observer_gpu.cc:59] Input 2: Tensor model0/encoder/layer3/lstm/gates_t of type float. Dims: (1,1,2048,):
  I0219 18:57:06.779346 2960991 profile_observer_gpu.cc:59] Input 3: Tensor encoder_lengths of type int. Dims: (1,):
  I0219 18:57:06.779350 2960991 profile_observer_gpu.cc:59] Input 4: Tensor timestep_rnnexec_t24 of type int. Dims: (1,):
  I0219 18:57:06.779353 2960991 profile_observer_gpu.cc:70] Argument 0: name: "no_sequence_lengths" i: 0
  I0219 18:57:06.779357 2960991 profile_observer_gpu.cc:70] Argument 1: name: "drop_states" i: 0
  I0219 18:57:06.779362 2960991 profile_observer_gpu.cc:70] Argument 2: name: "forget_bias" f: 0
  I0219 18:57:06.779366 2960991 profile_observer_gpu.cc:79] Output 0: Tensor model0/encoder/layer3/lstm/hidden_t of type float. Dims: (1,1,512,):
  I0219 18:57:06.779369 2960991 profile_observer_gpu.cc:79] Output 1: Tensor model0/encoder/layer3/lstm/cell_t of type float. Dims: (1,1,512,):
  I0219 18:57:06.779372 2960991 profile_observer_gpu.cc:89] RecurrentNetwork 3161697160: order: 7
  I0219 18:57:06.779373 2960991 profile_observer_gpu.cc:92] --------- Finished operator LSTMUnit in 0.00153923 ms ---------

Existing deficiencies:
1) Need support to create separate CPU and GPU builds

Once this is approved, I'll port the changes over to OSS
2018-03-27 18:10:39 -07:00
677c8d6769 [PerfModel] Added analytical counters for FCTransposed, BatchMatMul, BatchOneHot
Same as D7281311 but with DotProduct TensorInference removed
2018-03-27 18:10:39 -07:00
d2453afb1e Add SumElementsInt operator
Added a caffe2 math sum operator so that it takes integers (only int32)
Changed the SumFloatIter to SumGenericIter so that it takes >1 types.
Added a sumElementInt operator
2018-03-27 18:10:39 -07:00
a0a136117c Faster positive modulo in IndexHashOp
Change the positive-modulo computation to use fewer modulo operations. This should
run ~2x faster (just the modulo part). In addition, we should later switch to
computing the modulo via precomputed reciprocals.
2018-03-27 18:10:39 -07:00
16312e8123 [fbtranslate/onnx] decoder step (pytorch -> caffe2) exporter for fbtranslate
This code introduces a new class for exporting decoder step (ensemble) models trained with fbtranslate pytorch to Caffe2 models via ONNX, for the purpose of use in "component beam search" being developed concurrently in C++ by @juancarabina.
2018-03-27 18:10:39 -07:00
60d6ecd90f Back out "[PerfModel] Added analytical counters for FCTransposed, BatchMatMul, BatchOneHot"
Original commit changeset: 48fccd71a270
2018-03-27 18:10:39 -07:00
0a4f146228 Codemod imports from libfb to use full path /caffe2
Codemoding imports from libfb.py of the format "from libfb import X". This is part of a larger codemod to remove the mapping from libfb/py to libfb, in the interest of enabling static typechecking in fbcode.
2018-03-27 18:10:39 -07:00
a92a6233b5 Enable support for placeholder ops in InjectCrossDeviceCopies
This is required to support placeholder/decorator ops, which do not have an operator schema. Note that the change is made in such a way that it is a no-op if placeholder ops are not used.

Changes:
1. Since the placeholder ops always run on CPU, added a utility to infer placeholder ops blob devices.
2. Placeholder op's input/output blobs should be on CPU as well. This change takes care of dealing with output blobs - i.e. use blobs on CPU.
3. Added a Unit test - test_inject_copy_placeholder_ops
2018-03-27 18:10:39 -07:00
84605438f2 [PerfModel] Added analytical counters for FCTransposed, BatchMatMul, BatchOneHot 2018-03-27 18:10:39 -07:00
8baa563daf Change observer copy() method to take id parameter
This diff is added to support the ProfileObserver in order to differentiate operators in the stepnet properly.  Since copy() is only used in the context of RNNs, the name has been changed to reflect that.
2018-03-27 18:10:39 -07:00
e977825c01 Merge the conflicts 2018-03-27 18:10:39 -07:00
bde2f6b298 ATen Unary Ops (#6030)
Implements a few unary operations for which there are AVX intrinsics.

The perf comparison script is here:
https://paste.fedoraproject.org/paste/f1adcJhpGtzDNWImS34XzQ
2018-03-27 20:39:28 -04:00
9f3a46c583 Bumping aten to latest commit (#2453) 2018-03-27 16:09:07 -07:00
3c577fccf3 Move Caffe2 Dockerfiles to docker/caffe2 (#2430)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-27 15:51:50 -07:00
b7084e4028 Move .jenkins to .jenkins/caffe2 (#2434)
* Move .jenkins to .jenkins/caffe2

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Bugfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-27 15:49:55 -07:00
da7193b69a Update gloo to PyTorch's version (#2451)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-27 15:42:26 -07:00
0eab63d9dd pybind11 submodule update to PyTorch's version. (#2450)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-27 15:41:55 -07:00
8fa38f8dce Add gradient clipping (#2452)
As titled.
2018-03-27 15:10:15 -07:00
c4e5001af8 [auto] Update onnx to 9d2b530 - Revert "[Typing 1/3] Setup mypy type checker (#607)" (#667)
9d2b5301ac
2018-03-27 21:47:02 +00:00
ebc0194950 Fix use-after-free bug in peephole pass (#6037)
* Fix use after free bug in peephole pass

* Move the loop befor the switch
2018-03-27 17:25:38 -04:00
8054dbd655 Trivial typo (#6053) 2018-03-27 14:21:47 -07:00
b5fa9a82c8 Update Dockerfile build instructions for new layout. (#6051)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-27 14:11:22 -07:00
5583c12888 Fix bias size assert in Bilinear (#5992) 2018-03-27 23:05:04 +02:00
2ad57eeea9 [auto] Update onnx to 086727e - [Typing 1/3] Setup mypy type checker (#607)
086727e5a0
2018-03-27 20:11:54 +00:00
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
2017c9caef Add script for removing Apache header.
* Can be used to remove the license header added with add_apache_header.sh
2018-03-27 13:10:18 -07:00
34f2f48394 Allow larger margin of error for GPU perf test runtime (#6044) 2018-03-27 15:55:59 -04:00
db53389761 Add numpy.array-like type inference to torch.tensor. (#5997)
* Add numpy.array-like type inference to torch.tensor.

* Temporary fix for int/double types.

* Treat python floats as the default (scalar) dtype.

* Also make 0-length sequences the default scalar type and add more tests.

* Add type inference to sparse_coo_tensor.

* Fix sparse test.

* Remove allow_variables.

* Check numpy platform bits.

* Address review comments.

* Make suggested changes to constraints.

* More checking windows builds.

* Fix test for windows.
2018-03-27 15:27:23 -04:00
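A few examples of the inference rules described above (a sketch against a current build):
```
import numpy as np
import torch

torch.tensor([1, 2, 3]).dtype                # torch.int64: all-integer data
torch.tensor([1.0, 2]).dtype                 # default scalar type (torch.float32)
torch.tensor(np.zeros(3, np.float64)).dtype  # torch.float64: numpy dtype is kept
```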
c89685a115 Make error messages in net_dag more clear 2018-03-27 11:56:49 -07:00
5f90d41211 [auto] Update onnx to 5716e20 - Convert all Node tests to Model tests (#651)
5716e2076b
2018-03-27 18:45:21 +00:00
a3d08de331 Move .jenkins to .jenkins/pytorch (#6004)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-27 10:54:32 -04:00
49f2bb7e0b Extra comment about backward vs. grad in engine. (#6005)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-27 10:54:06 -04:00
47f31cb1e6 Update FAQ to make more sense after tensor/variable merge (#6017) 2018-03-27 07:48:25 -07:00
f393c90cda Moving conda/ to caffe2/conda (#2428)
* Moving conda/ to caffe2/conda

* fix

* Moving caffe2/conda to conda/caffe2
2018-03-27 07:48:23 -07:00
f93e820e7d Revert "[C2][GPU]LengthsMax CUDA version (#2209)" (#2444)
This reverts commit 71acc269bb573c8c04343e6d534b2557a456b29a.
2018-03-27 01:15:52 -07:00
6740126f5c [C2][GPU]LengthsMax CUDA version (#2209)
LengthsMax CUDA version.

The gradient will be provided later.
2018-03-27 00:19:17 -07:00
9e2001683e Move doc generation code into docs/caffe2. (#2435) 2018-03-26 21:07:40 -07:00
0e0918cb9a dpm synchronize 2018-03-26 19:54:31 -07:00
d11fc90317 Export atomic iter count (#2379)
* Exported AtomicIterOp count

* Add axis to top_k_op. (#2416)

* Revert update on top_k_op

* Add axis to top_k_op

* [auto] Update onnx to a8e4648 - Adjust link flags when built in Windows Debug mode (#647)
a8e4648a7d

* [auto] Update onnx to f4acf28 - Remove allowconsumed enforceconsumed from op schema. (#617)
f4acf281ef

* Exported AtomicIterOp count

* Initialize cpuinfo in the thread pool

The thread pool called cpuinfo_get_processors_count() without initializing cpuinfo. It was only by luck that this did not make Caffe2 single-threaded: the thread pool is initialized after NNPACK, and NNPACK initializes cpuinfo itself.

This commit also updates cpuinfo to a version that aborts with a fatal error if it's used uninitialized.

* Updated Python Op and Image Pre-Processing Pipeline tutorials && Added CIFAR-10 Part 1 tutorial (#2286)

* Updated Basics tutorial: (1) Added Python 3 support with __future__ statements; (2) Various grammatical/typo fixes and minor refactoring of Markdown

* Added Python 3 support and made minor typo fixes

* Added Python 3 support with future imports, refactored and corrected errors in Markdown, added comments

* Added Python 3 support with future imports, Added use of caffe_translator.py to translate downloaded .caffemodel file to .pb files

* Upgrades to Image Pre-Processing Pipeline tutorial

* Updated Python Op tutorial

* removed markdown with empty links

* Added Part 1 of an end-to-end CIFAR-10 tutorial

* Updated MNIST Dataset and Databases tutorial with python3 support and markdown fixes

* Tweaks to markup, fewer training iterations

* changed permissions of CIFAR10_Part1; typo corrections in Image_Pre-Processing_Pipeline

* Typo corrections in Multi-GPU Training tutorial

* sync Python_Op py_gen with the IPython notebook

* nit typo correction

* [auto] Update onnx to 5cb999d - Minor cleanups to shape inference (#653)
5cb999ddc1

* [auto] Update onnx to ecac1c1 - Merge Rel 1.1.0 branch into master (#657)
ecac1c1624

* Strip down onnx to only pb definitions in mobile build (#2426)

* Exported AtomicIterOp count
2018-03-26 19:26:09 -07:00
b6e80a1ec4 Caffe2-onnx exporter (#2248)
* caffe2-onnx frontend

* Remove Python part of the conversion code

* nit

* convert more ops

* Address commmetns
2018-03-26 19:23:45 -07:00
a589180021 Update cpuinfo submodule (#6014) 2018-03-26 19:25:02 -04:00
ef4c09fb4a mkl-include is not installable if your conda is too old. (#6022)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-26 18:46:54 -04:00
64e94f02b7 Move Dockerfile to docker/pytorch (#6009)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-26 17:26:34 -04:00
b6b2edb96f [auto] Update onnx to 6fe932a - Replace unittest.skip with custom exception (#659)
6fe932a3e1
2018-03-26 21:20:58 +00:00
fc030bf377 Remove consumed_input (#5928) 2018-03-26 16:58:38 -04:00
1e417e23bc Strip down onnx to only pb definitions in mobile build (#2426) 2018-03-26 13:52:16 -07:00
2a47fb3082 [auto] Update onnx to ecac1c1 - Merge Rel 1.1.0 branch into master (#657)
ecac1c1624
2018-03-26 20:51:33 +00:00
f2bc1dc099 [auto] Update onnx to 5cb999d - Minor cleanups to shape inference (#653)
5cb999ddc1
2018-03-26 20:23:03 +00:00
7462eca363 Initialize cpuinfo in the thread pool
The thread pool called cpuinfo_get_processors_count() without initializing cpuinfo. It was only by luck that this did not make Caffe2 single-threaded: the thread pool is initialized after NNPACK, and NNPACK initializes cpuinfo itself.

This commit also updates cpuinfo to a version that aborts with a fatal error if it's used uninitialized.
2018-03-26 15:44:47 -04:00
5d628db0a2 Deprecate ctx.saved_variables via python warning. (#5923)
* Deprecate ctx.saved_variables via python warning.

Advises replacing saved_variables with saved_tensors.
Also replaces all instances of ctx.saved_variables with ctx.saved_tensors in the
codebase.

Test by running:
```
import torch
from torch.autograd import Function

class MyFunction(Function):
    @staticmethod
    def forward(ctx, tensor1, tensor2):
        ctx.save_for_backward(tensor1, tensor2)
        return tensor1 + tensor2

    @staticmethod
    def backward(ctx, grad_output):
        var1, var2 = ctx.saved_variables
        return (grad_output, grad_output)

x = torch.randn((3, 3), requires_grad=True)
y = torch.randn((3, 3), requires_grad=True)
model = MyFunction()
model.apply(x, y).sum().backward()
```
and assert the warning shows up.

* Address comments

* Add deprecation test for saved_variables
2018-03-26 14:13:45 -04:00
4dc8c2a3cf Add descriptive error message for test_cpp_extensions ModuleNotFoundError (#5978)
* Add descriptive error message for test_cpp_extensions ModuleNotFoundError error.

* Modify the error message
2018-03-26 14:11:11 -04:00
cfd94c481e Add precision matrix to MultivariateNormal (#5998)
Also changed some .contiguous().view(*) to .reshape(*).
2018-03-26 14:09:38 -04:00
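Usage sketch (assuming the current torch.distributions API): the distribution can now be parameterized directly by a precision matrix, i.e. the inverse of the covariance.
```
import torch
from torch.distributions import MultivariateNormal

prec = torch.eye(3) * 2.0  # precision = inverse covariance
d = MultivariateNormal(torch.zeros(3), precision_matrix=prec)
sample = d.sample()
```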
39829c1670 Improve docs (#5999)
* Clarify det and svd doc on when backward is not stable

* Fix some links in nn.functional doc; improve upsampling doc
2018-03-26 14:09:11 -04:00
1ab248d09e Fixes #5973: Stop printing verbose warnings for MSVC (#6001)
* Stop printing verbose warnings

* Add missing options

* Fix for misspelling
2018-03-26 09:40:30 -04:00
2df578a71a add mkl dependencies to setup (#5991) 2018-03-25 23:21:16 -04:00
c6e903f804 [auto] Update onnx to f4acf28 - Remove allowconsumed enforceconsumed from op schema. (#617)
f4acf281ef
2018-03-25 23:51:09 +00:00
b2da9fd220 [distributions] Rename .params to .arg_constraints, fix logic (#5989) 2018-03-25 15:24:32 +02:00
03a6952ac9 [distributions] Fix scalar bugs in torch.distributions.transforms etc. (#5931) 2018-03-25 13:33:31 +02:00
f895698183 Implement MultivariateNormal.mean, .variance properties (#5988) 2018-03-25 13:32:06 +02:00
f6274a4ef7 Fix "command not found" error in perf test (#5982) 2018-03-24 23:46:48 -04:00
f9882473b2 add pip mkl-devel to the error message when mkl is found but mkl headers are not (#5984) 2018-03-24 18:25:41 -04:00
41c84ca735 Support batch LowerCholeskyTransform (#5980)
* batch lower cholesky transform

* add checking contiguous

* remove cache file
2018-03-24 13:43:46 -04:00
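A sketch of the batched behavior (assuming the current transform API): each trailing (n, n) slice is mapped independently to a lower-triangular matrix with positive diagonal.
```
import torch
from torch.distributions.transforms import LowerCholeskyTransform

t = LowerCholeskyTransform()
x = torch.randn(5, 3, 3)  # a batch of 5 unconstrained square matrices
L = t(x)                  # same shape; every 3x3 slice is now a valid Cholesky factor
```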
5d77709485 Linearly interpolating upsampling fix (#5927)
* Changes in bilinear upsampling

* Add align_corners option to upsampling module & functional when using linearly interpolating modes
When align_corners=True, it uses the old original upsampling scheme, which gives visually better results,
but doesn't properly align input and output pixels, and thus cause the output vary basing on input.
This PR adds this align_corners option, and changes the default behavior to align_corners=False, with
proper warning if this option is not specified upon using nn.Upsample or nn.functional.upsample to let
be aware of this new change.
Adds tests in test_nn.py for spatial invariance when align_corners=False, and usual module tests for
align_corners=False.

* remove redundant checks and unnecessary variables; fix the cast

* fix negative indices
2018-03-24 12:21:13 -04:00
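A sketch of the option (shown with F.interpolate, the current name for this entry point): align_corners=True reproduces the old scheme, while the new default is False.
```
import torch
import torch.nn.functional as F

x = torch.arange(4.).view(1, 1, 2, 2)
new = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)  # new default
old = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=True)   # pre-change behavior
```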
2f8d6582de Store perf numbers in S3 (#5951)
* Store perf numbers in S3

Previously the perf numbers are stored in https://github.com/yf225/perf-tests/tree/cpu, but we couldn't figure out a way to push the perf numbers only from master builds. This PR moves the perf number storage to S3, which allows us to have finer control over when to push the new numbers.

This is in replacement of #5844 - storing numbers in RDS has its own problems with schema migration and backward compatibility, and using a NoSQL database might be an overkill at this point.

* Fixed issues
2018-03-24 12:19:56 -04:00
332d5ffd11 Modify setup docs for Windows (#5981) 2018-03-24 12:17:01 -04:00
08891b0a4e Group Normalization (#5968)
* Group Normalization

* move to ATen
2018-03-24 12:16:18 -04:00
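Usage sketch of the resulting module (assuming the nn.GroupNorm API): statistics are computed per sample over channel groups, so the result is independent of batch size.
```
import torch
import torch.nn as nn

gn = nn.GroupNorm(num_groups=4, num_channels=32)  # 32 channels split into 4 groups
y = gn(torch.randn(2, 32, 8, 8))                  # normalized within each group
```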
ed0f629fe9 [distributions] Implement Power transform (#5976) 2018-03-24 15:20:07 +01:00
15a981e75a Disable TestBottleneck test_cuda on Windows (#5977) 2018-03-24 08:00:32 -04:00
f508e7378e [auto] Update onnx to a8e4648 - Adjust link flags when built in Windows Debug mode (#647)
a8e4648a7d
2018-03-24 04:08:54 +00:00
a73f9af5ab Add axis to top_k_op. (#2416)
* Revert update on top_k_op

* Add axis to top_k_op
2018-03-23 20:43:43 -07:00
9923701a0d Fix crash when cat-ing empty cuda tensors (#5971)
Fixes #5739. The CUDA path for `torch.cat` was missing a check for the
case where all input tensors are empty.
2018-03-23 22:22:39 -04:00
641fb21bdd Update no_unions flag for nanopb gen and update ONNX proto files (#5972) 2018-03-23 18:52:33 -04:00
7375ba5e60 [auto] Update onnx to 7c009fe - Fix lint error in optimizer test (#656)
7c009fe8df
2018-03-23 22:36:29 +00:00
f3e16cc737 Expose gradients w.r.t. input & weight for conv1d, conv2d, conv3d in Python (#5408)
This PR addresses issue #5024

* Expose Conv2dBackward in python

* Separate interface for exposing gardients of operators

* Revert old changes

* Add tests

* Add conv1d gradients. Refactor tests for grad convolutions

* Refactor names and change examples

* Remove Varibale from tests for conv backward
2018-03-23 17:49:32 -04:00
831780390c Fixed non-determinate preprocessing on DataLoader (#4640)
Added an ind_worker_queue parameter to data.DataLoader. It makes preprocessing deterministic.

DataLoader in multiprocessing mode may cause non-determinism. Even if random_seed is frozen, each subprocess may receive tasks in an unstable order, caused by varying I/O times while data loads. If you use augmentation during data loading, this makes results unreproducible. See https://discuss.pytorch.org/t/deterministic-non-deterministic-results-with-pytorch/9087

To fix this issue I have added an individual queue for each worker, so each worker gets tasks in a stable order and every subprocess produces stable results.

To reproduce the issue you may change ind_worker_queue to False and run the script several times; the code to reproduce it is in the corresponding PR.

* TestIndividualWorkerQueue added to DataLoader tests

* Review fixes

* "Simplify" code by removing itertools

* Rebase conflicts fix

* Review fixes

* Fixed shutdown behavior

* Removed ind_worker_queue flag.

* Rebase on master

* Disable tests that use DataLoader with multiple workers (#5322)
2018-03-23 17:43:59 -04:00
83de3a0b0e add AVX2 implementation for sigmoid function (#5010)
This PR introduces an AVX2 optimization for sigmoid over floats (issue #4929). The internal benchmark shows a ~10x speedup.

Added an AVX2-vectorized sigmoid using the 8-way vectorized exp (exp256_ps) in avx_mathfun.h.

Implemented vector dispatch for sigmoid. Since the sigmoid function is defined for floats and doubles only, for now a preprocessor #ifdef initializes the sigmoid dispatch only for float and double.

Vector functions in THVector.h were not being called for all of the float/double basic functions. Changed the LAB_IMPLEMENT_BASIC_FUNCTION define in THTensorMath.c to use the THVector_(NAME) implementations when the inputs are contiguous. Functions that do not have vectorized SIMD implementations fall back to the same default function from THMath.h.

* add AVX2 implementation for sigmoid function

* Fix bug in AVX2 code for sigmoid

* Add new macro for custom vectorized functions
2018-03-23 17:34:34 -04:00
feb2785c5c Implement torch.util.bottleneck (#5216)
* Implement torch.util.bottleneck

This is a tool that is intended to be used as initial exploratory
debugging of bottlenecks in user scripts. Run it with

    python -m torch.utils.bottleneck /path/to/source/script.py

* Refactor and address comments

* Fix tests

* Allow passing of args to the profiled script

* Replace Variable
2018-03-23 17:27:35 -04:00
3cc00e8b2f Remove pragma once from cpp file (#5965) 2018-03-23 21:26:36 +01:00
810edb615d [easy] Minor improvement of the code quality in caffe2/onnx (#2396)
* code quality

* Comments
2018-03-23 13:25:58 -07:00
7a5e2af6d5 Follow new version number in setup.py (#2266) 2018-03-23 13:22:26 -07:00
e5f4b9dc0e [auto] Update onnx to 063d12f - Fix optimizer split pass for models with constant output (#652)
063d12f6a9
2018-03-23 18:56:16 +00:00
8cf521b522 fix mvn docs (#5967) 2018-03-23 14:26:55 -04:00
4dc55a4240 Fix incorrect rendering of Tensor.index_*_ doc examples. (#5969) 2018-03-23 14:26:21 -04:00
8fbad1b28a [auto] Update onnx to a4dcc47 - Minor code quality improvements in defs/ (#613)
a4dcc47791
2018-03-23 18:23:40 +00:00
425361af6a Bump onnx opset version (#2402) 2018-03-23 10:48:12 -07:00
34e49ceb83 [auto] Update onnx to c88ab71 - Versionize model zoo with opset version (#650)
c88ab71e98
2018-03-23 17:26:53 +00:00
8c92be5320 [auto] Update onnx to 8c90dc1 - Add maxpool test cases (#573)
8c90dc1dd9
2018-03-23 17:07:11 +00:00
213fa61706 Implement range for loop in script (#5827)
* Implement range for loop in script

* Fix handling of boolean constants

* Use WithInsertPoint

* Allow dynamic max trip count

* fix symbols

* Fix argument order

* fix test

* Add insert{Input,Output} APIs and use them

* Factor out condition stuff

* clang-format

* Address remaining comments

* Fix tests

* Implement script in AST frontend
2018-03-23 11:55:32 -04:00
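A minimal script-mode example of the new loop form (a sketch; the annotation is one way to mark the integer argument):
```
import torch

@torch.jit.script
def arange_sum(n: int) -> int:
    total = 0
    for i in range(n):  # range-based for loops are now supported in script
        total += i
    return total

print(arange_sum(5))  # 10
```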
03495137d0 Add windows doc (#5859) 2018-03-23 11:54:44 -04:00
8e22ef0cb2 Support legacy empty tensor behavior in cat (#5889)
* Support legacy empty tensor behavior in cat

Continuing from #5837:
Fixes #5332.

Currently, the following behavior happens with torch.cat:

```
import torch

x = torch.randn(4, 3, 32, 32)
empty = torch.Tensor([])

res1 = torch.cat([x, empty], dim=1)

res2 = torch.cat([empty, x], dim=1)
```

However, at some point in the past, res1 and res2 were equal. This PR
supports the legacy behavior of ignoring empty tensors when
concatenating a list of tensors, until we have empty tensors that can
have arbitrary shape, at which point we'll stop supporting this
behavior.

* Address comments
2018-03-23 11:53:31 -04:00
c4ee2b7067 Moved torch headers copy to build_deps (#5772)
* Moved torch headers copy to build_deps

PR #5706 initially moved headers under build_ext to fix bdist_wheel and
build develop. This broke install and #5755 moved them back to install
which broke bdist_wheel and build develop. Looks like build_ext is called
from install after it already tried to copy the headers to the python install
dir and the headers were not installed correctly. Using build_deps works
correct with all setup.py install, bdist_wheel and build develop.

* Comment about the auto-generated files

Added comment that the current solution will not include auto-generated
files which may be a problem if somebody needs to use them
2018-03-23 11:34:27 -04:00
0045895837 Update speed_benchmark binary
- Support specifying type (float or uint8_t) for inputs
- Create input blobs if they don't exist
2018-03-23 11:26:23 -04:00
2030ac7545 Recommend citation (implements #4126) (#5955) 2018-03-23 09:57:29 -04:00
fe6c5ad435 [auto] Update onnx to 1e613b5 - Add DepthToSpace test cases (#619)
1e613b5d4e
2018-03-23 08:03:43 +00:00
5c87e55a4d [auto] Update onnx to 34d9ad2 - struct InferenceContext needs a virtual destructor (#648)
34d9ad20de
2018-03-23 07:48:06 +00:00
21918b94e4 Add InheritOnnxSchema property to c2 op schema (#2366)
* Add InheritOnnxSchema property to c2 op schema

* Add onnx inherit for {Conv,Maxpool,AveragePool}{1D,2D,3D}
2018-03-22 22:50:27 -07:00
bbb7c722df Remove legacy onnx optimizer tests (#2394) 2018-03-22 21:08:05 -07:00
b4d33cefc1 Fix compiling issue with CAFFE2_NO_SANITIZE (#2386) 2018-03-22 20:48:43 -07:00
1288c4fd79 refactor epoch_limiter (#2389)
* refactor epoch_limiter

* fix test
2018-03-22 20:32:13 -07:00
f3b7b2f293 Remove ONNX consumed_inputs (#2278)
* Remove ONNX consumed_inputs

* Bump up opset version to 6 issued by onnx caffe2 frontend
2018-03-22 20:24:35 -07:00
81a29967c5 [auto] Update onnx to 5f69c37 - Remove the only use of EnforceConsumed (#640)
5f69c37628
2018-03-23 03:23:28 +00:00
e1948d7377 [auto] Update onnx to 85133e9 - Introduce shape inference (#564)
85133e9849
2018-03-23 01:12:18 +00:00
1e1be56591 [auto] Update onnx to 0f49cb6 - Set 2GB protobuf parse limit (#646)
0f49cb696c
2018-03-23 00:14:38 +00:00
e3e0c34390 Unify error checking for tensor.index_copy_ (#5642) 2018-03-22 20:07:15 -04:00
e35212ebd0 Handle the ONNX opset for BatchNormalization (#2382)
* Handle the ONNX opset for BatchNormalization

* address comments
2018-03-22 15:09:54 -07:00
566a25e1e4 Add keyword argument to PipeReaderBuilder (#2381)
att
2018-03-22 14:17:47 -07:00
c803ed524e fix windows build 2018-03-22 13:55:53 -07:00
d946267b80 Specify outputs number in embedding_bag onnx export (#5935) 2018-03-22 16:55:23 -04:00
2ad972c9eb A complete revamp of our test scripts. (#5904)
- All of the scripts are based off of the idea that they should be as
  simple as possible, and all the heavy lifting done in the construction
  of the Docker file.  The scripts are really simple now.  A bigger
  philosophical discussion can be found in .jenkins/README.md

- build-asan.sh is split out of build.sh, as ASAN builds are a bit
  specialized and it's inappropriate to run many of the other builds
  as part of them.

- We now build and run with mkl/mkl-include on the CPU only builds

- We now report sccache and ccache stats at the end of all builds.

- run_test.py flushes stdout/stderr before making a subprocess call,
  which should solve our interleaving problems.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-22 16:31:50 -04:00
c9c978dff0 Fix tensor.permute(dims) backward for negative dims (#5945)
Fixes #5943

For the following code:
```
import torch

u = torch.zeros((3, 3), requires_grad=True)
v = u.permute(-1, -2)  # (1, 0) here is fine
v.sum().backward()
```

during the backward pass, a std::vector is constructed
as an "inverse" of the permutation. To do this, all the dims
are indexed into the vector.

The problem with that is that the negative dims were being indexed
into the std::vector, causing undefined behavior. This PR wraps
those negative dims so they're handled correctly.
2018-03-22 16:30:55 -04:00
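The gist of the fix, sketched in plain Python with a hypothetical helper name: wrap negative dims before they index into the inverse-permutation vector.
```
def inverse_permutation(dims, ndim):
    dims = [d % ndim for d in dims]  # wrap negatives, e.g. -1 -> ndim - 1
    inv = [0] * ndim
    for i, d in enumerate(dims):
        inv[d] = i                   # output position i came from input dim d
    return inv

assert inverse_permutation([-1, -2], 2) == [1, 0]  # same as permuting by (1, 0)
```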
977bae0a71 move nomnigraph to OSS
@already-on-github

moving nomnigraph to caffe2/core for open source
2018-03-22 12:42:32 -07:00
504320d85b Update README.md 2018-03-22 12:30:08 -07:00
45da53f478 Remove Python onnx-caffe2 conversion code (#2362)
* WIP

* Remove Python onnx-caffe2 onversion code

* Fix build

* Comments

* Add comments

* Fix typo in comments
2018-03-22 11:59:03 -07:00
a58f2d242a Test both Python and string JIT frontends (#5891) 2018-03-22 16:58:36 +01:00
befd9642bf py3 - use loop instead of map for test_torch:test_cpu_parallel (#5940) 2018-03-22 11:28:29 -04:00
add04c56bf Verify that 'catch' submodule has been checked out before attempting build. (#5941) 2018-03-22 11:28:04 -04:00
2a02ec6537 Fix index out of range error when viewing a scalar as a 1-dim tensor (#5934) 2018-03-22 09:39:59 -04:00
37a84dd40d Move definitions of Kind out of NO_PYTHON block (#5914) 2018-03-22 09:36:08 -04:00
3053618624 Add argmax and argmin ops (#2371)
* Revert update on top_k_op

* Add axis to top_k_op

* Remove do { ... } while (false)

* Revert top_k op to upstream

* Add argmin and argmax ops

* Revert top_k_test to upstream

* Add argmin and argmax ops
2018-03-22 00:52:11 -07:00
e9f144b3e8 parallel_for_2d fix and guarding avx/avx2 compilation (#5926)
Fix for #5921.

I'm adding support for compilers that don't support -mavx/-mavx2 by revisiting the dispatch code.
2018-03-22 01:14:56 -04:00
418aad2c54 Add support for subscripts in Python frontend (#5890) 2018-03-22 01:11:25 -04:00
48c70d2dbd Fix ReduceMean performance by specializing Eigen implementation for common shapes (#2355) 2018-03-21 21:48:54 -07:00
c8d1ec02be [jit] Have ScriptModule inherit from Module (#5769)
* Have ScriptModule inherit from Module
  This is accomplished by created replacement _parameters, _buffers,
  and _modules which implement the OrderedDict APIs but which
  actually get/set their members inside script::Module
* Merge TracedModule with ScriptModule
* Move logic of attribute handling into Python bindings rather than
  make script::Module handle it. This was redundant with nn.Module,
  which already handles attribute.
* Make TracedModule a subclass of ScriptModule
* Move handling of attribute kind logic into bindings.
* Allow ScriptModule to contain non-script module submodules.
2018-03-22 00:17:49 -04:00
b2c56eb219 Removed special handling for onnx sqrt (#2353) 2018-03-21 21:05:25 -07:00
1c0862c301 Fix a typo (#2339) 2018-03-21 17:24:39 -07:00
2d03ae2f85 Move ParseProtobufFromLargeString to proto_utils (#2354)
* Move ParseProtobufFromLargeString to proto_utils

* ParseProtobuf -> ParseProto to be consistent in naming
2018-03-21 17:05:14 -07:00
7cbbc0bc74 Implementation of the logistic-normal distribution (#5547) 2018-03-22 00:32:14 +01:00
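Usage sketch (assuming the torch.distributions.LogisticNormal API): a normal distribution pushed through a stick-breaking transform, so samples lie on the probability simplex.
```
import torch
from torch.distributions import LogisticNormal

d = LogisticNormal(torch.zeros(3), torch.ones(3))
s = d.sample()          # non-negative entries that sum to 1
print(s, s.sum())
```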
0ea8964fd6 Revert "Export number of iterations of AtomicIterOp" (#2359)
* Revert "Use -DCMAKE_BUILD_TYPE=Release for local build by default"

This reverts commit 035c62081f6420405b9f1380cc5d21b4c6ae78f6.

* Revert "Export number of iterations of AtomicIterOp (#2338)"

This reverts commit 91b7a0cb48c6b079e2ca8fd5c26819a003937d76.
2018-03-21 16:11:29 -07:00
3aa393f7e2 Log NNPACK profile to std::cout instead of LOG(INFO)
Similar to #2333, but for NNPACK bindings
2018-03-21 18:46:11 -04:00
4b54f04eab [auto] Update onnx to caf9256 - Do not allow multiple spaces after comma (#638)
caf9256a9d
2018-03-21 22:13:30 +00:00
d707dae013 Add half test in test_nn for auto generated tests. (#5362)
* add half and double test in NewTestModule

* add half/double/float tests in NewCriterionTest

* resolve merge conflict with master
2018-03-21 16:55:06 -04:00
44039ffcea Use -DCMAKE_BUILD_TYPE=Release for local build by default 2018-03-21 16:12:32 -04:00
e4eee7c2cf Implement MarginRankingLoss as native function and add reduce=True arg to it (#5346)
* add reduce=True arg to MarginRankingLoss

* make default margin arg match for legacy

* remove accidentally added test

* fix test

* fix native_functions.yaml alphabetical order
2018-03-21 15:40:58 -04:00
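Usage sketch of the functional form (assuming the current signature): the target picks which input should be ranked higher.
```
import torch
import torch.nn.functional as F

x1, x2 = torch.randn(5), torch.randn(5)
y = torch.ones(5)  # +1: x1 should rank higher than x2 (use -1 for the reverse)
loss = F.margin_ranking_loss(x1, x2, y, margin=0.5)
```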
8346088094 Export number of iterations of AtomicIterOp (#2338)
* Exported AtomicIterOp count

* Exported AtomicIterOp count
2018-03-21 12:39:30 -07:00
611a89c4b6 Remove more protobuf APIs. (#2348)
* Wrap ShutdownProtobufLibrary

* Remove text_format.h header and only put the function in proto_utils.h

* ParseFromString returns bool
2018-03-21 10:29:45 -07:00
b1684e9a3a Skip DepthToSpace and MaxPool same mode onnx backend tests (#2343) 2018-03-21 09:24:06 -07:00
a3bd7b2875 Optimize unique sorting by using std::vector+sort instead of std::set (#5913) 2018-03-21 08:51:20 +01:00
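For reference, a sketch of the operator served by the new sort-based path (behavior against a current build):
```
import torch

vals = torch.unique(torch.tensor([1, 3, 2, 3, 1]), sorted=True)
print(vals)  # tensor([1, 2, 3])
```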
ece288392a [auto] Update onnx to 1a067ba - fix all python lint errors and enforce it in CI (#635)
1a067bac03
2018-03-21 07:03:57 +00:00
75a65ffe0f Set proper optimization options (#2344)
- Use -O2 in release build of Caffe2 (Android defaults to -Os on ARMv7)
- Update NNPACK submodule to use the proper options for ukernels and layers
2018-03-20 23:41:58 -07:00
ccd8c2a6bc [auto] Update onnx to d4a378c - Add ONNX_USE_LITE_PROTO (#634)
d4a378c02e
2018-03-21 06:11:10 +00:00
537e0e0330 better err msg for missing mkl headers (#5894)
Fixes #5887 .

Now it shows:

-- MKL library found
-- Found a library with BLAS API (mkl).
CMake Error at CMakeLists.txt:389 (MESSAGE):
  MKL header files not found.  If using conda, please run `conda install
  mkl-include`.  Otherwise, please make sure that CMake will search the
  directory containing the header files, e.g., by setting CMAKE_INCLUDE_PATH.


-- Configuring incomplete, errors occurred!
See also "/home/ssnl/sftp/pytorch/torch/lib/build/aten/CMakeFiles/CMakeOutput.log".
See also "/home/ssnl/sftp/pytorch/torch/lib/build/aten/CMakeFiles/CMakeError.log".
2018-03-20 22:25:28 -04:00
08b1324ec2 Fix integer overflow in remainder operator (#5906)
* Fix integer overflow in remainder

* Fix remainder operator in CUDA

* Add tests for remainder integer overflow

* Add has_different_sign static function
2018-03-20 22:05:34 -04:00
def37111eb Update locally_connected_op to reduce transpose dimensions. (#2340)
* Revert update on top_k_op

* Update locally_connected_op to reduce transpose dims
2018-03-20 17:19:38 -07:00
6cae6d3841 Update ONNXOpCoverage.md 2018-03-20 15:22:43 -07:00
06e86a6455 Add submodules in the ATen subtree (#5911) 2018-03-20 22:00:56 +01:00
e43d0ac92a [distributions] Support pickling of constraint objects (#5910) 2018-03-20 22:00:37 +01:00
1c80ee1c74 Update ONNXOpCoverage.md 2018-03-20 13:56:13 -07:00
ac1b7b6366 Update ONNXOpCoverage.md 2018-03-20 13:55:33 -07:00
42d3bcc189 Only run WeightedMultiSample test on CPU and not GPU. 2018-03-20 13:34:22 -07:00
6aa087d902 Revert "export num iterations of AtomicIter"
This reverts commit be9c8e5591f5d38131b9bdc2249542f27dadc221.
2018-03-20 13:34:22 -07:00
22d0828f00 [easy] improve error messages
as desc.

#accept2ship
2018-03-20 13:34:22 -07:00
69706b2ab4 Add C2 for weighted sampling
A C2 operator with inputs (1) index and (2) cdf, an argument number_samples, and
an output of number_samples samples drawn from the index.
2018-03-20 13:34:22 -07:00
4bb73b8361 [GanH] Weighting Layers: Adaptive/Constant/Homotopy
use case: to weight multiple losses (real values) as a single composite loss for
optimization
2018-03-20 13:34:22 -07:00
a5279dccd4 [GanH]: homotopy JSD
as titled
2018-03-20 13:34:22 -07:00
fac306d3c9 export num iterations of AtomicIter
as title.  Useful for tracking number of EASGD updates.
2018-03-20 13:34:22 -07:00
f7f48989ba GPU support for ChannelBackpropStatsOp
Step 2 of 3 in adding support for multidevice batch normalization on GPUs. Implements ChannelBackpropStatsOp. Similar to D6953411.
2018-03-20 13:34:22 -07:00
3940e7f0a7 Support computing averaged norm in blob magnitude visualization
1. Support calculating the average Lp norm in the LpNorm operator by adding one more boolean argument, i.e., LpNorm(x, average=true) = LpNorm(x) / size(x)

2. Integrate the average option into the visualization framework
2018-03-20 13:34:22 -07:00
c43896732e Added device inference functions for Concat and Split Ops.
Changes:
=======
1. Added device inference functions for Concat and Split Ops.
2. Added a unit test to validate the change. See, test_device_inference_function in core_test.py
3. Fixed some formatting.
2018-03-20 13:34:22 -07:00
e0e334793c Revert D7219461: Mark full sync data parallel ops with rules
This reverts commit 79c56ec5859e25c7caec7bb6b79e80dd19307c64
2018-03-20 13:34:22 -07:00
9edbafe0de Mark full sync data parallel ops with rules
Instead of using hard-coded rules or rely on gpu_strategy to mark full sync data parallel ops, we need some generic rules that is applicable to both the single and distributed setting.
2018-03-20 13:34:22 -07:00
7bef225e72 [Caffe2] Fix double map lookup in operator_schema.h
[Caffe2] Fix double map lookup in `operator_schema.h`.
2018-03-20 13:34:22 -07:00
35b6b0747a Fix stop_if()
Making sure that stop blob is never overrided.
2018-03-20 13:34:22 -07:00
0cde2f1cc7 Output blob allocation in Caffe2.
Add support to accept manually allocated objects in output blobs and avoid calling the empty constructor of the object.
2018-03-20 13:34:22 -07:00
40683cdf42 Allow calculating average margin rank loss
Similar to LrLoss, we allow for average loss of margin rank loss.
2018-03-20 13:34:22 -07:00
d4996e50de Minor (but important) documentation update for SplitOp
This was just a typo, but an important one. Confused me for a while.
2018-03-20 13:34:22 -07:00
72f2cd8bcc Making preproc_output_schema explicit
Make it easier to plug in intermediate steps between preprocessing & trainer by maintaining a stable schema.

I also fixed enqueue() so that we can pass in the same blob in multiple locations without causing data corruption.
2018-03-20 13:34:22 -07:00
7aeda25cfb Add type / shape inference for IndexHash op
just as title says
2018-03-20 13:34:22 -07:00
6af3429f4f Add 2D Row-wise Arg Max Operator
Add operator to return row-wise arg max of 2D matrix.
2018-03-20 13:34:22 -07:00
9be2de507b Cleaning up ReaderBuilder interface
The way `splits()` is currently used is so convoluted. It's impossible to compose ReaderBuilder. I'm working on a composite reader so this is a prerequisite for it.

The idea is that the ReaderBuilder should maintain the states it needs to create a reader. Any setup is done through the new `setup()` method. Currently, `setup()` should only be called once, but, if needed, it should be safe to call it multiple times.
2018-03-20 13:34:22 -07:00
10e8d7100d Fix caffe2_benchmark 2018-03-20 13:34:22 -07:00
ab3065de25 Playground refactoring and DataPreproc reader for DAIPlayground at facebook
Add one more input module, preproc everstore, for IN1k. It uses the same datasets as the Sherlock everstore input reader, then uses the DataPreproc operator to distribute the image preprocessing to machines other than the trainer. This is supposed to relieve some compute burden from the trainers.

@override-unit-failures
(Note: this ignores all push blocking failures!)
2018-03-20 13:34:22 -07:00
a4d0ef2621 Fix stop blob of processing reader
See inline comment
2018-03-20 13:34:22 -07:00
f62d6f0578 Limit the number of LOG(INFO) for unavailable engine to 64 (#2332)
Inlining what glog's LOG_FIRST_N does because not every platform has glog.
2018-03-20 12:48:10 -07:00
9123fcc857 Use std::cout instead of LOG(INFO) in TEST_Benchmark implementation
LOG(INFO) can be stripped out at compile-time or disabled at run-time,
but there are hardly any use cases where we want to call TEST_Benchmark
and not see the result. Additionally, on Android, LOG(INFO)
writes to logcat, which is OK for errors/warnings, but inconvenient
for benchmarking results, as on new phones logcat spawns logs like crazy.
2018-03-20 15:31:03 -04:00
d211904be3 [auto] Update onnx to cf76f2f - Add averagepool test case (#572)
cf76f2f3cf
2018-03-20 17:29:13 +00:00
84a73775d5 adding fp16 tests in test_nn (#5020) 2018-03-20 18:01:05 +01:00
2f27c1b56b Revert "Fix ImportError with requests in model_zoo (#5896)" (#5909)
This reverts commit 21ce93e88ff8eed1c9fa230a8c5d97d188093705.
2018-03-20 17:35:03 +01:00
efe1c2bd13 hypen as a valid part of model names (#2312) 2018-03-20 08:52:54 -07:00
21ce93e88f Fix ImportError with requests in model_zoo (#5896)
Not sure if this is a backwards compatibility issue. 
```
Python 2.7.9 (default, Apr  2 2015, 15:35:35) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests.get as urlopen
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named get
>>> from requests import get as urlopen
>>> 
```
2018-03-20 09:51:34 +01:00
a749bde0b1 Make InputSize and OutputSize const member function (#2313)
Make InputSize and OutputSize const member function
2018-03-20 01:34:12 -07:00
cda2f02f89 Skip the test average pool same mode tests (#2324) 2018-03-20 00:13:31 -07:00
2d8a674141 [auto] Update onnx to 4560267 - Use static_cast to replace dynamic_cast to avoid needing RTTI (#629)
4560267df4
2018-03-20 05:59:14 +00:00
b0fe67aca8 Expose more APIs for onnx cpp backend (#2317) 2018-03-19 22:46:26 -07:00
2a4b33bf87 Add doc for torch/onnx/operators.py (#5895)
* Add doc for torch/onnx/operators.py

* lint
2018-03-19 23:48:25 -04:00
b43d6162fb Remove USE_THREADS since it is needed explicitly. (#2322) 2018-03-19 20:46:43 -07:00
bbafae143b Caffe2 transpose (#2320)
* Revert update on top_k_op

* Speed transpose up on gpu
2018-03-19 20:30:33 -07:00
ebd4dadeb0 [auto] Update onnx to 5865ed1 - Minor code quality improvements (#614)
5865ed15f4
2018-03-20 00:42:05 +00:00
aa4af1a5f9 [tiny] make debug info optional, CAFFE2_DEBUG env variable driven 2018-03-19 16:58:04 -07:00
23631eee5a [C2] Fix the check of current scope in optimizer (#2316)
scope.CurrentDeviceScope() can return a None type, which was not considered.
2018-03-19 16:38:55 -07:00
39a6859685 Fix softmax symbolic (#5893)
ba64724aee
2018-03-19 19:33:02 -04:00
fb77b423f4 refactor histogram as net modifier (#2314) 2018-03-19 16:04:58 -07:00
1936753708 Added an implementation of a multivariate normal distribution (#4950) 2018-03-19 23:22:46 +01:00
7e13138eb6 Revert "Enable resetting of batchnorm running stats and cumulative ("simple") moving average" (#5892)
* Revert "Port ATen and JIT C++ tests to Catch2 (#5788)"

This reverts commit 6f80023c29e0fb55f46a32c4931bc5d4ba749846.

* Revert "Fix error message for cat-ing zero-dim tensors (#5819)"

This reverts commit cf2e1760490d369e93017b9425279b235c10772d.

* Revert "Softmax symbolic should account for negative dim (#5846)"

This reverts commit ba64724aeea8ad5d4b50cd1154fca5a011618333.

* Revert "[fft][1 of 3] build system and helpers to support cuFFT and MKL (#5855)"

This reverts commit 22ef8e5654c45d1f5404e3add6ad19678c0b80a9.

* Revert "Don't modify requires_grad when running DataParallel in no_grad mode (#5880)"

This reverts commit d11b7fbd1c49ed7bd84c89d286e2763e6ba55f51.

* Revert "fix some methods not showing up in doc (#5882)"

This reverts commit 24fca0efb289a069929639783d1c050b79e591c0.

* Revert "ReduceOps cleanup and set_num_threads (#5723)"

This reverts commit 84400d5531500e1a3fbcfe8a3f2865f982405861.

* Revert "introduce shape_as_tensor and reshape_from_variable_shape (#5824)"

This reverts commit f446b82e70ca0aa42fffa58469c28b6bce51d021.

* Revert "Enable resetting of batchnorm running moments and cumulative ("simple") moving average (#5766)"

This reverts commit 99b1f6cfad85a4856550cc1e787afd7ff9e6c6aa.
2018-03-19 17:47:54 -04:00
e426a5dadd Add an option for Caffe2 to link with local protobuf. (#2306) 2018-03-19 14:36:53 -07:00
56505007a2 [auto] Update onnx to c39280b - Add the wheel setup test in Windows build and support py35 in CI test (#620)
c39280b566
2018-03-19 21:21:14 +00:00
00603b5e0a Add CollectAndDistributeFpnRpnProposalsOp for FPN support (#2254)
* Add CollectAndDistributeFpnRpnProposalsOp for FPN support

* Adds a C++ operator equivalent to the Python op in Detectron
* Once some additional GenerateProposalsOp changes are made this will
 let us support Detectron FPN models with straight Caffe2 C++ ops
* RetinaNet and segmentation models require additional work

* Remove some uses of conservativeResize

* Add notes about training and inputs/outputs to operator documentation
2018-03-19 14:04:43 -07:00
6f80023c29 Port ATen and JIT C++ tests to Catch2 (#5788)
This PR addresses #5648. In particular, following the discussion at #5648:

- it adds Catch as a submodule (https://github.com/catchorg/Catch2) in torch/aten/utils
- it ports all ATen tests to Catch
- it ports torch/csrc/jit/test_jit.cpp to Catch (libtorch only, Python build is unaffected)
2018-03-19 16:09:43 -04:00
cf2e176049 Fix error message for cat-ing zero-dim tensors (#5819)
Fixes #5552

* Fix error message for cat-ing zero-dim tensors

* Address comments
2018-03-19 16:06:27 -04:00
ba64724aee Softmax symbolic should account for negative dim (#5846) 2018-03-19 15:43:41 -04:00
22ef8e5654 [fft][1 of 3] build system and helpers to support cuFFT and MKL (#5855)
This is the first of three PRs that #5537 will be split into.

This PR adds mkl headers to included files, and provides helper functions for MKL fft and cuFFT.
In particular, on POSIX, headers are using mkl-include from conda, and on Windows, it is from a new file @yf225 and I made and uploaded to s3.

* add mkl-include to required packages

* include MKL headers; add AT_MKL_ENABLED flag; add a method to query MKL availability

* Add MKL and CUFFT helpers
2018-03-19 15:43:14 -04:00
d11b7fbd1c Don't modify requires_grad when running DataParallel in no_grad mode (#5880)
Previously, running DataParallel in no_grad mode would change the
requires_grad property of the network's parameters to False. The issue
is that Broadcast returns aliases of the inputs for the source device.
In no_grad mode, it would detach these inputs in-place.

Fixes #5851
2018-03-19 15:26:51 -04:00
24fca0efb2 fix some methods not showing up in doc (#5882) 2018-03-19 14:48:15 -04:00
84400d5531 ReduceOps cleanup and set_num_threads (#5723) 2018-03-19 13:40:56 -04:00
f446b82e70 introduce shape_as_tensor and reshape_from_variable_shape (#5824) 2018-03-19 13:30:27 -04:00
3c213bd9da Add fallback for CuDNN pooling (#2291) 2018-03-19 09:54:30 -07:00
3f667176cc Fixing the conda-gcc-cuda builds (#2305)
* Fixing mistakes in earlier PR

* Allowing cuda builds of different gccs
2018-03-19 09:32:32 -07:00
99b1f6cfad Enable resetting of batchnorm running moments and cumulative ("simple") moving average (#5766) 2018-03-19 11:47:57 -04:00
5014adfe2f Fix CUDA 8 build on Windows (#5869) 2018-03-19 09:01:22 -04:00
334fc98fb0 Handle the legacy padding in global pooling case (#2292) 2018-03-18 21:28:15 -07:00
58af449ca1 Bump onnx opset version to lastest (#5849)
This is mainly to include the new version of Reshape operator.
2018-03-18 23:16:08 -04:00
0eaf883d6a Delete stubs from one more place. (#5866)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-18 22:56:22 -04:00
e431c98205 Caffe2: Add support for several auto-created observers and move net summary to (#2304)
a separate observer

This allows to support several auto-attached observers.
2018-03-18 18:23:40 -07:00
b5def81de8 Delete stubs from LD_LIBRARY_PATH when we actually run code. (#5861)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-18 20:58:03 -04:00
dad57a414b put caffe2_protos to a standalone target (#2302) 2018-03-18 17:38:23 -07:00
77042266ee Multi-gpu test. (#5854)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-18 00:45:28 -04:00
c18dba9fe7 Adding gcc4 conda builds (#2283)
* Changes without centos changes

* Changes for protobuf 3.5 and gcc 4.8

* Changing 3.4.1 back to 3.5.1

* Preventing installing two versions of setuptools

* Fixing setuptools bug
2018-03-17 17:26:37 -07:00
2f64e1cdf6 Add second iteration in test_DistributedDataParallel (#5830) 2018-03-18 00:27:45 +01:00
0ca046c68d Fix bug (#5836) 2018-03-17 17:05:33 +01:00
1dcad08537 Support N-D tensors in Bilinear (#5764)
* support n-d inputs in bilinear and move to aten

* support n-d inputs in bilinear and move to aten

* add asserts to bilinear inputs

* address comments

* cast int64_t in asserts
2018-03-17 11:57:43 -04:00
04edb8948a Fix kldiv backward on CUDA (#5814)
* Test that gradOutput is being used for criterion losses

* Fix incorrect kldiv backward on CUDA

* Address comments

* Fix legacy
2018-03-17 11:17:07 -04:00
e876b5d9d0 implement TripletMarginLoss as a native function (#5680)
* implement TripletMarginLoss as a native function

* implement TripletMarginLoss as native function

* fix compile error

* address comments

* address comments

* Add keepdim arg to pairwise distance
2018-03-17 11:10:48 -04:00
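Usage sketch of the native function (assuming the current functional signature):
```
import torch
import torch.nn.functional as F

anchor, positive, negative = (torch.randn(8, 128) for _ in range(3))
loss = F.triplet_margin_loss(anchor, positive, negative, margin=1.0)
```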
32462e0ac4 Cleaner solution to the undefined references in RPC (#5817) 2018-03-17 11:10:24 -04:00
40ea24cc54 Skip test_backwards_fork test as flaky. (#5839)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-17 10:40:27 -04:00
d776c52ff7 Fix nvprof parsing (#5840) 2018-03-17 10:38:57 -04:00
ce0204402b Add ATen symbolic for unique op (#5845) 2018-03-17 10:28:27 -04:00
f390a252f4 fused GLU backward (#5782) 2018-03-17 10:27:05 -04:00
7cbe63da86 improve handling of precision issue in torch.multinomial (solves #4858) (#5774)
* improve handling of precision issue in torch.multinomial (solves #4858)

* add test

* review feedback - eliminate size check. Thanks!
2018-03-17 10:26:22 -04:00
00cc962670 typo (#5847) 2018-03-17 10:26:00 -04:00
abf97e954e Avoid in-place ops in BoltzmannTransform (#5842) 2018-03-17 10:24:23 -04:00
d441396e47 Fix crash in new tensor with numpy array in CUDA (#5850) 2018-03-17 10:23:02 -04:00
e6ac93b817 Add support for number and list literals in Python frontend (#5843) 2018-03-17 10:22:23 -04:00
0167f76d2a [auto] Update onnx to 012145f - Relax the precision on the output (#622)
012145fda9
2018-03-17 03:13:07 +00:00
def76eee1c [auto] Update onnx to e2e8003 - add output shape as input for reshape (#608)
e2e8003ec3
2018-03-16 23:32:44 +00:00
c155842cc1 Update onnx frontend to emit new onnx Reshape (with shape as input) (#2287)
* Update onnx frontend to emit new onnx Reshape (with shape as input)

* Address comments and revert submodule change
2018-03-16 16:32:35 -07:00
875925b030 Add operator[](int64_t) overload (#5838) 2018-03-16 23:10:37 +01:00
c474136ee1 [REDO] Add torch.sparse_coo_tensor factory. (#5781)
* Add torch.sparse_coo_tensor factory.

Notes:
1) I didn't add Tensor.new_sparse_coo_tensor; it didn't seem particularly useful, but it's easy to add
2) This doesn't do the type inference, i.e. torch.sparse_coo_tensor(indices=LongTensor, values=IntTensor)
will return a sparse tensor corresponding to the default type rather than a sparse IntTensor.  We can add
type inference later when we add it to other factories.

* Fix merge.

* Use type_conversion function from python_variable_methods.
2018-03-16 13:58:02 -04:00
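A minimal usage sketch for the factory re-landed above (values and shapes are illustrative; to_dense() is used only for display):

```
import torch

# COO layout: indices is a 2 x nnz LongTensor, values holds the nnz entries
i = torch.LongTensor([[0, 1, 1],
                      [2, 0, 2]])
v = torch.FloatTensor([3.0, 4.0, 5.0])
s = torch.sparse_coo_tensor(i, v, torch.Size([2, 3]))

print(s.to_dense())
# [[0, 0, 3],
#  [4, 0, 5]]
```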
acc409396b Namespaced symbols (#5820)
* Namespaced symbols

- Our interned strings now have structure, "ns::symname" rather than just
  "symname" before.  We support efficient namespace testing for uniques
  by encoding the namespace in one byte in the Symbol internal representation.
  See torch/csrc/jit/interned_strings.h for a more in-depth implementation
  discussion.

- All uses of ksymbol are now attr::symbol (or some appropriate namespace).
  The valid namespaces are prim, attr, onnx and aten.

- Symbol is bound in Python as a qualified string "attr::symbol", EXCEPT for the
  attribute setting/getting API, whose symbols must always be attr
  symbols; they get special cased to assume strings are passed.
  There's a little bit of naughtiness in the implementation, maybe you know
  how to solve it.

- However, the g.op() convenience function assumes that you're generating
  ONNX operators, unless you explicitly qualify.

- All ATen operators and nodes have built-in interned strings generated
  for them, so you should never have to write a string literal ever again.
  The tracing code is adjusted to use it.

- ONNX exporter now properly tests to see that all operators are in
  onnx namespace before accepting the export.  This is way more
  robust than the previous exporter, which would be willing to
  export capitalized operators which were not actually ONNX operators.

- A slight organizational change for symbolic.py; this module now ONLY
  contains aten operators.  In particular, the exporter for Constant
  has moved into utils.py (along with Undefined, from the C++ side),
  since primitive ops get "special treatment."

- The un-inplacing logic in recording is more robust, so that we don't
  delete a trailing underscore from __and__.  This never affected us
  before because we didn't have any tests for it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-16 13:36:11 -04:00
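The one-byte namespace encoding described above can be modeled in a few lines; this is a simplified illustration, not the actual layout in torch/csrc/jit/interned_strings.h:

```
# Pack the namespace into the top byte of the interned id, so a namespace
# test is a shift instead of a string comparison.
NAMESPACES = {"prim": 1, "attr": 2, "onnx": 3, "aten": 4}

def make_symbol(ns, unique_id):
    return (NAMESPACES[ns] << 56) | unique_id

def namespace_of(sym):
    return sym >> 56

s = make_symbol("attr", 42)
assert namespace_of(s) == NAMESPACES["attr"]
```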
940a0ab67b Add logdet and slogdet (#5393)
* 1. Add logdet and slogdet in ATen side
2. Previously, det can return result with incorrect sign upon seeing symmetric
   matrices. This is caused by the wrong assumption I had on SVD (when input is
   symmetric U=V^T). This fixes it.
3. Moreover, after fixing 2 now QR is always needed for det forward. So I moved
   SVD to backward call. Since this is a specific variant of SVD, it is named as
   _svd_with_positive_UV_det, with derivative.yaml entry being svd_backward.
4. Updated/added backward functions for det, logdet and slogdet, which uses
   _svd_with_positive_UV_det and svd_backward inside.
5. Optimized svd_backward:
   a. Avoid unnecessary kernels when only sigma has gradient (this is the usual
      case, and also true with *det backward functions).
   b. Fix SVD double backward by avoiding a nan.

* 1. Add/update grad checks for det, logdet, and slogdet.
2. Fix an incorrect check for dim_args_idx in test_autograd.py
3. Add option to only test a subset of output values, specified by
   test_output_indices, for cases like slogdet where only the
   second output is differentiable.
4. Add better doc for the test generating list.

* Add/improve output tests for det, logdet and slogdet
Add a scaling to random matrices so closeness checks are more robust

* Remove unnecessaery Variable wrappers in some test files

* Add logdet slogdet docs

* Improve an err msg in THTensorLapack.c

* add inverse-based backward for invertible matrices
use svd only for non-invertible case, so don't need the special variant anymore

* use LU rather than QR
2018-03-16 09:23:00 -04:00
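A short sketch of the intended usage of the functions added above (assuming the torch.det/torch.logdet/torch.slogdet bindings from this PR):

```
import torch

A = torch.randn(3, 3)
A = A.mm(A.t()) + 3 * torch.eye(3)   # well-conditioned symmetric matrix

sign, logabsdet = torch.slogdet(A)   # stable even when det over/underflows
print(sign * torch.exp(logabsdet))   # should match torch.det(A)
print(torch.logdet(A))               # defined when det > 0
```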
f5aa8d55ad fix detach in place error in DDP (#5829)
* fix detach in DDP

* fix typo

* make lint happy
2018-03-16 09:22:04 -04:00
a5a99bd4a1 Make static state function-local (#5822) 2018-03-16 09:15:10 -04:00
0b5b28f6a7 add some onnx exported supports (#5734)
* add some onnx export supports

* fix the number of spaces

* fix blank line and white spaces

* rm initialize

* split upsample, gt off, add lt
2018-03-16 00:06:35 -04:00
2322ab11b9 Allow larger margin for perf test runtime variation (#5799) 2018-03-16 00:05:41 -04:00
7f864bbe52 Fixed distribution constraints and added some test cases for distributions parameter check (#5358) 2018-03-15 23:11:20 +01:00
e8f14f5d37 Fix ONNX backend for MatMul (#2273)
* Fix ONNX backend for MatMul

* Update Python implementation

* Address comments
2018-03-15 14:43:52 -07:00
eeb90d9c95 Add a Number node to the JIT AST and unify script syntax with Python (#5716) 2018-03-15 20:56:23 +01:00
c40b99f9ae speed up CPU EmbeddingBag (indexSelectAdd op) (#5433)
* speed up CPU EmbeddingBag (indexSelectAdd op)

* keep operator inside EmbeddingBag + speedup

* comment

* update checkScalarTypes signature

* enforce type in embedding_bag_backward_cpu
2018-03-15 15:46:53 -04:00
ecffe53ef0 Fix convolution type mismatch error message (#5815) 2018-03-15 15:44:06 -04:00
404b8e9442 Revert "introduce size_as_tensor and resize_from_tensor" (#5818)
* Revert "introduce size_as_tensor and resize_from_tensor (#5792)"

This reverts commit 4fa08535ed8c63f05c7e33ca6faa255c0bb5e93b.
2018-03-15 15:05:51 -04:00
eee4f1ee42 Add symbolic functions for cumsum and embedding_bag (#5786)
* Add symbolic functions for unsqueeze, cumsum and embedding_bag

* unsqueeze already exists
2018-03-15 14:50:41 -04:00
4fa08535ed introduce size_as_tensor and resize_from_tensor (#5792)
these two operators use a Tensor to hold the sizes, which allows
symbolic implementations to be attached
2018-03-15 14:47:35 -04:00
b239b123e4 Clean up TraceInput (#5743) 2018-03-15 19:38:33 +01:00
3084a577eb Allow indexing by scalars and zero-dim tensors (#5749) 2018-03-15 17:49:16 +01:00
bc0bd063ca Fix license header for GenerateProposalsOp. (#2202)
* Seems like this didn't get adjusted when the ops were open sourced.
2018-03-15 09:25:04 -07:00
5c51bb6c0f bugfix in onnx export of batch_first = True (#5753) 2018-03-15 12:23:21 -04:00
5fa3aac610 ATen ReduceOps (#5776)
#5481 was reverted due to a strange test bug. This PR attempts to fix that.

This diff adds vectorization to ATen. It uses Intel intrinsics to build a general vec256 class that represents types of 256-bit width. These can then be treated like regular variables. Using those, it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows work stealing and chunks the reduction operations based on an experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code depending on the host's capabilities.

The kernels are implemented under native/cpu. Each .cpp file is compiled with -avx, -avx2 and no additional flags. A macro is used to append AVX, AVX2 or NONE to the function name. The header then needs to define the functions three times, one for each capability. This could be improved by either changing the cmake file a bit or possibly generating source code using a Python script etc.

For the non-contiguous case this defaults to the current implementation within TH. For CUDA it defaults entirely to the implementation within THC.

There probably needs to be a bit of a debate around the design decisions here, the additional dependencies, parallelization strategy, clarity, etc. The numerical results also diverge from numpy with larger tensors, which is expected since we're summing, for example, 8 numbers and then adding the result to the running sum, instead of each number one by one. But there might be something to be said about accumulating into a double for floats or the degree of divergence, the behavior with respect to CUDA, etc.

I wrote a [small Python script]( https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with numpy numerically as well as on timing. I ran this script to create timings both on master and this branch.

Here is the command for 1 core
`OMP_NUM_THREADS=1 taskset -c 0 python sum_bench.py --enable_numpy 200`

Here is the command for all cores
`python sum_bench.py --enable_numpy 200`

Here are the results of each:

[Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ)

[This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w)

[Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw)

[This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA)

To test, the command is
`python sum_bench.py --test 200`

[This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw)

For this test we look at the average absolute value of the differences. This does not take into account the relative magnitude of the numbers. The numbers are sampled from a standard normal distribution. 

In terms of performance this diff should bring PyTorch on par with Numpy and usually exceed it by 1.5 to 2x.
2018-03-15 12:09:28 -04:00
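The divergence-from-numpy point above (several partial sums combined at the end, rather than one running sum) can be reproduced without any intrinsics; a pure-Python sketch:

```
import random

xs = [random.gauss(0.0, 1.0) for _ in range(1 << 16)]

def sequential_sum(values):
    s = 0.0
    for x in values:
        s += x
    return s

def chunked_sum(values, width=8):
    # mimic a vectorized reduction: 8 independent "lanes" of partial sums,
    # combined at the end -- a different rounding order than sequential_sum
    lanes = [0.0] * width
    for i, x in enumerate(values):
        lanes[i % width] += x
    return sum(lanes)

print(sequential_sum(xs) - chunked_sum(xs))  # tiny but usually nonzero
```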
42ba8c1a73 Add section on unit testing to CONTRIBUTING (#5813) 2018-03-15 12:06:20 -04:00
abd6f82709 Fix debug build failure on Windows (#5771) 2018-03-15 11:42:44 -04:00
6f5e869259 Add promoteTypes to ATen and torch._promote_types to python. (#5795)
This isn't hooked up to anything yet, but is necessary for both scalar binary ops in ATen and tensor constructor type inference in PyTorch.
2018-03-15 11:02:28 -04:00
82777815f8 Fix bmm memory leak (#5744)
Fixes #5611.

THCTensor_(baddbmm) assumes that newContiguous will always return a new tensor (this is a bad assumption). At the end of the function, tensors are freed if tensor_new != tensor_old. As a result, some tensors aren't freed if they were initially contiguous and newContiguous is called on them.

Test Plan
code reading
run the following (from the #5611 bug report) and assert that the memory doesn't leak anymore
import subprocess
import torch
from torch.autograd import Variable

# This is from https://discuss.pytorch.org/t/access-gpu-memory-usage-in-pytorch/3192/4
def get_gpu_memory_map():
    """Get the current gpu usage.

    Returns
    -------
    usage: dict
        Keys are device ids as integers.
        Values are memory usage as integers in MB.
    """
    result = subprocess.check_output(
        [
            'nvidia-smi', '--query-gpu=memory.used',
            '--format=csv,nounits,noheader'
        ], encoding='utf-8')
    # Convert lines into a dictionary
    gpu_memory = [int(x) for x in result.strip().split('\n')]
    gpu_memory_map = dict(zip(range(len(gpu_memory)), gpu_memory))
    return gpu_memory_map

l, m, n = 1, 9, 1
w = torch.nn.Parameter(torch.Tensor(1024, 2, l, m).cuda())
for i in range(10000):
    a = Variable(torch.Tensor(1024, 2, m, n).cuda())
    torch.matmul(w, a).permute(0, 3, 1, 2).mean().backward()
    if i % 100 == 0:
        gpu_mem = get_gpu_memory_map()
        print("GPU: {:.2f} KB".format(gpu_mem[0]))
2018-03-15 10:44:35 -04:00
b499332aaf fixed a message typo in ATen CMakeLists.txt (#5802) 2018-03-15 10:37:27 -04:00
7a5fc2fa22 Fix undefined '__func__' for CUDA 8 on Windows (#5803) 2018-03-15 10:33:31 -04:00
a24d4b7454 Fix compilation with CUDA < 8.0 (#5621)
* Compile with CUDA 7.5 and GCC > 4.9

* Removed static keyword from device constants.
2018-03-15 10:17:12 -04:00
f5f6258288 Enable additional tensor types in Gloo backend (#5483) 2018-03-15 14:53:24 +01:00
c66111e79b Desugar torch.* and F.* functions in JIT script (#5784) 2018-03-15 12:02:31 +01:00
694bee1f7e Fix the rule for Assign in JIT's Python frontend (#5793) 2018-03-15 09:14:03 +01:00
4613eef69e Simplify run_test.py and dont use shell=True (#5767)
* Simplify run_test.py and dont use shell=True

* Fix non-shell output for check_output and always print to stderr

* Use shlex.split instead of str.split

* s/log/print_to_stderr

* with_init -> with_init_file

* Remove bufsize argument
2018-03-15 01:12:51 -04:00
eea680a354 [auto] Update onnx to 31ca96c - Microbenchmark for encoding+decoding ModelProto and GraphProto with a single operator (#609)
31ca96ca33
2018-03-15 03:21:09 +00:00
514f87a16c Define RPC types out of source (#5794) 2018-03-14 21:36:54 -04:00
1709484a40 Restore tensor.type, tensor.type_as docs (#5746) 2018-03-14 17:59:31 -04:00
bedba9c156 Fix unused parameter warning 2018-03-14 14:58:31 -07:00
af5bfa00a5 Fix unused parameter warning in THTensorMath.c 2018-03-14 14:37:41 -07:00
74f0b270ea Fixing conda (#2123)
* Fixing conda

* Adding hypothesis and onnx to conda builds

* Updates but still not working

* Adding required changes to conda_full

* Updates

* Moving to more general build_anaconda script

* Adding check for gcc version

* Adding general ways to add/remove packages from meta.yaml?

* Changes for specific packages to build on gcc 5.4

* Fix with glog spec

* Requiring >numpy 1.12 for python 3 to satisfy opencv dependency

* Adding pydot to required testing packages

* Adding script to read conda versions for gcc ABI

* Trying to fix segfault by installing in env instead

* conda activate -> source activate

* Trying adding back leveldb

* Setting locale for ONNX + conda-search changed its format

* read_conda_versions handles libprotobuf

* Conda script updates

* Adding a protobuf-working test

* Removing changes to proto defs b/c they will require internal changes in a separate diff
2018-03-14 12:24:37 -07:00
e40425fd9b Revert "Add torch.sparse_coo_tensor factory. (#5745)" (#5780)
This reverts commit 361baa5a48cb72a4f5e11508a963978edcd6cff9.
2018-03-14 13:30:52 -04:00
8a9925f03f Fix useless opset_import in onnx (#2243)
* Fix useless opset_import in onnx

* Set the default ir version in make_model

* Use the target_opset_version in Caffe2Frontend

* remove make_model from helper in caffe2.python.onnx
2018-03-14 10:17:32 -07:00
5022b32b62 Fix Windows build (#2261) 2018-03-14 09:49:49 -07:00
361baa5a48 Add torch.sparse_coo_tensor factory. (#5745)
Notes:
1) I didn't add Tensor.new_sparse_coo_tensor; it didn't seem particularly useful, but it's easy to add
2) This doesn't do the type inference, i.e. torch.sparse_coo_tensor(indices=LongTensor, values=IntTensor)
will return a sparse tensor corresponding to the default type rather than a sparse IntTensor.  We can add
type inference later when we add it to other factories.
2018-03-14 12:10:07 -04:00
e9fffb5579 use std:: math functions (#5773) 2018-03-14 08:56:10 -04:00
3f3b686056 Refactor run_test.py to pass all options, not just verbose. (#5760)
I need this because run_test is going to need to read other
options than just verbose when I implement JUnit XML dumping.
(JUnit XML dumping cannot be implemented solely by frobbing
--python because the XML file to dump to must vary based on the
test name.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-14 07:44:58 -04:00
cadeb0cb17 Revert "ATen ReduceOps (#5481)" (#5765)
* Revert "ATen ReduceOps (#5481)"

This reverts commit 310c3735b9eb97f30cee743b773e5bb054989edc.

* Revert "Check that new cpuinfo and tbb submodules exist (#5714)"

This reverts commit 1a23c9901dbfee295bf5b3dad36e4d3ee7e86366.
2018-03-13 23:50:16 -04:00
11056528d1 Fixes Variable::data() on UndefinedTensor (#5756)
The save_mean and save_std are undefined if training is false.
Previously, we unpacked them even though we did not use them in the
computation.

We also don't need to re-pack the mean/variance variables.
2018-03-13 22:22:20 -04:00
28eda01809 Reduce Sum and Reduce Mean (#2189)
* Reduce Sum and Reduce Mean

* Handle reductions with empty 'axes'

* Merge codebase and simplify tensor reduction logic

* Restructure code and add comments.

* Fix parameter to scale

* Fix parameter to scale
2018-03-13 19:13:47 -07:00
dd921f65ba bump version to 0.8.2 (#2251) 2018-03-13 18:07:07 -07:00
0476a2346b Add symbolic for relu to support exporting to ONNX (#5759) 2018-03-13 20:20:39 -04:00
bab0f8484b Put torch header install back into the install command (#5755) 2018-03-13 19:23:02 -04:00
16fa12214d raise RuntimeError on test failure (#5754) 2018-03-13 18:53:43 -04:00
11444a7273 Save self.numel() for backward (#5747) 2018-03-13 17:45:29 -04:00
edd138ba00 [C2] Support optional lengths input to ReduceFront/Back operators (#2250) 2018-03-13 13:20:26 -07:00
effc568cee Add ReLU to ATen (#5626) 2018-03-13 19:23:24 +01:00
835b2ffd72 Warning police. (#5720)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-13 13:16:35 -04:00
76a283db40 [ready] General Documentation Improvements - 2 (#5685)
* Fix some minor errors in existing docs.

* Fix Convolution and Pooling docs in torch.nn.functional

* Cleaned up torch.nn.functional docs

* Address @SsnL 's comments

* Add multiplication sign missing in docs

* Fix more typos, and clear some warnings

* Change infinity symbol in LPPool2d

* Revert some changes in torch.nn.functional

* Few more minor changes
2018-03-13 09:47:43 -04:00
37059ba0ec Added torch.distributed.launch module for easier multi-proc/node distributed job launching (#5348) 2018-03-13 12:04:38 +01:00
f377159cc8 make dimension checker of scatter_add_ consistent with scatter_ (#5659)
* make dimension checker of scatter_add_ consistent with scatter_

* move TH_TENSOR_DIM_APPLY3_SIZE_SCATTER out of scatter and scatterAdd
2018-03-13 05:23:26 -04:00
025e43c263 Attempt to fix #5718. (#5726)
* Attempt to fix #5718.

* markdown fix for LPPool1d

* It's sum pooling, not average pooling.  (I both tested this and considered the math.)
2018-03-13 04:38:04 -04:00
f69fb3829a Add documentation for LPPool1D (#5730) 2018-03-13 04:37:25 -04:00
542fbcc127 Add optimization to norm for common norms (#5722) 2018-03-12 19:54:49 -04:00
55af142b44 Traceable dispatch for cast methods (#5629)
Previously, methods like int() and long() would fail tracing because they eventually dispatch down to toType, which takes a Type as a parameter. We don't (currently) support tracing ops with Type inputs[0], so this PR adds specializations for the ATen scalar types and dispatches to those directly. These specialized ops can be traced into the IR without needing a Type argument.

A more long-term solution would be to add support for Types in the IR.

* Traceable dispatch for Variable cast methods

* Add ONNX symbolics

* Fix test

* Fix cross-backend copy issue

* Prepend underscores to cast identifiers

* Metaprogram symbolics

* clang-format

* stupid lint

* Add comments for all code fragments
2018-03-12 19:01:14 -04:00
0919b5247d Fix at::optional return type in fusibleExpandTo (#5717)
* Fix at::optional return type

* More type-safe return expressions :)
2018-03-12 19:00:06 -04:00
7e6693991d Onnx caffe2 backend (#2039)
* C++ version of ONNX->Caffe2 backend

* use namespace ONNX_NAMESPACE

* Fix Build

* Comments

* Change namespace from onnx_caffe2 to caffe2::onnx
2018-03-12 15:18:05 -07:00
c7611f7608 improve occupancy for cuda rngs (#5710) 2018-03-12 16:21:01 -04:00
a2641500bf Implement torch.reshape and Tensor.reshape (#5575)
* Implement torch.reshape and Tensor.reshape

This implements reshape which has similar semantics to numpy.reshape. It
will return a view of the source tensor if possible. Otherwise, it
returns a copy.

* Remove in-place reshape_ that was an alias for resize_

* Update documentation
2018-03-12 16:20:40 -04:00
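A quick sketch of the view-or-copy semantics described above; data_ptr() reveals whether storage is shared:

```
import torch

x = torch.arange(6)
y = x.reshape(2, 3)                  # contiguous input: returns a view
print(y.data_ptr() == x.data_ptr())  # True, storage is shared

z = y.t().reshape(6)                 # transposed (non-contiguous): returns a copy
print(z.data_ptr() == x.data_ptr())  # False
```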
f6c708f869 Ensure torch.tensor and Tensor.new_tensor copy numpy data. (#5713) 2018-03-12 16:20:10 -04:00
1a23c9901d Check that new cpuinfo and tbb submodules exist (#5714) 2018-03-12 15:44:10 -04:00
b465bb9a8e fix post eos penalty (#2235) 2018-03-12 12:42:22 -07:00
4007dd76e2 Add missing ONNX symbolics and fix fusible expand logic (#5654)
This includes various fixes required to export the NMT decoder to ONNX

* Add missing ONNX symbolics and fix fusible expand logic

* Update comments and use of at::optional

* Use _unimplemented
2018-03-12 15:39:39 -04:00
602a09dde7 Update caffe2 from facebook 4f527ef46abf (#2234)
* [GanH]: two_task_discriminator

as titled

and adding label smooth

* [Dper2] Simplified UI options needed for blob magnitude visualization

* [GanH]: fix tags

as titled

* Added type and shape inference for GatherRange operator

This helps with type / shape inference when using this operator in layers.
Also just a nice to have in general.

* Demonstrate Caffe2 exception handling with StoreHandlerTimeoutError in Python

We'd like to catch and recover from certain Caffe2 net exceptions. Use this diff to demonstrate a pattern of registering a pybind exception mapping and catching in Pythonusing caffe2::StoreHandlerTimeoutException.

* Bind Gloo IoException to IoError in Python

Allow peer failure handling and recovery using an exception based mechanism. This diff registers gloo::IoException with pybind.

* [GanH]: add label smoothing to softmax with loss

as titled

* [C2] Enable LARS in Adagrad and hook it to DPER

* [DPER] Don't pass LayerModelHelper in create_trainer_nodes

Since we're planning to get rid of it eventually and I want to get access to
NetDef only interface ASAP - I'm looking towards removing all references to
LMH, where we don't really need them.

* fix bugs in LambdaRankNdcgOp

the loss and gradient in LambdaRankNdcgOp are incorrect. The loss should be negative log of probs instead of log.

* Restrict thread pool on iOS to only big cores

Historically, iPhones exposed only one type of cores, and Caffe2 thread pool used all of them.
However, iPhone 8/iPhone X exposes 2 big + 4 LITTLE cores. As our thread pool doesn't support work stealing or other forms of load balancing, fast cores end up waiting for the slow ones, and it may be better to restrict execution to only 2 fast cores, like we do on Android.

* Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine

Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine

* make clang happy and get fewer warnings

make clang happy and get fewer warnings

* [Personalization] Support add_output_schema() in layer_model_helper

Problem:
Currently the output_schema of sparse_nn can only be set once. https://fburl.com/efth5zer.

Solution:
For flexibility, we want to add fields to output_schema incrementally.

Plan:
Wrap the change of `model._output_schema` into a new function `add_output_schema()` for adding additional output_schema.

Callsite:
The add_output_schema() should be called instead at https://fburl.com/efth5zer

Reference:
The newly added `add_output_schema()` will be similar to `add_loss()` in https://fburl.com/t2ii8njh
2018-03-12 12:22:59 -07:00
310c3735b9 ATen ReduceOps (#5481)
This diff adds vectorization to ATen. It uses Intel intrinsics to build a general vec256 class that represents types of 256-bit width. These can then be treated like regular variables. Using those, it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows work stealing and chunks the reduction operations based on an experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code depending on the host's capabilities.

The kernels are implemented under native/cpu. Each .cpp file is compiled with -avx, -avx2 and no additional flags. A macro is used to append AVX, AVX2 or NONE to the function name. The header then needs to define the functions three times, one for each capability. This could be improved by either changing the cmake file a bit or possibly generating source code using a Python script etc.

For the non-contiguous case this defaults to the current implementation within TH. For CUDA it defaults entirely to the implementation within THC.

There probably needs to be a bit of a debate around the design decisions here, the additional dependencies, parallelization strategy, clarity, etc. The numerical results also diverge from numpy with larger tensors, which is expected since we're summing, for example, 8 numbers and then adding the result to the running sum, instead of each number one by one. But there might be something to be said about accumulating into a double for floats or the degree of divergence, the behavior with respect to CUDA, etc.

I wrote a [small Python script]( https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with numpy numerically as well as on timing. I ran this script to create timings both on master and this branch.

Here is the command for 1 core
`OMP_NUM_THREADS=1 taskset -c 0 python sum_bench.py --enable_numpy 200`

Here is the command for all cores
`python sum_bench.py --enable_numpy 200`

Here are the results of each:

[Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ)

[This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w)

[Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw)

[This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA)

To test, the command is
`python sum_bench.py --test 200`

[This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw)

For this test we look at the average absolute value of the differences. This does not take into account the relative magnitude of the numbers. The numbers are sampled from a standard normal distribution. 

In terms of performance this diff should bring PyTorch on par with Numpy and usually exceed it by 1.5 to 2x.
2018-03-12 15:19:12 -04:00
4a96f5616c make CUDA_VERSION available in cudnn/Descriptors.h (#5709)
Otherwise hmma is not called in rnns on volta.
2018-03-12 14:41:00 -04:00
dc4984ef10 Delete ""_sym literal form. (#5707)
* Delete ""_sym literal form.

Two reasons:

1. It's unnecessary now; all of the uses of the literal form would
   be better directly referring to the interned string (esp. since
   now we are autogenerating symbols.)

2. When I add namespacing, there will be no convenient way to specify
   the desired namespace with just _sym.  If we add it back, we would
   need distinct suffixes for each different type.  Easiest to delete
   it while we don't need it.
2018-03-12 18:47:19 +01:00
42bf2f9289 Explain floating point issue in torch.arange doc (#5708)
* Explain floating point issue in torch.arange doc

https://github.com/pytorch/pytorch/issues/5556
https://github.com/pytorch/pytorch/issues/5704
https://github.com/pytorch/pytorch/pull/5600

* Add line break to stay below max comment length

* Copyedit

* Typofix
2018-03-12 12:04:51 -04:00
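The issue being documented, made concrete (behavior here hinges on rounding, which is exactly the point of the doc note):

```
import torch

# (end - start) / step = (1.3 - 1.0) / 0.1 evaluates to slightly more than 3
# in floating point, so the element count can round up and the endpoint,
# which should be excluded, can leak into the result:
print(torch.arange(1.0, 1.3, 0.1))
# can yield [1.0, 1.1, 1.2, 1.3] -- note the last value is >= end
```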
4b2d278968 check-in pytorch.version file to master 2018-03-12 10:07:57 -04:00
41285edbb6 [jit] add a compiled script module (#5630)
* Add script::Module C++ class to represent script modules
* Switch AST -> IR conversion to work on Modules/Methods rather than raw graphs; function-only AST -> IR conversion is just a simplified case where there is only one module with a single method and no parameters.
* Introduce SugaredValue in compiler.h to represent values in scope in a script function that are not first-class and that get desugared. This is used to represent the module's self parameter, as well as python function calls and method calls on tensors.
* Provide a Python ScriptModule that provides a nice API on top of script::Module, allowing for the definition of script modules with methods, parameters, and submodules.

Not in this PR but intended for the future:

* ScriptModule actually subclasses nn.Module, with most methods implemented
* Unification of traced module and script module functionality into one container class.

Detailed changelog:

* Switch compiler over to using Module, but don't
use them yet.

* Remove intermediate attribute encoding in compiler

* Create SugaredValue object to handle resolution
of compiled module.

* switch to_ir to modules, implement Select

* hacky python wrappers

* Private ScriptModule

* Add `define` to script module

* Attributes use TK_LIST_LITERAL

this anticipates adding a real list literal expression to the language.

* Add a metaclass to make sure script stubs are registered

* Add a test

* Doc createResolutionCallback

* Docs and minor editing

* Address PR comments

* Document

* Fix unicode issue
2018-03-12 09:52:40 -04:00
dede63689f Moved headers files copy for C++ extensions to build_ext in setup.py (#5706)
The header files needed for the C++ extensions were copied to
torch/lib/include under install. In case of bdist_wheel or build develop
for example, the files are not copied and cpp_extensions test is failing:

```
Running test_cpp_extensions.py ...
running install
running build
running build_ext
/home/moni/src/ibm/AI/pytorch/torch/utils/cpp_extension.py:79: UserWarning:
Your compiler (g++) may be ABI-incompatible with PyTorch.
Please use a compiler that is ABI-compatible with GCC 4.9 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.
  warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
building 'torch_test_cpp_extension' extension
creating build
creating build/temp.linux-x86_64-3.6
gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/moni/src/ibm/AI/pytorch/torch/lib/include -I/home/moni/src/ibm/AI/pytorch/torch/lib/include/TH -I/home/moni/src/ibm/AI/pytorch/torch/lib/include/THC -I/home/moni/miniconda3/envs/pytorch/include/python3.6m -c extension.cpp -o build/temp.linux-x86_64-3.6/extension.o -g -DTORCH_EXTENSION_NAME=torch_test_cpp_extension -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
extension.cpp:1:25: fatal error: torch/torch.h: No such file or directory
 #include <torch/torch.h>
                         ^
compilation terminated.
error: command 'gcc' failed with exit status 1
```
2018-03-12 14:07:45 +01:00
f5a40a8b53 Fix error message (#5701) 2018-03-12 03:47:28 -04:00
1df99e541c Fixes for build errors on Windows with GPU (#2222)
* Fixes for build errors on Windows with GPU

* Typo
2018-03-11 15:44:14 -07:00
000edb791e Make use of new BUILD_ENVIRONMENT variable when possible. (#5699)
* Make use of new BUILD_ENVIRONMENT variable when possible.

Eliminate CI provided environment variables. At the moment, our build scripts depend on a few environment variables which are specified by the CI system and passed down to the build. Based on the build scripts, these environment variables are JOB_NAME, PYTHON_VERSION and GCC_VERSION; variables that depend solely on the image being built and the invoked script.
a. Proposal: A recent rewrite of the pytorch-dockerfiles has embedded a new environment variable, BUILD_ENVIRONMENT, which is automatically set when you run the Docker image. This environment variable subsumes JOB_NAME (this variable doesn't specify if you are “building” or “testing”, but this can easily be inferred from the script that is being invoked.) Make use of this environment variable to compute the other variables.


Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* syntaxfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* bugfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-11 18:14:49 -04:00
6404904d8a Fix run_test.py (#5693) 2018-03-10 19:16:40 -05:00
e9d1a5f6d5 support non-Variable arguments to functions in symbolic overrides (#5645)
simply pass them through unmodified. This is just the final tweaks,
after the bulk of the work getting rid of ExportProxy
2018-03-10 17:51:49 -05:00
261dd6ea83 fix named_modules doc, clarify eval doc (#5691) 2018-03-10 17:35:07 -05:00
15cc24a970 Minor improvement in AutoGPU usage in CUDA bindings (#5689) 2018-03-10 11:55:46 -05:00
248c93372d Check value type for register_buffer (#5657)
* Check value type when registering buffer

* Fix PEP8

* Use isinstance in favor of is_tensor
2018-03-10 13:02:04 +01:00
ec36e6f40a [auto] Update onnx to 79dc46f - Add ONNX_NAMESPACE around rnn/old.cc (#605)
79dc46fa4d
2018-03-10 06:35:48 +00:00
54aa28da73 Add shebangs to perf_test shell scripts (#5684) 2018-03-09 23:56:04 -05:00
dca41bb696 Minor fix to gen.py to make CPU-only generation cleaner (#5683) 2018-03-09 23:54:48 -05:00
4e190c2fed Fix floor latex rendering (#5682)
* Make floors larger

* Improve Latex rendering of floor

* Improve latex rendering of ceil

* Fix flake8
2018-03-09 23:53:14 -05:00
7368c09280 Add efficient isVariable test to ATen (Part 2) (#5675)
* Add efficient isVariable test to ATen.

This is done as a field on Type so that we can define a
non-virtual, inlinable function.  The added ASSERTs probalby
affect runtime performance, we may need to toggle them off
on non-DEBUG builds.

Fixes #4814.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Rebase and brush up

* is_variable -> is_variable_or_undefined
2018-03-09 23:52:54 -05:00
e6090403cb small fixes to CosineEmbeddingLoss tests (#5681)
* small fixes to CosineEmbeddingLoss tests

* fix test
2018-03-09 23:52:22 -05:00
439aae7e94 Add tensor.repeat docs. Remove legacy tensor repeat function. (#5666)
* Add tensor.repeat docs. Remove legacy tensor repeat function.

* Fix nit
2018-03-09 23:51:47 -05:00
b5ee5e585b Only allow dense floating-point types as the default tensor type. (#5674) 2018-03-09 23:50:18 -05:00
03f2ad9029 Add check for python build deps to setup.py (#5618)
* Add check for python build deps to setup.py

* Address comments

* Remove install_requires line
2018-03-09 23:49:18 -05:00
74043b69c2 Alias torch.diagonal, torch.diagflat (#5622)
* Alias torch.diagonal, torch.diagflat

* Address comments; Add sanity tests for torch.diagonal and torch.diagflat
2018-03-09 23:46:42 -05:00
7b61b458b1 Make torch.arange consistent with numpy.arange (#5600)
* Fix arange floating point error

* fix test

* add type cast when calculating arange size

* fix nit

* update test

* use doubles instead of floats to calculate size

* requested changes
2018-03-09 23:43:55 -05:00
eb34186104 [auto] Update onnx to 71fa008 - Provide option to enforce /MD or /MT when building with MSVC (#602)
71fa008efe
2018-03-10 03:22:11 +00:00
09b6ad5785 Use cpuinfo instead of Android's libcpufeatures in Android build 2018-03-09 22:20:37 -05:00
59d1d17775 Print source location when ONNX export fails for a node (#5652) 2018-03-09 15:31:28 -08:00
582d045092 Fix rrelu docs (#5678) 2018-03-09 23:33:20 +01:00
50770f0bc4 Fix Hardshrink equation in docs (#5679) 2018-03-09 23:32:52 +01:00
7391dae709 Fix Variable conversion on the way to/from Python (#5581)
* PyObject* <--> at::Tensor no longer unwraps variables, instead we expect end uses to always work with variable types, and we will only unwrap the variables when we optimize.
* Add torch::CPU, torch::CUDA and torch::getType
* at::CPU -> torch::CPU in extensions
2018-03-09 14:31:05 -08:00
0ee53bf7fe Fix one more naming issue in resnet50_trainer.py for PR 2205 2018-03-09 13:51:42 -08:00
ed05ca9fec Clean up naming of FP16-related code, add comments 2018-03-09 13:51:42 -08:00
b07980334c Update jenkins build script using the same flag as used in benchmarking (#1977)
* Update jenkins build script using the same flag as used in benchmarking

* Add a recently added flag

* Remove BUILD_OBSERVERS flag since it is no longer used
2018-03-09 13:44:41 -08:00
b543041e21 Corrected a typo in LSTM documentation. Fixes #5661 (#5662) 2018-03-09 22:03:25 +01:00
53876c4606 Rewrite run_test.sh in Python (#5615) 2018-03-09 22:02:02 +01:00
d4c0538be2 Add device to Tensor.new_tensor. (#5669) 2018-03-09 15:41:14 -05:00
ae0c04c773 Add torch.empty, torch.full and new_ size Tensor factory methods. (#5668)
* Add torch.empty, torch.full and new_ size Tensor factory methods.

This adds torch.full, torch.empty equivalents of np.full, np.empty.
In addition, this adds size-based Tensor factory methods new_empty, new_ones, new_full, new_zeros,
which is meant to complete the separation of the legacy "new" method into data-based and size-based
functions.

This also fixes an issue in sparse zeros_like when the dtype didn't match the argument dtype.

* Get rid of unnecessary zero in sparse tensor zeros_like.

* Fix test if only 1 cuda device.
2018-03-09 15:29:29 -05:00
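A sketch of the new factories described above (values are illustrative):

```
import torch

a = torch.empty(2, 3)            # uninitialized, analogous to np.empty
b = torch.full((2, 3), 3.14)     # constant fill, analogous to np.full

x = torch.zeros(4).double()
y = x.new_zeros(2, 2)            # size-based new_*: inherits type and device from x
z = x.new_full((2, 2), 7.0)      # likewise, filled with a constant
```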
60299e03cf Report all errors during ONNX backend translation rather than failing fast (#2210) 2018-03-09 10:58:22 -08:00
57e5559788 [auto] Update onnx to b184dd3 - Fix ONNX library build for Windows
b184dd3cb8
2018-03-09 18:45:24 +00:00
52460a0b30 Add outputs_info as parameter in run_node (#2161) 2018-03-09 10:44:51 -08:00
e4c303f373 Defer shape analysis failures until runtime (#5574) 2018-03-09 18:43:03 +01:00
27cb06ae22 Adding rewrite_net for ACL backend (#2186)
Add rewrite_net for ACL backend
2018-03-09 09:00:21 -08:00
063f066394 [auto] Update onnx to 0174eb5 - fix get_attribute_value can not get g field bug (#599)
0174eb51c8
2018-03-09 16:45:55 +00:00
a3442f62bc Support native namespace functions with type dispatch. (#5576)
* Support native namespace functions with type dispatch.

Use 'ones' as an example.  Note this is a "halfway" solution; i.e. the call chain is:
at::ones(shape, dtype) -> dtype.ones(shape, dtype) -> CPUFloatType.ones(shape, dtype) -> at::native::ones(shape, dtype)

The "nicer" solution would probably be something like:
at::ones(shape, dtype) -> dtype.ones(shape) -> CPUFloatType.ones(shape) -> at::native::ones(shape, this)

* Fix type inference.

* Fix test install.

* Fix extensions.

* Put dtype argument at the beginning.

* Fix extension.cpp.

* Fix rnn.

* Move zeros in the same manner.

* Fix cuda.

* Change randn.

* Change rand.

* Change randperm.

* Fix aten contrib.

* Resize in randperm_out.

* Implement eye.

* Fix sparse zeros.

* linspace, logspace.

* arange.

* range.

* Remove type dispatch from gen_python_functions.

* Properly generate maybe_init_cuda for type dispatch functions not named type.

* Don't duplicate dtype, this parameters for native type dispatched functions.

* Call VariableType factory methods from the base type so it gets version number 0.

* Address review comments.
2018-03-09 10:52:53 -05:00
037011e757 Avoid duplicated log when explicitly specified engine is not available (#2214)
* Avoid duplicated log when explicitly specified engine is not available

* Update operator.cc
2018-03-09 07:42:53 -08:00
b225893e2a update comments in segment_reduction_op (#2207)
* update comments in segment_reduction_op

* Update segment_reduction_op.cc
2018-03-09 07:42:36 -08:00
64b33672af add GatherFused8BitRowwise operator (#2167)
* add GatherFused8BitRowwise operator

* Update gather_fused_8bit_rowwise_op.cc

* Update gather_fused_8bit_rowwise_op.cc
2018-03-09 07:42:17 -08:00
632f8b5be7 fix comment on the location of scale and bias (offset) in each fused rowwise 8bit (#2166)
* fix comment on the location of scale and bias (offset) in each fused rowwise 8bit

* Update fused_rowwise_8bit_conversion_ops.cc

* Update lengths_reducer_fused_8bit_rowwise_ops.cc

* Update lengths_reducer_fused_8bit_rowwise_ops.cc
2018-03-09 07:41:59 -08:00
a33aeed1dc Add set_grad_enabled as context manager and function (#5555) 2018-03-09 11:36:56 +01:00
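Both uses named in the title above, sketched (the Variable wrapping reflects the pre-merge API of this era):

```
import torch
from torch.autograd import Variable

x = Variable(torch.ones(3), requires_grad=True)

# as a context manager: gradients disabled only inside the block
with torch.set_grad_enabled(False):
    y = x * 2
print(y.requires_grad)  # False

# as a plain function: flips the global autograd mode
torch.set_grad_enabled(True)
```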
70fdeb8e07 [auto] Update onnx to 7e205b6 - Add global avg and max pool test cases (#574)
7e205b6619
2018-03-09 07:23:55 +00:00
f9f5946908 Fix variable shadow warning
title
2018-03-08 21:40:40 -08:00
f88bba1c73 Fix docker builds (#2199)
* Fix docker builds

* Guard more places

* fix bash syntax error

* .

* .

* quote
2018-03-08 20:09:38 -08:00
ff804ba168 [auto] Update onnx to 5516ebb - to_string for Android (#597)
5516ebb49f
2018-03-09 04:07:19 +00:00
71d73211f4 [ready] torch.* doc update for Variable/Tensor merge, and other improvements (#5443)
* 1. Update doc to reflect changes in Variable/Tensor merge, and new printing style
2. Remove functions in torch/functional.py that are already implemented with native_function
3. Add set_detault_tensor_type doc

* fix torch.split

* py2 unicode string fix

* update torch.gels doc

* address @fmassa 's comments

* double-colon
2018-03-08 23:02:38 -05:00
359d54ea97 Fix typo in CMakeLists build fix for Ninja (#2213)
The comment suggests that a special case should apply to either Ninja or Visual Studio, but the condition checks for both
2018-03-08 19:45:39 -08:00
8ab101ccee Implement pow() for integer types (#5526)
* CPU int-types pow()

* CUDA int-type pow()

* Cleanup + fix deleted line

* Tests for integer-types pow

* Fix build

* Fix windows tests

* Make _test_int_pow static
2018-03-08 22:33:32 -05:00
57c7d132c9 Fix nn.Module.apply doc formatting (#5623)
* fix nn.Module.apply doc example

* other examples' double-colon and newline'
2018-03-08 22:26:01 -05:00
f84fa526d3 Add additional deprecated overloads with out kwarg (#5643)
This improves backwards compatibility with 0.3. It adds support for
the out kwarg for the deprecated overloads that have optional
positional alpha/beta/scale arguments.

The addcmul(self, value, tensor1, tensor2, out=self) syntax is used by
gpytorch.
2018-03-08 22:25:20 -05:00
8f068bd780 fix CUDA btrifact error message using wrong info type (#5644) 2018-03-08 22:21:26 -05:00
8ba8713f5d torch.load() / torch.save() support arbitrary file-like object (#5466)
* Test serialization file-like object API guarantees and update docs.

* Implement torch.load() / torch.save() for arbitrary file-like objects

* Add tests for torch.load/save for file-like objects

* Fix compiler errors

* Throw error if user tries torch.save(tensor, StringIO.StringIO)

* Skip test_serialization_container_filelike. Investigation pending.

* Address comments

* Fix _test_serialization_container

* Address comments

* fix comment

* Use PyBuffer_FromReadWriteMemory

* Fix build by removing inlining

* Fix clang builds?

* Address comments

* Don't use memoryview in python 2

* Ensure doRead/doWrite templates are instantiated before they're used in generic/serialization.cpp
2018-03-08 22:18:55 -05:00
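The file-like-object API described above, sketched with an in-memory buffer:

```
import io
import torch

buf = io.BytesIO()              # anything with write()/read()/seek() should do
torch.save(torch.randn(3, 3), buf)

buf.seek(0)
t = torch.load(buf)
print(t.size())                 # torch.Size([3, 3])
```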
7f44c0d011 rename onnx/utils/__init__.py -> onnx/utils.py (#5639) 2018-03-08 22:17:59 -05:00
b9cc035654 import torch.jit in torch/__init__.py (#5638)
previously, it was being implicitly imported via the import of
torch.onnx

this is no longer the case, and is a hacky thing to depend on anyway,
so import it explicitly
2018-03-08 22:17:47 -05:00
06df037d9a do away with ExportProxy hack in onnx export (#5614)
ExportProxy was a mechanism to reuse the code that supported exporting
autograd Functions to support overriding arbitrary python
functions. However, it had some serious downsides

- only works on some functions (all args must be Variable)
- complicated
- bad error messages in some cases

Instead, just expose enough functionality to python to perform the
necessary logic explicitly.
2018-03-08 22:17:30 -05:00
4aecbe0877 Give ATen/gen.py output directory option (#5653)
* Give ATen/gen.py output directory option

* Dont yield files

* os.path.join is too cross-platform
2018-03-08 22:08:50 -05:00
92596197fc add end to end test for DistributedDataParallel (#5182)
* add end to end test for DistributedDataParallel

* address comments

* skip subgroup tests when less than 3 processes

* set process number based on available gpus

* add single gpu;cleanup WORLD_SIZE

* fix comments
2018-03-08 22:07:34 -05:00
a268ed6588 fix momentum doc in IN andLN (#5649) 2018-03-08 22:01:56 -05:00
a3f463517e add gpu guard for broadcast_coalesce (#5655) 2018-03-08 21:59:19 -05:00
9acac2a513 Pass in task groups to PipedReaderBuilder (#2182) 2018-03-08 16:16:57 -08:00
4c4a42b3f9 implement CosineEmbeddingLoss as a native function and add reduce arg (#5646)
* implement CosineEmbeddingLoss as a native function and add reduce=True arg to it

* fix flake8

* address comments

* add reference function to tests

* fix flake8
2018-03-08 17:54:24 -05:00
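A usage sketch for the reduce argument mentioned above (assuming the functional interface torch.nn.functional.cosine_embedding_loss):

```
import torch
import torch.nn.functional as F

x1 = torch.randn(5, 8)
x2 = torch.randn(5, 8)
y = torch.ones(5)      # target: +1 for similar pairs, -1 for dissimilar

loss = F.cosine_embedding_loss(x1, x2, y)                    # reduced (mean) scalar
per_pair = F.cosine_embedding_loss(x1, x2, y, reduce=False)  # one loss per pair
```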
807a4914c3 [auto] Update onnx to 728cc98 - Add outputs_info into run_node backend interface (#588)
728cc987af
2018-03-08 22:49:14 +00:00
f4b1e8b334 [Dper2] Add NetModifier abstraction and support for plotting the norm of blobs (#2201) 2018-03-08 13:41:32 -08:00
d90cd73aea [auto] Update onnx to b052fef - Fix node test name of Slice (#596)
b052feffab
2018-03-08 21:40:13 +00:00
396637cdd6 Python-free build of autograd + jit (#5356)
This PR adds the possibility to build the C++ parts of autograd and jit, with no dependency on Python.
The goal is to allow taking a PyTorch IR representation (a tree s-expr) and running it with provided inputs.

Prerequisite: build PyTorch so that codegen runs once.
Instructions:

cd tools/cpp_build
bash build_all.sh
This will build libtorchjit and torchjit_test in tools/cpp_build/build/torchjit-build. The latter basically runs the code in test_jit.cpp for now.

While writing the PR, it turned out that a few of Python.h includes were redundant. They were removed here (PyTorch tests still pass on my machine, we'll see CI).

* Introduce Python-free builds of autograd and jit

* Remove NO_PYTHON ifdef in functions/special
2018-03-08 15:13:10 -05:00
9de922991c Revert "implement CosineEmbeddingLoss as a native function and add reduce arg" (#5640)
* Revert "implement CosineEmbeddingLoss as a native function and add reduce arg (#5447)"

This reverts commit c16478fe3fb8842119438b8fd79d98c8f50ca688.
2018-03-08 14:07:17 -05:00
6c6d301e4e [auto] Update onnx to ec5f1d3 - Add option to use customized protoc (#594)
ec5f1d3813
2018-03-08 18:28:35 +00:00
32b3841553 [ready] General documentation improvements (#5450)
* Improve documentation
1. Add formula for erf, erfinv
2. Make exp, expm1 similar to log, log1p
3. Symbol change in ge, le, ne, isnan

* Fix minor nit in the docstring

* More doc improvements
1. Added some formulae
2. Complete scanning till "Other Operations" in Tensor docs

* Add more changes
1. Modify all torch.Tensor wherever required

* Fix Conv docs
1. Fix minor nits in the references for LAPACK routines

* Improve Pooling docs
1. Fix lint error

* Improve docs for RNN, Normalization and Padding
1. Fix flake8 error for pooling

* Final fixes for torch.nn.* docs.
1. Improve Loss Function documentation
2. Improve Vision Layers documentation

* Fix lint error

* Improve docstrings in torch.nn.init

* Fix lint error

* Fix minor error in torch.nn.init.sparse

* Fix Activation and Utils Docs
1. Fix Math Errors
2. Add explicit clean to Makefile in docs to prevent running graph generation script
while cleaning
3. Fix utils docs

* Make PYCMD a Makefile argument, clear up prints in the build_activation_images.py

* Fix batch norm doc error
2018-03-08 13:21:12 -05:00
c16478fe3f implement CosineEmbeddingLoss as a native function and add reduce arg (#5447)
forward (new) [1.1905965859768912, 1.160144692985341, 1.1558120870031416]
backward (new) [1.9150976981036365, 1.9792822760064155, 1.8779143309220672]
double backward (new) [3.6898688060464337, 3.5784677929477766, 3.569505032035522]

forward (old) [3.2359962839400396, 3.275224728975445, 3.3409753759624436]
backward (old) [5.668679727939889, 5.722980880062096, 5.585088661056943]
double backward (old) N/A

* implement CosineEmbeddingLoss as a native function and add reduce=True arg to it

* fix flake8

* address comments

* add reference function to tests

* fix flake8
2018-03-08 13:15:12 -05:00
28b1c94f0f allow application of @symbolic decorators without circular imports (#5595) 2018-03-08 12:44:16 -05:00
08f9cad140 Fix typo (#5635) 2018-03-08 12:32:22 -05:00
cebf44e960 Element-wise tests now use or are seeded with hypothesis (#2181)
* Updated all element-wise tests to use hypothesis testing or at least use hypothesis seeds

* Updated tests to add seed to sqr function
2018-03-08 07:51:45 -08:00
d812a196e7 . 2018-03-08 10:10:34 -05:00
c55dc983d9 Fix ninja build in setuptools 2018-03-08 10:10:34 -05:00
bdc63be9fd log INFO for not available engine only when engine was explicitly specified (#2187) 2018-03-08 06:49:14 -08:00
363de58a8b implement double backwards for MaxPool3d (#5328)
* implement double backwards for MaxPool3d

* change MaxUnpool3d to use same indices as MaxPool3d

* fix nits
2018-03-08 06:15:07 -05:00
04461fa289 Prefix DataLoaderIter with underscore to discourage subclassing (#5619) 2018-03-08 11:09:51 +01:00
8720d72d7c Fixing inconsistent docs (missing parameters docs). (#5620) 2018-03-08 10:42:40 +01:00
5450ef50ed [auto] Update onnx to 2edc1e7 - Handle situations where protobuf is built on the fly (#592)
2edc1e727b
2018-03-08 04:11:03 +00:00
280d51e324 Use Ninja build system in setup.py when available 2018-03-07 20:49:30 -05:00
bbc2c642c9 Use Ninja build system when available
When Ninja is installed, use it instead of Make for native builds and for Android cross-builds.
2018-03-07 20:49:30 -05:00
60aa8c793d Update caffe2 from facebook (#2178)
* [C2] Don't crash kernel in case of invalid shapes for ConcatOp

Enforce correctness of the shapes for input tensors so we won't access invalid index.

* [Caffe2] Add analytical performance counters to Dynolog

Initial diff for counting analytical flops and memory writes for C2 operators.

* BBoxTransform op: Handle RoIs from multiple images per batch

BBoxTransform op used during typical Faster-RCNN inference operates only on
RoIs from a single image (no batching). Adding support to handle that with an
optional output blob containing the batch splits (i.e., the number of RoIs
belonging to each item in the batch). The code is perfectly backward compatible
and shouldn't break any existing models..

* [mkl] Make MKL-DNN cooperate with memongered nets

C2's MKL-DNN implementation caches input dims and reuses intermediate and
output buffers across net runs, which prevents memonger from being used. This
may not always be useful since input dims may vary widely in many cases and
we'll end up reallocating anyway. Added an option to force reallocation when
memonger is used.

* [oncall] fix batch gather ops for empty input

still need to bisect for the breaking change, but this shall fix the case for empty input.

the error logging is like: https://interncache-ftw.fbcdn.net/t49.3276-7/23938497_293562711176943_6500112636590424064_n.txt?_nc_log=1

@[557759185:raychen] can you help to subscribe oncall from ads side. this may affect the Sigrid online trainer.

* optimize BatchOneHotOp

We want to iterate in row-major as opposed to column-major for better
locality.

* Supported exporting model with int blobs.

Supported exporting model with int blobs. Needed by condensenet.

* BoxWithNMSLimit op: Handle boxes from multiple images per batch

Similar to D7135360. Added support for multiple images per batch in the op.
Takes an optional additional input "batch_splits" as output by BBoxTransform
op, and returns new batch_splits after applying NMS and filtering. Otherwise,
backward compatibility is maintained.
2018-03-07 16:41:22 -08:00
957ddb54d6 Fail fast in pytest (#2116) 2018-03-07 16:36:16 -08:00
c2721ab503 Add per-element unique op for CPU (#5503)
Questions/possible future works:

How to template-ize to extend support beyond LongTensor?
How to check if autograd works (and if not, how to add explicit gradient)?
CUDA support?
Testing command:
DEBUG=1 NO_CUDA=1 MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build && DEBUG=1 NO_CUDA=1 MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py develop && python3 test/test_torch.py

Partially fixes #2031

* Initial commit for unique op

* Working unique with test

* Make inverse indices shape conform to input

* flake8 whitespace removal

* address review comment nits

* Expose fn and add docs. Explicitly declare no gradients

* Trial generic dispatch implementation

* Add tests for generics

* flake8 whitespace

* Add basic CUDA error throwing and templateize set

* Explicit contiguous and AT_DISPATCH_ALL_TYPES return

* Remove extraneous numpy conversion

* Refactor out .data calls

* Refactored to variable return length API with wrapper fn as opposed to returning a 0-length tensor, per off-line reviewer comments

* Remove A

* Don't use hidden torch._unique() in test

* Fix documentations
2018-03-07 18:16:51 -05:00
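A sketch of the exposed function (assuming the sorted/return_inverse keywords this PR documents):

```
import torch

x = torch.LongTensor([1, 3, 2, 3, 1])

u = torch.unique(x)
u, inv = torch.unique(x, sorted=True, return_inverse=True)
# inv maps every input position back into u, i.e. x[i] == u[inv[i]]
print(u)    # [1, 2, 3]
print(inv)  # [0, 2, 1, 2, 0]
```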
0a18608b43 hacks to test exception handling and python operator backtraces
Add exception handling & re-throwing to worker threads of DAGNetBase
2018-03-07 15:09:17 -08:00
ab8498e5c8 Acl copy ops (#2158)
Copy op for ACL backend
2018-03-07 13:34:08 -08:00
0c6e843028 [caffe2] Add scopes into ONNX While op (#2149)
Summary:
Executing loop's body in a separate workspace, using WorkspaceStack to
support saving and reusing of workspaces

Test Plan:
python caffe2/python/operator_test/onnx_while_test.py

Reviewers: caffe2-review, jamesreed

Subscribers:

Tasks:

Tags:
2018-03-07 12:34:11 -08:00
9ebfece900 Update perf test baseline with every master commit (#5605)
* Update perf test baseline with every master commit

* Get perf test data from repo for local runs
2018-03-07 15:07:30 -05:00
eededd3f97 Move main reshape logic for easier reuse (#2122)
We'll want to reuse this logic for Int8 Reshape, but currently the code assumes
Input(0) and Output(0) are TensorCPUs, which may not be the case for a
subclass.
2018-03-07 11:32:39 -08:00
88883825e5 [auto] Update onnx to 910db3b - Minimally fix CMakeLists on Windows (#589)
910db3bcd9
2018-03-07 17:07:20 +00:00
fcaa3bf609 disable ibverbs build with env variable (#5513)
if the env variable is specified, use its value to determine what to do
otherwise use the heuristic we have (should_build_ib)
2018-03-07 11:18:48 -05:00
461e3e3ae0 Allow indexing tensors with both CPU and CUDA tensors (#5583)
* Allow indexing tensors with both CPU and CUDA tensors

* Remove stray import
2018-03-07 10:24:12 -05:00
a90b695590 Disallow num_workers > 0 for DataLoader on Windows (#5591)
Using DataLoader with num_workers > 0 is known to cause a CUDA out-of-memory issue on Windows.

This issue has already been noted in #4092.
2018-03-07 10:21:03 -05:00
3bc90d471d remove legacy workaround for hinge embedding loss reference fn (#5596) 2018-03-07 10:20:08 -05:00
0f50ca0b48 Add reduce to functional smooth_l1 documentation (#5610)
This has been present in master since https://github.com/pytorch/pytorch/pull/3382 but the doc for the functional interface was not taken into account.
2018-03-07 10:16:40 -05:00
792daeb422 Enable documentation for C++ extensions on the website (#5597) 2018-03-07 14:07:26 +01:00
63b4694bb8 release() does not need to be virtual (#5594)
Only the destructor `~Retainable()` needs to be virtual.
2018-03-07 04:07:24 -05:00
5d74462891 [auto] Update onnx to 8bcecad - fix cast op type constraints. (#587)
8bcecad91c
2018-03-07 06:54:24 +00:00
4c0e0ebb4e [auto] Update onnx to 4e9d21b - travis tweaks to make sure the correct versions of python are installed (#584)
4e9d21b68e
2018-03-07 05:14:09 +00:00
c9cc514df4 Bump minimum CMake version to 3.2
CMake 3.2 is required to properly track dependencies in projects imported as ExternalProject_Add (BUILD_BYPRODUCTS parameter).
Users on Ubuntu 14.04 LTS would need to install and use cmake3 package for configurations. Users of other popular distributions generally have a recent enough CMake package.
2018-03-06 19:57:48 -08:00
dd1564b061 Caffe2 module update: move observers as well as binaries. (#2145)
* Caffe2 module update: move observers as well as binaries.

* Add threads linkage

* Add Threads dependency to public interface
2018-03-06 14:45:21 -08:00
cdd0febd86 Fix for a confusion around grammar of Maybe (#5593) 2018-03-06 23:05:20 +01:00
82bdc51dd1 Use operator.index to convert indices to Python int (#5582)
This makes ParameterList, ModuleList, and Sequential convert PyTorch and
NumPy scalars to integers. This matches the behavior of Python lists.
2018-03-06 12:41:23 -05:00
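The behavior described above, sketched (numpy here just supplies a non-int scalar; plain Python lists already accept it via operator.index):

```
import numpy as np
import torch.nn as nn

layers = nn.ModuleList([nn.Linear(4, 4), nn.ReLU()])

idx = np.int64(0)           # a NumPy scalar, not a Python int
print(layers[idx])          # accepted via operator.index, like plain lists
```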
5597aba868 Add return statement to the JIT AST (#5578) 2018-03-06 13:14:53 +01:00
a6650f5664 Recompute captures after the parameter is updated (#5488) 2018-03-06 11:19:45 +01:00
7d141d4243 Changes done internally at Facebook (#2154)
f679c644e332 dzhulgakov [caffe2] Sync script - add ability to handle rebase conflicts
51729b061a15 dzhulgakov [caffe2] Changes done on GitHub
2018-03-06 01:23:54 -08:00
9395a26fe5 disable NetTest.ChainingForDifferentDevices which is broken 2018-03-06 00:33:11 -08:00
e3e9f91889 Fixed a typo in BoxWithNMSLimit doc.
Fixed a typo in BoxWithNMSLimit doc.
2018-03-06 00:33:11 -08:00
bec8923e02 [C2] Adding Clip Tensor by Scaling op
This op is used for gradient clipping to take care of exploding / vanishing gradients.

If original_norm is larger than the threshold,
then each element of the tensor is scaled by threshold / original_norm.
2018-03-06 00:33:11 -08:00
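A minimal NumPy sketch of the scaling rule described above (the Caffe2 operator's name and exact signature may differ; this only illustrates the math):

```
import numpy as np

def clip_by_scaling(t, threshold):
    # scale every element by threshold / original_norm when the norm exceeds threshold
    norm = np.linalg.norm(t)
    return t * (threshold / norm) if norm > threshold else t

g = np.array([3.0, 4.0])        # norm 5.0
print(clip_by_scaling(g, 1.0))  # [0.6 0.8], norm clipped to 1.0
```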
9f2a35ee8b [C2] Enable LARS on GPU [PR Patch #2115]
ATT https://github.com/caffe2/caffe2/pull/2115
2018-03-06 00:33:11 -08:00
56ac3ef180 Correcting size types in (Un)PackSegmentsOp
int32 is too small for large sequence & batch size.
2018-03-06 00:33:11 -08:00
4a4407337d Supported inplace arguments for norm_planar_yuv.
Supported inplace arguments for norm_planar_yuv.
2018-03-06 00:33:11 -08:00
ee33a24af2 avoid vector copy/destruction in *_dim_ helper functions
All of these take the dims by value. In a tight loop this is really
significant because of copy and free.
2018-03-06 00:33:11 -08:00
6b98315a28 [GanH] Model Test
as titled
2018-03-06 00:33:11 -08:00
16ba087b64 [oncall]fix unittest dper/layer_models/tests:utils_test
as titled -- fix offending diff D7091725 due to added debug_info in operator
proto
2018-03-06 00:33:11 -08:00
496c999f7d [core] NUMA-aware pinned allocator
Using cudaHostRegister/Unregister instead of cudaMallocHost to move memory to a
specific NUMA node
2018-03-06 00:33:11 -08:00
7d8188a4c2 fix invalid-null-argument UBSAN error math_cpu.cc
Exposed by UBSAN
2018-03-06 00:33:11 -08:00
b68e2786e0 fix invalid-null-argument UBSAN error in math_cpu.cc 2018-03-06 00:33:11 -08:00
14c47fb211 fix invalid-null-argument UBSAN error in math_cpu.cc
Add an if statement to check if the destination buffer is not nullptr.
2018-03-06 00:33:11 -08:00
80d0f5de93 [mobile][mpscnn] iOS11.3 interface update
data source change for MPSCNNConvolution
2018-03-06 00:33:11 -08:00
08bb6ae8bb Fix OSS build
Fix OSS build broken after D6946982 by adding CMake detection variable
(https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-gcc4.9-ubuntu14.04-build/1343/console)
2018-03-06 00:33:11 -08:00
9e71de398b [core] Graph-level NUMA awareness in Caffe2
Adding NUMA awareness through numa_node_id in DeviceOption. Blobs of operators
with numa_node_id are allocated on the corresponding memory banks, using CPU pools with
NUMA affinity set to run operators.
2018-03-06 00:33:11 -08:00
8b0b090ff1 fix Caffe2TensorToNumpyArray for py3
with python3, np.int defaults to int64. This diff should fix it. I don't know if a test already exists for this function; however, the following ASR test was breaking when I switched to py3

```
buck test caffe2/caffe2/fb/speech/asr_training/:tensor_parser_test
```
2018-03-06 00:33:11 -08:00
968ebb3b82 [GanH]fuse jsd with lr loss/xent
as titled
2018-03-06 00:33:11 -08:00
fe3c22cd24 [GanH/Easy]Fix blob dim
as titled
2018-03-06 00:33:11 -08:00
08dbd96642 Add TensorInferenceFunction for PowOp
Add TensorInferenceFunction for PowOp so that we can infer the shape and datatype of Pow output.
2018-03-06 00:33:11 -08:00
f2ec5b7b0e [DPER] Fix bug in uint8 quantization shortcut.
After D6953547 some of the blobs were no longer impacted by uint8 quantization,
but they would still generate operators expecting uint8 inputs and thus fail.

This diff adds a temporary hack to avoid doing this quantization when the layer
is not quantized.

We will fix this properly by switching to Net rewriting instead.
2018-03-06 00:33:11 -08:00
1f0a833d8e JSD fwd/bwd op
as titled
2018-03-06 00:33:11 -08:00
2d3aebd5fb fix bug for conv3d Op cpu
There is a bug in ConvOp: the SetDeviceTensor function only copies data to a tensor when the sizes of the two differ. In the 3d-convolution case for video models, img_shape_device_ (NCTWH) is modified only for the first processed example; for the following examples it never gets updated, because img_shape_device_.size() == img_shape.size(). It should, however, be updated for each example, because T changes across videos. The same applies to col_buffer_shape_device_.

In this diff, img_shape_device_ gets updated whenever any of its dimensions differs from img_shape.
2018-03-06 00:33:11 -08:00
4aded2f7c1 Add Numa support (#2152) 2018-03-05 23:30:20 -08:00
115579697e fix typo in previous cudnn fix 2018-03-05 21:13:03 -08:00
2b7f750992 Fix cudnn < 6 2018-03-05 21:13:03 -08:00
b4b2f0d2cc Work on fp16 conv op 2018-03-05 21:13:03 -08:00
72f259c84b Add C++ preprocessor define CAFFE2_USE_EXCEPTION_PTR to guard use of std::exception_ptr 2018-03-05 16:02:40 -08:00
5c769bd243 Update Python information shown in CMake summary (#2132)
* Do not show Python library in cmake summary as we no longer link with libpython

* Show python include dirs in cmake summary
2018-03-05 15:27:59 -08:00
7588893ce2 Some additional clean-ups (#5505)
- Remove some uses of mega-header THP.h
 - Use HANDLE_TH_ERRORS in functions that may throw
 - Move NumPy includes to common header
 - Delete unused allocator
2018-03-05 17:45:02 -05:00
a91b2ad85f Fix flake8 (#5573) 2018-03-05 17:28:12 -05:00
e7897a3dc7 [auto] Update onnx to a711252 - Recent CI changes have issues: revert them while fixing to unbreak CI (#583)
a71125280c
2018-03-05 22:03:57 +00:00
a2c3ffa5c7 Delete unused expand functions in TH/THC (#5533)
These are implemented as native functions in ATen
2018-03-05 16:17:28 -05:00
6aef608f10 Fix Out of Memory failure in test TensorTest.Tensor64BitDimension (#2114)
* WIP: Fix Out of Memory failure in test TensorTest.Tensor64BitDimension

* WIP: update warning message and wrap resize inside TensorTest.Tensor64BitDimension

* WIP: only catch exception which is related to out of memory

* WIP: add return in the out of memory exception
2018-03-05 10:26:05 -08:00
976aaa55aa Add at::optional from https://github.com/akrzemi1/Optional (#5530)
Add optional.hpp from https://github.com/akrzemi1/Optional

f27e79084a/optional.hpp
2018-03-05 12:32:23 -05:00
c7e69e9015 Test documentation build in CI. (#5492)
* Test documentation build in CI.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* bugfix.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-05 09:42:40 -05:00
1ff36bfd61 Missed Step (#5558)
I think you missed a step
2018-03-05 08:59:48 -05:00
8376e63738 fixed softmax support documentation (#5557) 2018-03-05 08:59:06 -05:00
c93076495d add: padding_value to torch.nn.utils.rnn.pad_sequence (#5540) 2018-03-05 11:39:03 +01:00
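Usage sketch of the new keyword:

```
import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]
print(pad_sequence(seqs, batch_first=True, padding_value=-1))
# tensor([[ 1,  2,  3],
#         [ 4,  5, -1]])
```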
9213109e58 Modifications to improve readability of prof_dag 2018-03-04 17:06:11 -08:00
fb848311b9 Add .watchmanconfig to .gitignore so Atom/Watchman won't complain 2018-03-04 13:06:29 -08:00
66547ca061 Fix links in distribution docs (#5531) 2018-03-04 21:33:07 +01:00
abd8501020 Export MAX_JOBS for build_libs on WIndows (#5550) 2018-03-04 10:50:28 +01:00
6aeaa52476 Fixes #5542, api changes for output path on Windows (#5549) 2018-03-04 02:02:47 -05:00
4ad58e6278 Deterministically seed all ATen C++ tests. (#5545)
Hopefully this fixes the following assertion failure:

/var/lib/jenkins/workspace/aten/src/ATen/test/native_test.cpp:102: test:
Assertion `d5.matmul(d1).allclose(d5.view({24, 2, 3}).bmm(d1.view({1, 3,
1}).expand({24, 3, 1})).view({3, 2, 4, 2}))` failed.

(this error seems to only occur on ASAN tests...)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-04 02:01:34 -05:00
c713c667e0 Use fast integer division algorithm to avoid division ops inside kernels. (#5054)
* Use pre-computed offset increments to avoid int division inside kernels.

- OffsetInfo and OffsetIterator pre-computes the necessary coordinate
  change along each dimension, so that each successive offset can be
  computed using only addition/subtraction/comparisons.

- Added IntDivider which supports "magic division" for uint32_t, thus
  eliminating integer divisions altogether for offset calculation, as
  long as indices fit in 32 bits.

- In code paths with statically determined dimensions (Dims=1 or 2),
  kernel arguments now contain only the necessary data (instead of
  MAX_CUTORCH_DIMS of everything).

- Fixed index overflow errors: for tensors with >= 2G elements, we used
  to have incorrect results or an infinite loop inside the kernel.

TODO: The following pattern is broken for tensors with >= 2G elements.
      It will result in overflow, even if IndexType is uint64_t.  Need
      to search and replace them.

  > for (IndexType linearIndex = blockIdx.x * blockDim.x + threadIdx.x;
  >      linearIndex < totalElements;
  >      linearIndex += gridDim.x * blockDim.x) {

* Update CMakeLists.txt

* Removed OffsetIterator, and kept only the fast integer division logic.

- Also changed canUse32BitIndexMath so that the max index for 32-bit
  math is INT32_MAX, instead of UINT32_MAX.  It also simplifies the
  division operation.

* Merged OffsetInfo into THCTensorInfo.cuh.
2018-03-04 00:39:57 -05:00
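For illustration, a Python sketch of the multiply-and-shift ("magic division") scheme, assuming the ceil-multiplier variant; the real IntDivider runs on uint32_t inside CUDA kernels, so the constants here are only illustrative:

```
def make_magic_divider(d):
    shift = (d - 1).bit_length()                # smallest shift with 2**shift >= d
    magic = ((1 << (32 + shift)) + d - 1) // d  # ceil(2**(32 + shift) / d)
    def div(n):
        # exact for 0 <= n <= INT32_MAX, which is why the commit caps 32-bit math there
        return (n * magic) >> (32 + shift)
    return div

div7 = make_magic_divider(7)
assert all(div7(n) == n // 7 for n in range(100000))
assert div7(2**31 - 1) == (2**31 - 1) // 7
```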
15eae9543e Fixed dimensions in docs of conv and conv_transpose (#5543) 2018-03-03 05:49:01 -05:00
37dec493a5 Scope MultiRNN blobs with name as well as layers (#2025)
* Scope MultiRNN blobs with name as well as layers

Also don't double scope MultiRNN in case of multiple layers.

* Scope input projection of first layer with name

We don't scope it with layers because the projection is done
outside of the layer.

* Avoid scoping input blob in MemongerTest.test_rnn

* Rectify input_blob in prepare_input

Revert change in memonger_test because rectifying input will solve the problem.
2018-03-02 22:21:07 -08:00
18a76f54a6 add concrete example for python_default_init in native functions doc; (#5538) 2018-03-02 23:05:06 -05:00
d013e16cf4 [C2] Enable LARS on GPU (#2115) 2018-03-02 18:06:19 -08:00
7cb2863e80 Merge branch 'master' of https://github.com/caffe2/caffe2 2018-03-02 17:34:51 -08:00
5f8029f90b Fix documentation for WeightedSumReducerDef
Summary: Fix documentation for WeightedSumReducerDef to be more general since it applies to both Sparse and Dense ops

2018-03-02 17:14:07 -08:00
e026cb1854 Fix build with ComputeLibrary on ARM64 (#2124) 2018-03-02 16:48:57 -08:00
0bce97b101 [auto] Update onnx to 8bbeb2a - Improve CMakefile of ONNX (#563)
8bbeb2ae45
2018-03-03 00:45:58 +00:00
fdfe1d09a0 Explicitly require listing additional libraries if a binary needs things beyond Caffe2_MAIN_LIBS (#2110) 2018-03-02 16:29:13 -08:00
11a736b682 Sqrt op (#2101)
* First attempt on sqrt op

* Adding the Sqrt op along with the test cases

* Made changes per @Yangqing's questions re: tensor format and used hypothesis to generate input tensor
2018-03-02 16:19:45 -08:00
349238f5bf Mean Op (#2072)
* Mean Op

* Mean Op

* Mean Op

* Fix gradients and include seed for randomized input generation

* Update test strategies parameters
2018-03-02 16:18:17 -08:00
558e2a92df Revert update on top_k_op (#2119) 2018-03-02 16:07:45 -08:00
fc7ee0c941 Use NEON for Android build 2018-03-03 01:04:56 +01:00
01e261c25b Update perf number for test_gpu_speed_word_language_model (#5529) 2018-03-02 18:53:12 -05:00
c70beed31c Add axis to top_k_op. 2018-03-02 15:21:31 -08:00
60415cf0d2 Big batch of fixes for JIT (#5517)
* Check if node output matches in shape propagation

* Fix list attributes and view shape propagation

* fix inferred shapes for view

* Fix shape inference for integrally typed tensors

* Fixes for concat in control flow

* Fix print
2018-03-02 15:03:44 -08:00
b5a3894c61 [auto] Update onnx to 679d70e - temporarily disable python3 on osx for travis (#579)
679d70e30a
2018-03-02 22:40:42 +00:00
f76fc6fa19 Update locally_connected_op (#2113) 2018-03-02 14:05:52 -08:00
2d4212274e Add typing dep to ATen standalone .travis.yml (#5527) 2018-03-02 16:44:56 -05:00
ec3c299baf Turning off conv_op_test for now (#2104)
* Skipping conv_op_test

* Adding todo
2018-03-02 11:36:08 -08:00
72d5d9016a move -s to CMakeLists.txt 2018-03-02 20:31:33 +01:00
24dee1515c add a rule back for non-Android platforms
`-DBUILD_TEST=ON -DBUILD_BINARY=ON -DUSE_OBSERVERS=ON -DBUILD_OBSERVERS=ON`
should work for both Android and non-Android platforms (e.g., Ubuntu)
2018-03-02 20:31:33 +01:00
fab2c07af9 make -DUSE_OBSERVERS=ON work
```
scripts/build_android.sh -DBUILD_TEST=ON -DBUILD_BINARY=ON  -DBUILD_OBSERVERS=On -DUSE_OBSERVERS=ON
```
2018-03-02 20:31:33 +01:00
9befaf14ea fix -DBUILD_TEST=ON -DBUILD_BINARY=ON for Android
make
```
./script/build_android -DBUILD_TEST=ON -DBUILD_BINARY=ON
```
work
2018-03-02 20:31:33 +01:00
ca90d4c356 Add -s for Android back
Android libraries are statically linked, we'd better strip binaries
2018-03-02 20:31:33 +01:00
54b4cdeffa Replace all uses of 'Tensor or Variable' with 'Tensor' (#5508)
Replace all uses of 'Tensor or Variable'  and 'Variable or Tensor' with 'Tensor'
2018-03-02 14:26:11 -05:00
806239d6bd Fix a bug gen_jit_dispatch.py (#5518)
* Fix a bug gen_jit_dispatch.py

The `fromLast` function is confusing to understand since `fromLast(stack, 0)`
was actually invalid whereas `fromLast(stack, 1)` was the last element.
This created off-by-one bugs in gen_jit_dispatch for some operators.

This changes it to `peek(stack, i, N)` which treats the last `N`
elements of the stack as a list, and extracts element `i` of that list.
This usage reflects how `fromLast` was actually being used in the code.

`peekSlice(stack, i, len, N)` similarly treats the last N elements
as a list but extracts a slice. This enables us to get rid of
drop calls and simplify the dispatch logic.
2018-03-02 10:32:02 -08:00
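A Python sketch of the indexing convention described above (the helper names mirror the C++ ones but are illustrative):

```
def peek(stack, i, n):
    # treat the last n stack entries as a list and return element i of it
    return stack[len(stack) - n + i]

def peek_slice(stack, i, length, n):
    # treat the last n stack entries as a list and return a slice of it
    start = len(stack) - n + i
    return stack[start:start + length]

stack = ['a', 'b', 'c', 'd']
assert peek(stack, 0, 2) == 'c'                  # first of the last two
assert peek_slice(stack, 1, 2, 3) == ['c', 'd']  # slice within the last three
```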
58cd133f7e Avoid OOM when running ASAN by splitting nn tests. (#5523)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-02 12:21:15 -05:00
f064c5aa33 Expunge all occurrences of torch._C._VariableFunctions (#5525)
Some of the call-sites now look a little hokey with this
removed, saving that for another patch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-02 12:19:44 -05:00
8f627fc658 CharTensor should be signed (#5512)
CharTensor is actually int8_t which is signed
2018-03-02 11:34:10 -05:00
a6520d6b98 Changes to ATenOp CMake to make it compatible with BUCK (#2111)
* Add TARGETS for ATenOp (hackily)

This is the best way I could figure out to hook up custom_rule. See https://fb.prod.facebook.com/groups/fbcode/permalink/1810939952287945/ for more details on why it's tricky.

As for the fix with SparseTensor - it seems to be a bug in ATen declarations introduced recently.

* cmake fixes
2018-03-02 07:58:13 -08:00
0877558e60 Port cuDNN RNN dropout state initialization to ATen and make Python c… (#5383)
* Port cuDNN RNN dropout state initialization to ATen and make Python code use it.

Fixes #5138.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Variable/Tensor bugfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-02 10:00:00 -05:00
dda4bdd596 Reduce OS X MAX_JOBS because we are still OOMing (#5493)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-02 09:38:19 -05:00
70ba50c3d4 Remove some uses of torch.is_tensor in favor of isinstance (#5473) 2018-03-02 06:17:38 -05:00
5dedc648bb Compile DataLoader.cpp separately (#5507)
Don't #include DataLoader.cpp in Module.cpp
2018-03-02 05:54:33 -05:00
df88373f88 set default ams param in adam optimizer (#5501) 2018-03-02 11:43:06 +01:00
bbad9e7c8a Add virtual destructor to SourceLocation (#5516) 2018-03-02 10:06:04 +01:00
b7ab3ff5e3 Change caffe_add_linker_flag to caffe2_interface_library (#2109)
* Remove OpenGL code from benchmark

* Update function name since it is renamed in other places
2018-03-01 21:45:44 -08:00
1af7df6e78 fix rnn_cell_test in fbcode (#2107) 2018-03-01 21:02:52 -08:00
acd8dfdfb9 Warning if engine is not available (#2106)
* Warning if engine is not available

* Update operator.cc
2018-03-01 19:11:21 -08:00
1981557751 Add README and ONNXOpCoverage doc back (#2102)
* Add README and ONNXOpCoverage doc back

* Polish the coverage table again

* Remove onnx-caffe2 from title
2018-03-01 17:05:25 -08:00
0de5443469 Reorganize interned strings into categories, autogen ATen strings. (#5471)
This also starts generating dispatch code for __and__ and similar
variants.  I was too lazy to see if we have committed the '__and__ is
not inplace' mistake other places.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-01 19:51:04 -05:00
7cdc272224 [auto] Update onnx to 27b4022 - osx travis support (#566)
27b40225ea
2018-03-01 23:51:35 +00:00
aa5145bf14 Enable onnx backend test on pow, ceil and floor (#2103) 2018-03-01 15:33:58 -08:00
27265503ad nn.* doc update after Variable/Tensor merge (#5459)
The nn.* counterpart of #5443 . Mostly removed Variable wrapper. Also added doc for nn.RReLU.

Notice that torch.randn(*, requires_grad=True) isn't documented until #5462 is done.
2018-03-01 18:11:39 -05:00
1ae884ff86 Add pytorch-docker-build-test. (#5468)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-01 17:09:54 -05:00
c0304c83b1 Copy some outputs in order to decouple storage (#2105)
so that mutating one of them does not mutate the others
2018-03-01 13:25:31 -08:00
a5e1b4efc9 Fix warnings in jit (#5499) 2018-03-01 15:15:35 -05:00
56096c2311 Building rocksdb as a module (#2094) 2018-03-01 12:01:44 -08:00
4a50ab0fdb Fix naming issue in TensorCompare.cpp (#5498) 2018-03-01 14:55:25 -05:00
b38ed69441 Delete unused files (#5500) 2018-03-01 14:28:06 -05:00
285a9e2452 Add dtype to torch.Tensor constructors and accept them in set_default_tensor_type (#5444)
* Add dtype to torch.Tensor, torch.FloatTensor, etc.

* Support passing dtypes to set_default_tensor_type.

* Check dtype exception.

* Correctly handle new type initialization order.

* Move handling of torch.Storage alias to C++.

* Delete function that erroneously reappeared.
2018-03-01 14:06:55 -05:00
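A usage sketch of the behavior this PR describes (the dtype form of set_default_tensor_type is taken from the PR title and may differ in exact spelling):

```
import torch

torch.set_default_tensor_type(torch.DoubleTensor)  # classic type form
print(torch.ones(3).dtype)                         # torch.float64
torch.set_default_tensor_type(torch.float32)       # dtype form, per this PR
```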
b69b885e82 cuDNN 7.1 fix. (#5439)
Output from cudnnGetFilterNdDescriptor has changed in cuDNN 7.1 this fix will be forward and backward compatible.
2018-03-01 12:23:11 -05:00
9235277dba Re-enable some CUDA tests on Windows (#5446)
This PR enables the following tests on Windows again:

CUDA HalfTensor tests in test_torch.py and test_nn.py
test_Conv2d_deterministic_cudnn in test_nn.py
test_*Tensor_qr_big in test_cuda.py

The issues are no longer reproducible, possibly because of an upgrade to the display driver.

* Reenable CUDA HalfTensor tests on Windows

* Reenable test_Conv2d_deterministic_cudnn on Windows

* Reenable test_*Tensor_qr_big on Windows
2018-03-01 12:21:17 -05:00
8c6c09ad41 Adding openmpi to all conda builds (#2089)
* Adding openmpi to all conda builds

* Typo and turning off quiet

* Removing openmpi from non_cuda conda build

* Actually openmpi is already in the images
2018-03-01 09:19:34 -08:00
ef0ef70cf5 Don't spuriously raise warning for Constant nodes, fixes #5101 (#5469)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-01 12:08:48 -05:00
5e188e4182 Refactor and simplify ATen dispatch (#5475)
Simplifies type dispatch options to consistent use of macros (not macros here and functions there),
Adds the dispatch header to ATen/ATen.h so that users (e.g. writing extensions) can dispatch too.

* Refactor and simplify ATen dispatch

* cuda/Dispatch.h -> cuda/Dispatch.cuh

* Change dispatch strategy for half

* Use __VA_ARGS__ and get rid of parentheses

* Remove rogue UnderlyingType.h

* Fix TensorCompare.cu and add comment

* Include CUDATensorMethods in TensorCompare.cu

* to_cuda_type -> cuda::type and move AccumulateType out of native
2018-03-01 12:02:01 -05:00
b1dec4a74f Fix doc-push (#5494) 2018-03-01 17:37:30 +01:00
da894901ef Deprecate variable factory, use torch.tensor instead (#5476)
* Remove usages of torch.autograd.variable; use torch.tensor instead.

* Deprecate torch.autograd.variable.

* Remove unused sample_scalar.
2018-03-01 10:58:16 -05:00
7b33ef4cff Documentation cleanup for activation functions (#5457) 2018-03-01 14:53:11 +01:00
72aa83d702 Add pytest ccache into git ignore (#2095) 2018-03-01 00:09:33 -08:00
c96338ee2c Fixing mkl builds not using mkl (#2093) 2018-02-28 23:59:01 -08:00
4afd62db09 Add TracedModule to the JIT (#5409) 2018-02-28 22:50:50 -08:00
544aeaec62 Refix the linkage condition (#2091)
Merging as the test failure is a known issue and it's not relevant (linux vs mac).
2018-02-28 22:44:13 -08:00
2ad242bee9 Update Dependencies.cmake (#1920)
force find_package first to find OpenCV 3 when we have default package OpenCV 2 installed.
2018-02-28 22:22:25 -08:00
fcde409166 Fix the pybin11_state_gpu.so linking issue (#2087) 2018-02-28 22:19:57 -08:00
e03d74c40e [auto] Update onnx to ee79865 - Clarify reshape behavior when '0' is passed in (#569)
ee7986538a
2018-03-01 04:18:25 +00:00
55c64e5243 Add Python function calls to JIT script (#5445)
* Add Python function calls to script
* Script compiler gains a `Resolver` object that runs when it does not understand a function call. This decouples the python resolution from the conversion to IR.
2018-02-28 19:45:04 -08:00
b10fcca5f0 Install cuda headers in ATen build (#5474) 2018-02-28 19:36:41 -08:00
771791fe2f install pytorch into default conda env (#5482) 2018-02-28 21:42:38 -05:00
c3e4d7ff87 Cuda full (#2084)
* Removing leveldb for ubuntu

* changes

* Removing ibverbs

* Moving cuda_full back to default channels for gcc >5
2018-02-28 18:21:00 -08:00
39608b0180 Add source information to IR nodes (#5449)
* Add source information to IR nodes

SourceRange information from the script is now propagated to IR nodes.
This information is only used in two places for now: the interpreter
wraps errors that occur when an instruction executes, and shape
propagation now reports errors on the line where it fails:

    Traceback (most recent call last):
      File "test/test_jit.py", line 1655, in test_script_error
        bar(Variable(torch.rand(10), requires_grad=True), Variable(torch.rand(9), requires_grad=True))
    RuntimeError:
    The size of tensor a (10) must match the size of tensor b (9) at non-singleton dimension 0:
    @torch.jit.script
    def bar(c, b):
        return c / b
               ~~~~~ <--- HERE

In the future, shape propagation should really not report any size
errors and instead just not propagate shapes and let the actual
execution fail. However, this is hard to accomplish while we still
depend on running the op to do shape propagation.
2018-02-28 17:06:18 -08:00
36abf023bd Added 3d grid sampler (for volumetric transformer networks) (#5453)
* add 3d grid_sample

* add cuda implementation, more testing
2018-02-28 19:32:15 -05:00
7772d26cb0 Fix test sparse (#5478) 2018-02-28 16:05:50 -08:00
749a17661c Introduce padding op to mimic pytorch semantics in ONNX export (#2069)
In pytorch, after pad_packed_sequence, the "extra" elements (after the
ends of the sequences) are reset. In the equivalent Caffe2 graph
exported via ONNX, they contained some leftover values, which caused
tests to fail. Probably no one depends on these values, but just in
case, set them to zero to mimic pytorch semantics.
2018-02-28 15:44:54 -08:00
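For reference, a sketch of the PyTorch-side behavior being mimicked: after a pack/pad round trip, positions past each sequence's end come back zeroed:

```
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

x = torch.ones(2, 3, 1)  # two sequences, padded to length 3
packed = pack_padded_sequence(x, lengths=[3, 2], batch_first=True)
out, lens = pad_packed_sequence(packed, batch_first=True)
print(out[1, 2])         # tensor([0.]): past the end of the length-2 sequence
```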
377d896969 better solution for the linking error related to lazy_init for MSVC (#5375)
* Revert "Fix wrong argument name (#5366)"

This reverts commit cc9d3b265d7e688865fde055ee3a2f9b77b5714a.

* Fix wrong argument naming

* Revert "Wrap torch::cuda::lazy_init with WITH_CUDA flag"

This reverts commit a8fa37f8fac5aef09eb7fe54d84de6126618c262.

* Revert "Solves the linking error related to lazy_init for MSVC"

This reverts commit 63913a102f274865a76e7c40ffdf6b40c277d5ff.

* better solution for the linking error related to lazy_init for MSVC

* Naming changes

* Namespace changes and further comment

* Rebasing onto current master

* Remove code that is useless

* Fix linting

* Remove rebasing bugs
2018-02-28 17:34:34 -05:00
5c381bbc57 Patch cuda-convnet2 from internal Facebook changes.
* Unfortunately this needs to be manually monkey patched.
* This should get it so GitHub and fbcode versions match.
2018-02-28 14:20:48 -08:00
ea10b7bc63 [auto] Update onnx to 4f00542 - Create unique proto filename based on ONNX_NAMESPACE (#555)
4f00542fc1
2018-02-28 22:08:33 +00:00
509aed6ca3 More Variable/Tensor clean-ups (#5464) 2018-02-28 16:46:47 -05:00
e91560017d Removing leveldb for ubuntu (#2081)
* Removing leveldb for ubuntu

* changes
2018-02-28 12:35:51 -08:00
eb612b09e9 Fix Caffe2 ==> ONNX converter to handle three models (#2058)
* Handle legacy pad in Caffe2==>ONNX converter, also remove fake initializer

* Address the comments, 1) have filtering fake initializer before ssa rewrite, 2) polish the legacy padding handling logic

* Add test cases to cover the code just added

* Nit
2018-02-28 11:55:49 -08:00
0f86f64398 Add support for device python arguments with constructors. (#5384)
* Add support for device python arguments with constructors.

* Fix flake8.

* Simplify device handling.

* Dont use torch._C._VariableFunctions.

* Handle default values for functions that have tensor args (e.g. ones_like).
2018-02-28 14:41:57 -05:00
459dadf04d Use 'Tensor' instead of 'Variable' in type error messages (#5465) 2018-02-28 14:35:12 -05:00
ebd32f7bcd Check that parsed_args contains enough space for all parameters (#5467) 2018-02-28 14:34:04 -05:00
687de0bd67 [auto] Update onnx to 8dc7369 - preserve value infos if they are needed (#561)
8dc7369bb9
2018-02-28 18:32:23 +00:00
6ab33a820c Support type conversion via type(dtype). (#5441)
* Support type conversion via type(dtype).

* Merge overloads.
2018-02-28 13:05:38 -05:00
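Usage sketch:

```
import torch

x = torch.zeros(3)
y = x.type(torch.float64)  # .type() now accepts a dtype
print(y.dtype)             # torch.float64
```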
94938be367 Support dtypes in legacy new constructors. (#5343)
* Support dtypes in legacy new constructors.

* Add comment about why we don't have dtype for sparse (indices, values).

* separate legacy tensor ctor vs new (new includes dtypes).

* Use TypeError.
2018-02-28 12:52:11 -05:00
2de0bb3df4 [minor] change test name 2018-02-28 06:02:27 -08:00
e09fa090e6 dont test for SourceChangeWarning in incompatible environments (#5458) 2018-02-28 09:00:28 -05:00
3d070e78fe Fix cmake dependency error in static library case. Peer coded with @bddppq (#2078)
* Fix cmake dependency error in static library case. Peer coded with @bddppq

* Temporarily add back the private dependencies to the binary targets
2018-02-28 01:33:59 -08:00
847fad70a9 Check if CXX compiler supports all the needed functions (#5401)
* Check if CXX compiler supports all the needed functions

This commit improves the code for PR #5230 according to
@ezyang comments. Instead of checking ubuntu/gcc versions it
checks the support for the needed functions from the C++ compiler
using CHECK_CXX_SOURCE_COMPILES.

Fixes: 5229
2018-02-28 00:22:34 -05:00
6f9dc115e8 Mark test_fs_sharing as hanging in ASAN. (#5451)
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-02-28 00:15:53 -05:00
35c3b91f8a Remove no longer used flag (#2075) 2018-02-27 21:07:25 -08:00
178c4be295 [wip] Cmake modernization (#2066)
* cmake target - work in progress

* wip cmake public targets

* Add missing INTERFACE keyword

* Add cuda public dependencies

* Add dependency for test targets
2018-02-27 20:42:37 -08:00
6341a0fd79 Fix cuda full (#2070)
* Trying fuller cuda_cull

* changes

* Migrating to conda-forge for openmpi

* Adding openmpi

* Adding leveldb

* Fixing unrelated minor conda bug

* Another unrelated fix
2018-02-27 16:26:41 -08:00
e07083f00a Cleanup CMake files and build scripts for Android (#2067)
- Remove USE_ARM64 option because it doesn't do what is expected
- Disable ARM ComputeLibrary for non-ARM/ARM64 builds
- Remove analysis of CMake options from scripts/build_android.sh
- Add user-specified CMake options at the end of command line to allow overriding defaults
- Update README for ARM ComputeLibrary integration and do not require to disable NNPACK for ARM64 build with ARM ComputeLibrary
2018-02-27 16:05:21 -08:00
5bbeb55f22 add reduce=True arg to MultiMarginLoss (#5150)
* add reduce=True arg to MultiMarginLoss

* Change tests to support legacy

* fix flake8

* address comments

* formatting change

* remove free of unallocated tensor

* fix after variable/tensor merge
2018-02-27 18:35:50 -05:00
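Usage sketch of the new argument (reduce= was later subsumed by reduction= in newer PyTorch):

```
import torch
import torch.nn as nn

loss_fn = nn.MultiMarginLoss(reduce=False)  # per-sample losses instead of a scalar
x = torch.randn(3, 5)                       # 3 samples, 5 classes
y = torch.tensor([1, 0, 4])
print(loss_fn(x, y).shape)                  # torch.Size([3])
```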
392fc8885c add faq on cuda memory management and dataloder (#5378) 2018-02-27 18:35:30 -05:00
1a7815e662 Add Scalar to native_function.yaml doc (#5416)
* add Scalar to native_function.yaml doc

* address @gchanan 's comments
2018-02-27 18:34:42 -05:00
0955e791d3 Fix caffe_add_whole_archive_flag in cmake (#2062) 2018-02-27 15:04:01 -08:00
48a3349c29 Delete dead Tensor code paths (#5417)
This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp.

This also moves the THNN.cwrap/.cpp generation to generate_code which can use ninja if installed.
2018-02-27 17:58:09 -05:00
7276432bbd [auto] Update onnx to eb55f2a - Update defs.cc to clarify Pool op semantics (#552)
eb55f2a637
2018-02-27 22:08:17 +00:00
fa24e47d1a [auto] Update onnx to 296953d - spelling pass for docs (#542)
296953db87
2018-02-27 21:39:41 +00:00
6b95ca4eda DataParallel: GPU imbalance warning (#5376) 2018-02-27 21:30:41 +01:00
5e0a3e99bc OSS Playground modulelized model components (#2059)
Relatively independent feature.  Tested and reviewed.  Should be safe to merge.
2018-02-27 12:27:41 -08:00
d5038309a1 Remove WITH_SCALARS, as it's enabled by default now. (#5437) 2018-02-27 14:51:11 -05:00
76304300a8 Transpose shape inference (#2057)
* fix variable name

* enhance shape inference to handle transpose

in the case arising from pack_padded(..., batch_first=True)
2018-02-27 11:51:10 -08:00
af78d51dea [auto] Update onnx to 61836da - Check whether perm exists before using it (#559)
61836da46e
2018-02-27 18:50:46 +00:00
8327982904 Set python random seed in workers (#5415)
* Set python random seed in workers

* Import random
2018-02-27 03:16:10 -05:00
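This seeds Python's random module in each worker automatically; other RNGs such as NumPy's still need a worker_init_fn, sketched below under that assumption:

```
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # derive a distinct NumPy seed from the per-worker torch seed
    np.random.seed(torch.initial_seed() % 2**32)

dataset = TensorDataset(torch.arange(10.0))
loader = DataLoader(dataset, num_workers=2, worker_init_fn=worker_init_fn)
```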
9f2975e2cf Remove onnx-caffe2 (#5425)
* Remove onnx-caffe2

* Comments
2018-02-27 03:15:49 -05:00
d5de0dca38 fix crash in cudnn setup helper on machines without cudnn (#5427) 2018-02-27 03:15:23 -05:00
7f1b3d12e1 Fix ASAN alloc-dealloc-mismatch in TestMultiprocessing (#5428) 2018-02-27 03:14:52 -05:00
38fb8c5cf7 Remove onnx-caffe2 reference (#2063) 2018-02-27 00:09:03 -08:00
a12aae2a72 [auto] Update onnx to b0ffb2d - Remove onnx-caffe2 reference (#558)
b0ffb2d302
2018-02-27 06:10:57 +00:00
bc4c919a9e update dependencies (#5423)
On OS X from source I get    `Missing build dependency: Unable to import the typing module. `
2018-02-26 22:43:42 -05:00
12a477b12e Update README.md 2018-02-26 18:21:24 -08:00
ddf6b3daae [auto] Update onnx to 176e357 - adding tests for cast operation (#543)
176e3575ea
2018-02-27 02:03:42 +00:00
679232657d Update README.md 2018-02-26 18:02:28 -08:00
ec194f2468 Fix typos in README 2018-02-26 18:01:27 -08:00
8c18220a59 Fix layer_norm initialization and nn.Module docs (#5422)
* Fix LN initialization; Support single int normalized_shape

* disable docstring inheritance

* fix sphinx warnings
2018-02-26 19:32:08 -05:00
611c771fc8 Introduce torch.tensor (was torch.autograd.variable). (#5419)
* Introduce torch.tensor (was torch.autograd.variable).

* Get rid of torch.variable usages.

* Use more precise name.
2018-02-26 19:10:29 -05:00
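Usage sketch:

```
import torch

t = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
print(t.dtype, t.requires_grad)  # torch.float32 True
```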
05269b582b [JIT] Support shape propagation with control-flow (#5391)
Support shape propagation with control-flow

* This allows us to enable optimization in the GraphExecutor for most
  script tests.
* Changes Type to always be present (non-null) on a Value, removing `hasType()`
  and `typeOption()`. A new type kind 'DynamicType' now represents when
  a specific type has not been determined.
* If/Loop nodes propagate shapes/types in the simple cases where types of
  outputs do not change depending on where control flows. In other
  cases, we propagate DynamicType to indicate we do not know what
  the shape will be.
* Remove the `cond` input to the body of Loop to simplify handling in
  interpreter and shape propagation.
* Bugfix for zero-dim contiguousStridesOf
2018-02-26 15:24:05 -08:00
0250b57978 Avoid extra cpu->cpu copy in dispatch_type. (#5418)
* Avoid extra cpu->cpu copy in dispatch_type.

* Simplify cases.
2018-02-26 17:56:45 -05:00
3600e9ef6c Mark functions that shouldn't end up in torch. as method-only. (#5392) 2018-02-26 16:52:54 -05:00
c6d47f6386 add @torch.jit.script, @torch.jit.compile, torch.jit.CompilationUnit(str) (#5367)
* torch.jit.trace annotation now creates a GraphExecutor

The other torch.jit.trace, which was used for testing purposes and for onnx to get the trace graph, is now called torch.jit.get_trace_graph.

* @script annotation, and compilation unit for strings
2018-02-26 13:22:45 -08:00
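A minimal sketch of the decorator form, written against the modern annotation syntax rather than this PR's exact frontend:

```
import torch

@torch.jit.script
def scaled_sum(x, n: int):
    # the loop is compiled into IR control flow rather than unrolled by tracing
    total = torch.zeros_like(x)
    for _ in range(n):
        total = total + x
    return total

print(scaled_sum(torch.ones(2), 3))  # tensor([3., 3.])
```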
c3320887fe [cmake] try removing caffe2_include_directories hack (#2050) 2018-02-26 12:45:06 -08:00
406c9f9c28 Remove two uses of the old Tensor class (#5413) 2018-02-26 15:00:51 -05:00
ec547ce640 RNN ONNX export: concat hidden/cell states on the right axis (#2055)
Test Plan: existing tests in onnx-fb-universe catch this, modulo a bug
in the tests which I am fixing in a separate diff
2018-02-26 11:04:04 -08:00
e68b815afe Empty sparse tensor copy revers dimI, dimV. (#5414) 2018-02-26 13:54:20 -05:00
c7a3b00bf3 Back out "[caffe2] fix signed-integer-overflow UBSAN error"
Original commit changeset: 89c604e11ad4

Needed to back out D7006399 because of test failure. The previous diff changed behavior.
2018-02-26 10:26:25 -08:00
028bc2f23f [C2 OSS][GPU]exposing totalGlobalMem info to workspace python
exposing totalGlobalMem info to GetDeviceProperties method so that users
can have better understanding
2018-02-26 10:26:25 -08:00
c55d34ed81 Add operation time metrics to blobs_queue.
Export read time and write time from the blobs queue.
Fix queue balance stat for `blockingRead`.
2018-02-26 10:26:25 -08:00
a5b387fa27 Fix Caffe2 OSS build
Fix Caffe2 OSS build
2018-02-26 10:26:25 -08:00
c55a642d83 [c2] update SparseFeatureHash layer
The diff makes following changes for this layer: copy length blob; add nameScope for output schema; add layer tests
2018-02-26 10:26:25 -08:00
e397367db0 GatherRangesToDenseOp supporting sorting with keys
Added functionality to GatherRangesToDenseOp such that it supports an optional input KEY, and will sort DATA according to KEY for each example per feature.
2018-02-26 10:26:25 -08:00
bdd25d80a8 fix invalid-shift-base UB in conv_op_cache_cudnn.h 2018-02-26 10:26:25 -08:00
6922d7d89f Add cudaconvnet for caffe2
Add cudaconvnet for caffe2
2018-02-26 10:26:25 -08:00
c18f9b4dea Back out "[codemod] - comment out unused parameters"
Original commit changeset: 8e10b1f1e2ae

@allow-large-files
2018-02-26 10:26:25 -08:00
148f6b200a fix signed-integer-overflow UBSAN error 2018-02-26 10:26:25 -08:00
7e9f8af018 [codemod] - comment out unused parameters 2018-02-26 10:26:25 -08:00
fc9837899d Embedding.load_pretrained method (#5350) 2018-02-26 17:46:25 +01:00
f4cfd9bbfc Don't python bind 'tensor' or 'sparse_coo_tensor'. (#5390)
These are internal ATen functions; we have better python APIs.
2018-02-26 11:06:25 -05:00
10fd272b7a Update doc of batch size requirements for DP (#5108)
* Update doc of batch size requirements for DP 

Fix #5039

* Delete the recommendation for batch size

There's no significant speed difference between divisible and indivisible batch size.
2018-02-26 00:55:08 -05:00
7cafdab69b [C2] Implement Layer-wise Adaptive Rate Scaling (LARS) (#2034)
* [C2] Implement Layer-wise Adaptive Rate Scaling (LARS)

* [C2] Implement Layer-wise Adaptive Rate Scaling (LARS)

* add unit test for Lars

* set default value for lars to be None

* remove lars for subclasses of SgdOptimizer
2018-02-25 14:58:31 -08:00
39001db843 Update NNPACK and cpuinfo to check cpuinfo_initialize status (#2051) 2018-02-25 09:03:57 -08:00
1f9df59de9 Move caffe_option to proper cmake_dependent_option (#2049) 2018-02-24 23:31:36 -08:00
9d94a529fa Update NNPACK and its dependencies (#2047)
I made changes to NNPACK and its dependencies to not mess up global CMake state. This commit brings these changes to Caffe2.
2018-02-24 23:20:38 -08:00
07646e405e no_bias in resnet32x32 (#1817) 2018-02-24 16:58:23 -08:00
d2f71cbdeb make CuDNN finders respect library major version (#5399) 2018-02-24 19:37:00 -05:00
80430501c9 Remove the use of EXTERNAL_DEPENDENCIES (#2045)
* [cmake] Move nccl to modern cmake, and avoid using EXTERNAL_DEPENDENCIES

* [cmake] Move nnpack to modern cmake and avoid using EXTERNAL_DEPENDENCIES.

* [cmake] Move ATen to modern cmake and avoid using EXTERNAL_DEPENDENCIES.

* Move cpufeatures to modern cmake, and avoid using EXTERNAL_DEPENDENCIES

* Finally remove EXTERNAL_DEPENDENCIES.

* Maratyszcza's comments
2018-02-24 16:15:28 -08:00
40d79e4447 Turn on ASAN in continuous integration. (#5271)
I know this works because I had to squelch a bunch of ASAN
errors in multiprocessing.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-02-24 17:04:25 -05:00
1ff537ca71 Ignore FileNotFoundError when shutting down in data_queue.get (#5380)
* Ignore FileNotFoundError when shutting down in data_queue.get

* Address @apaszke comments
2018-02-24 13:32:13 -05:00
c60d509fdf Pin libnccl2 to version 2.1.2 (#2033)
* Pin libnccl2 to version 2.1.2

Version 2.1.4 exports C++ symbols that it shouldn't, which causes a
mismatch between raised exceptions and expected exceptions.

Pin this to 2.1.2 until this is solved and NVIDIA releases a new version.

* Fix for 9.1

* Actually pin 2.1.4 for 9.1
2018-02-24 09:51:47 -08:00
c06c6046e3 Accept GPU perf test regression. (#5395)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-02-24 18:24:24 +01:00
c1919b370b Skip Cast ONNX backend test, which is not supported in Float16 case (#2005) 2018-02-24 03:51:08 -08:00
a0118533ef Add a print() function to the JIT script (#5274)
Additionally:
- add support for calling functions that are not methods in the Python frontend
- add an end-to-end test for the Python frontend
- add a capture_stdout helper for checking that `print` actually works
2018-02-24 11:15:55 +01:00
fbf1f06521 Implement no-attribute dispatch of ATen ops from the JIT (#5298) 2018-02-24 11:15:43 +01:00
ff189b9023 Update CMake min requirement to 3, and use interface library for cuda libs. (#2021)
* Try use CMake interface library to simplify some of the cuda libs.

* Bump to cmake 3
2018-02-24 00:02:18 -08:00
07414d94d7 Add eigen version check (#2037) 2018-02-23 22:00:07 -08:00
0f0f7957e4 Add docker cmake3 install for ubuntu 14.04 (#2038) 2018-02-23 21:19:15 -08:00
ff3ef8301c [WIP] splitting conda-builds into separate build and test phases for PRs (#2031)
* [WIP] moving conda scripts to separate build+test

* [WIP] Splitting conda-builds into build and test phases

* Migrating build_local to call build_anaconda

* Tidying up a regex
2018-02-23 18:47:14 -08:00
c0866e45c7 Caffe2 ARM ComputeLibrary integration (#2015)
Caffe2 ARM Compute Library Integration
2018-02-23 18:09:05 -08:00
e3aae398ce [auto] Update onnx to e78c068 - Adding int32, int64 and double input data types for featurevectorizer (#547)
e78c068008
2018-02-24 00:33:05 +00:00
99e99130f5 Remove build_host_protoc.bat as it is no longer needed after protobuf update. (#2020) 2018-02-23 16:13:01 -08:00
30ec06c140 Merge Variable and Tensor classes (#5225)
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.

To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.

There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:

 https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
2018-02-23 18:03:31 -05:00
7a36c132ce Skip denormal test for now. See issue #5331 (#5387)
* Skip denormal test for now. See issue #5331

* Skip denormal test for now. See issue #5331
2018-02-23 16:35:51 -05:00
da8e037e03 Test both CPU build and CUDA build for Windows (#5364) 2018-02-23 15:25:03 -05:00
6a2afe3b59 Fix segfault in test_dep_nograd (#5377) 2018-02-23 14:41:32 -05:00
b3fdfa7bd6 [DT] [4/n] Make epoch_group explicit for JobRunner (#2018) 2018-02-23 10:41:52 -08:00
dcbbf346c2 Change output_declarations in function_wrapper.py to be a NamedTuple (#5312)
* Add python typing module as build dependency

* Change output_declarations to be a NamedTuple

* Add mypy configuration files

mypy-files.txt includes a list of all files that should be typed checked
with mypy. Run mypy with `mypy @mypyfiles.txt`.

mypy.ini includes mypy options. Unfortunately this can't be merged with
mypy-files.txt.

Update .travis.yml so that one doesn't have to specify what files to
type check inside it.

* Add RuntimeError on missing `typing` module

Alerts users to the new build dependency.
2018-02-23 13:33:59 -05:00
2130070785 Handle copying empty sparse tensors to/from CPU, GPU. (#5361)
* Handle copying empty sparse tensors to/from CPU, GPU.

This is likely not a robust fix because it special cases the case where both the indices and values are empty
rather than handling each one separately.  But this is currently blocking a change introducing devices to constructors.

* Guard sizes being NULL.
2018-02-23 13:17:27 -05:00
232837a75e [auto] Update onnx to 3ca6622 - Fix pow op's test case (#546) (#548)
3ca6622ad0
2018-02-23 18:08:00 +00:00
6c587e9e67 Solves the linking error related to lazy_init for MSVC (#5368)
* Revert "Fix wrong argument name (#5366)"

This reverts commit cc9d3b265d7e688865fde055ee3a2f9b77b5714a.

* Solves the linking error related to lazy_init for MSVC

* Fix wrong argument naming

* Wrap torch::cuda::lazy_init with WITH_CUDA flag
2018-02-23 11:08:20 -05:00
77036704aa add a third output in LSTM onnx export (#5359)
since that output has been added to the ONNX spec
2018-02-23 10:58:45 -05:00
008ba18c5b Improve CUDA extension support (#5324)
* Also pass torch includes to nvcc build

* Export ATen/cuda headers with install

* Refactor flags common to C++ and CUDA

* Improve tests for C++/CUDA extensions

* Export .cuh files under THC

* Refactor and clean cpp_extension.py slightly

* Include ATen in cuda extension test

* Clarifying comment in cuda_extension.cu

* Replace cuda_extension.cu with cuda_extension_kernel.cu in setup.py

* Copy compile args in C++ extension and add second kernel

* Conditionally add -std=c++11 to cuda_flags

* Also export cuDNN headers

* Add comment about deepcopy
2018-02-23 10:15:30 -05:00
e2519e7dd1 Fix undefined refence to convolve_5x5_sse on SSE4.1 CPUs (#5371) 2018-02-23 10:12:48 -05:00
0f68eac94a Fixing an error building with CUDA on windows (#2004)
* Fixing an error building with CUDA on windows

* Fixing cublas issue too
2018-02-22 23:37:02 -08:00
cc9d3b265d Fix wrong argument name (#5366) 2018-02-23 00:37:02 -05:00
013ed5b88f Add lazy_init.h into build for Windows and refactor code (#5365)
* Add lazy_init.h into build for Windows and refactor code

* Remove minor bugs
2018-02-23 00:05:43 -05:00
cbd1fd6c85 Install onnx by using the onnx inside caffe2 (#2002)
* Install onnx by using the onnx inside caffe2

* Add (s)ccache symlink for x86_64-linux-gnu-gcc
2018-02-22 20:31:33 -08:00
c249f49ddd Rename caffe2_ref_test.py to c2_ref_test.py (#2016)
* Rename caffe2_ref_test.py to c2_ref_test.py

* Rename the module name doc too
2018-02-22 20:22:39 -08:00
8904616028 add control flow to interpreter (#5293)
* Use stacks in the interpreter/aten_dispatch

Rather than have separate input/output lists,
the interpreter now works using a single stack.
Operators in the interpreter push/pop from the stack.
This allows ownership of tensors to transfer directly to an operator,
and an operator can drop the reference to a tensor as soon as it is
no longer needed. This is important for the GraphExecutor op,
which recursively runs the interpreter.

Once autograd is updated to pass variables to Function by value,
we will be able to ensure that we release ownership as soon as possible.

This commit also switches the interpreter to use a fake
tensor 'ContainerTensor' rather than at::Retainable to hold non-tensor
data in the interpreter. This allows us to use std::vector<at::Tensor>
for all registers, which is significantly less confusing than the
OwnedRetainables struct it was replacing.

* Add If and Loop to interpreter

* Preprocess loop to calculate where references to tensor should be dropped
* Add control instructions JumpZ/JumpNZ/Jump
* Switch from explicitly having stage structs to having a single list
  of instructions with Store/Load instructions to take values off the
  initial stack
* Make the interpreter tests executable rather than use expect files
* add a flag to interpreter code so that constants are variables
  if the interpreter is running on variables.

* Add tensor_as to its own file
2018-02-22 19:56:15 -08:00
51897e52da fix all the broken tests from adding debug info (#2013) 2018-02-22 17:43:53 -08:00
38f18c1daa add third output in onnx -> caffe2 lstm conversion (#2011) 2018-02-22 17:43:33 -08:00
c2a3d85a07 Traverse sub-blocks in JIT passes (#5329)
* Traverse sub-blocks in JIT passes

* Add an extra check to prevent cross-block fusion
2018-02-22 17:32:31 -08:00
b6854ee012 support batch-first in ONNX export of padded sequences (#5360) 2018-02-22 20:24:56 -05:00
4e5df5cda6 added debug info to OperatorDef 2018-02-22 15:53:49 -08:00
02b758f63c Add a disabled-configs.txt interlock. (#5352)
This will make it easier to bring online new CI configurations
without temporarily breaking the CI, since you can mark it
as disabled in PyTorch HEAD first and then bring the job online.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-02-22 15:58:41 -05:00
fe5fe7bad2 CMake cuda targets (#1993)
* wip: cuda targets

* Remove FindCuDNN.cmake as it is no longer needed
2018-02-22 15:54:34 -05:00
4da3ce720f Support convolution without bias in NNPACK bindings 2018-02-22 14:25:17 -05:00
1848cad108 [ready] Layer Normalization (#4922)
* at::maybe_data_ptr and Check.h => TensorUtils.h

* THNN support for optional BN running_*

* ATen support for optional BN running_*

* Python nn.* support for optional BN running_*; Improve IN and BN doc

* Add tests for IN and BN new option

* Layer Norm

* Fix LRN doc

* functional interface for LN and IN

* Layer norm tests

* fix BN double backward returning undefined tensors

* fix jit test using wrong dim inputs for BN

* add/improve BN, IN and LN GPU tests with half type

* Update docs to be consistent with Conv notation
Fix onnx
Clarified onnx symbolic wrapper

* fix typo

* Address comments
2018-02-22 11:56:41 -05:00
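Usage sketch of the new module:

```
import torch
import torch.nn as nn

ln = nn.LayerNorm(10)          # a single int normalized_shape is accepted
y = ln(torch.randn(20, 10))    # normalizes over the last dimension
print(y.mean(-1).abs().max())  # ~0: per-sample mean is normalized away
```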
2344decc91 Add onnx as a submodule (#1998) 2018-02-21 21:10:50 -08:00
9388d35293 prioritize cudnn library dir in library_dirs order (#5345) 2018-02-21 22:51:04 -05:00
090850e89b Adding guards around adding protobuf targets (#1997)
* Adding guards around adding protobuf targets

* Moving include_dirs add into target creation
2018-02-21 18:48:30 -08:00
3ee9b5edca [PR] Floor and Ceil Op
Closes https://github.com/caffe2/caffe2/pull/1932
GitHub Author: Mohammad Hossain <zem@devgpu242.prn2.facebook.com>
2018-02-21 18:31:45 -08:00
ccea6924a2 Implementing Pow operator (this merges existing pow with a scalar and new pow with a tensor exponent). Second Try.
The old pow operator has been deleted in math_ops.cc, math_ops.cu and math_ops.h, while the new operator supporting scalar and tensor exponents has been added in pow_op.cc, pow_op.h and elementwise_op.cu.
2018-02-21 18:31:45 -08:00
853dba8e3b Improve sparse variable printing. (#5335) 2018-02-21 18:01:58 -05:00
579de82bcf DDP: 10% of NCCL backend perf improvements with mixed-prec support (#5064) 2018-02-21 23:59:52 +01:00
069f66e267 only delete S3 image for successful Windows tests (#5341) 2018-02-21 17:42:44 -05:00
0878c6d4d7 Various dtype improvements. (#5321)
* Various dtype improvements.

1) Add dtypes to the new data-based constructors: Variable.new_tensor and torch.autograd.variable.
2) In the python signatures, use Type instead of Dtype to match the C++ signatures; the error messages still print as dtype.
3) Handle / add a better error message when a dtype is used when ATen was not compiled with that type (e.g. cuda types).
4) Move cuda_lazy_init to its own file.

A later commit will add support to the legacy constructors as well.

* Move implementation of lazy_init to cpp.

* Fix parsed_arg size.
2018-02-21 17:37:59 -05:00
702a7f3864 Improve Function interface (#5221)
* Improve Function interface

* Undo tracer changes

* Fix bug in VariableType.set_history

* Rename function_counter and sequence_number to sequence_nr

* Clarify Function documentation

* Replace swap_next_edges with next_edges() getter

* Bring back set_gradient_edge

* Simplify special.cpp

* add_gradient_edge -> create_gradient_edge

* Add mutable getters for pre/post hooks

* Use make_variable with Edge

* Remove remove_gradient_edge in favor of detach_

* Fix documentation and remove create_gradient_edge friend method

* Canonicalize some includes
2018-02-21 16:37:52 -05:00
ba8bbeced3 Fix input size checks in ATen for SpatialFractionalMaxPooling (#5337) 2018-02-21 16:37:11 -05:00
9bf9f0e613 Fix the bug of only processing one attribute (#5334) 2018-02-21 22:35:36 +01:00
642e4d0762 Fix typos (#5340) 2018-02-21 16:27:12 -05:00
09cff195df Improve GPU perf test (#5327)
* Reduce dataset size for word_language_model; increase NUM_RUNS for all GPU tests

* Test check_cpu_governor option

* Update perf test numbers for CPU and GPU
2018-02-21 13:21:18 -05:00
6522d6e692 Make _like dtype arguments keyword only. (#5320) 2018-02-21 12:18:39 -05:00
af4e72fdd2 Remove _out variants of like functions. (#5318)
These now have dtypes, which matches the numpy API.
2018-02-21 10:31:39 -05:00
0340e46f9b Disable tests that use DataLoader with multiple workers (#5322) 2018-02-21 09:20:37 -05:00
3ef2e484bf Add fp16 testcases in test_cuda (#5122) 2018-02-21 14:35:29 +01:00
4b8f4fc259 Added mixed-precision support in distributed training (#4891) 2018-02-21 14:29:39 +01:00
5e4acd032b Add an option in cmake for specifying caffe2 python lib relative installation path (#1981)
* Add an option in cmake for specifying caffe2 python lib relative installation path

* Fix variable name
2018-02-20 21:49:05 -08:00
2588f5de06 Update onnx version to include the model files suffix change (#1991) 2018-02-21 00:39:50 -05:00
0d641145a1 Fix public protobuf interface (#1961)
* Fix public protobuf interface - wip

* Try turn on custom protobuf in mac jenkins.

* Adding back auto-fallback protobuf option

* Address typos pointed out by reviewers
2018-02-21 00:39:00 -05:00
5439ab3cdc Remove gf library in MKL (#1976)
* Remove OpenGL code from benchmark

* Make it possible to print plots in the IPython notebook

* Create the blob if the blob is not specified in the init net

* Do not use gf library for MKL. Even after I install the entire MKL library it is still not found. After removing it, the MKL code can still run
2018-02-20 15:17:34 -08:00
492466f25f extern C guards around some TH headers (#5316) 2018-02-20 17:54:15 -05:00
0074dc7fa8 Allow more backends in caffe2_benchmark (#1979)
* Remove OpenGL code from benchmark

* Make it possible to print plots in the IPython notebook

* Create the blob if the blob is not specified in the init net

* Do not use gf library for MKL. Even after I install the entire MKL library it is still not found. After removing it, the MKL code can still run

* Support more backends in Caffe2 Benchmark

* Revert "Do not use gf library for MKL. Even after I install the entire MKL library it is still not found. After removing it, the MKL code can still run"

This reverts commit 981b6693a94cbf63ad78d51bd806c7a0d7a5a2d3.

* Build caffe2_benchmark using shared or static library depending on the flag
2018-02-20 14:45:53 -08:00
5b142e5344 add guards when source of container cannot be retreived (#5317) 2018-02-20 17:42:57 -05:00
cc7e61c88d Move onnx-caffe2 inside caffe2 (#1921)
* Move onnx-caffe2 inside caffe2

* Update to the lastest onnx-caffe2 and update jenkins env

* Rename onnx_caffe2 to onnx

* Add __init__.py to caffe2/python/onnx

* Change CI check variable to JENKINS_URL

* Cherrypick recent onnx-caffe2 update
2018-02-20 13:56:52 -08:00
031412a14b setup.py and cmake improvements (#5269)
* Document env vars and properly propagate MAX_JOBS down.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Apply CFLAGS and LDFLAGS environment variables to cmake builds.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Test that running built program works; fixes #5151.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* CMake CR.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-02-20 16:55:57 -05:00
7283d5194a Avoid having two protobuf on ubuntu14.04 (#1989)
* Avoid having two protobuf in ubuntu14.04

* Fix indent
2018-02-20 12:10:17 -08:00
5ce46be17c Disable test_multi_keep on Windows (#5314) 2018-02-20 15:00:53 -05:00
639d1c7c5e Make sure libcaffe2.so does not require executable stack
This commit updates python-peachpy submodule to bring in the fix.

In #1543 @samarjeet reported that importing caffe2 from Python fails on his system with the error "CRITICAL:root:Cannot load caffe2.python. Error: libcaffe2.so: cannot enable executable stack as shared object requires: Invalid argument". I investigated and found that this is caused by libcaffe2.so being marked as requiring an executable stack, which itself was caused by assembly (PeachPy) files in NNPACK not specifying whether they need an executable stack (by default, the linker assumes execstack is needed). I patched PeachPy to add a ".note.GNU-stack" section to generated ELF files, which makes the linker mark libcaffe2.so as NOT needing an executable stack. See Maratyszcza/PeachPy#89 for details.
2018-02-20 13:28:24 -05:00
ee71eab4c6 Adding 'full' version of conda build (#1934)
Adds another package to Anaconda.org with a "-full" suffix which includes more libraries by default. This also installs NCCL 2.1 onto the CI Ubuntu docker images to accomplish this.
2018-02-20 10:20:07 -08:00
5edf6b2037 Add numpy-style dtypes to Variable factories. (#5245)
* Add numpy-style dtypes to Variable factories.

1) Add numpy-style dtypes corresponding to torch tensor types.  These are:
torch.float16, torch.float32, torch.float64, torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64
as well as torch.cuda, torch.sparse, and torch.cuda.sparse equivalents.

2) Adds "legacy" names for the above dtypes that correspond more closely to existing tensor names.  These are:
torch.half, torch.float, torch.double, torch.short, torch.int, torch.long.
torch.byte and torch.char don't exist because they either don't match numpy semantics or differ on different architectures.

3) Adds a "dtype" parameter to Variable factories (e.g. zeros, ones) that allows the user to specify the type without changing the default tensor type.

4) Adds a "dtype" getter to Variables that return the canonical dtype from 1)

This PR is missing the following useful features that should be added in the future:
A) We only add the "dtype" parameter to auto-generated factories; hand-written factories like in tensor_new.cpp don't support this yet.

B) We don't allow type conversions to use dtypes; that should be added to type(param) or a new function.

C) We don't yet have a "device" parameter for these factories; right now, they will only create Variables on the default device.

* backend_to_string can be private.

* Define python binding argument indexes in a more simple way.

* add all_declared_types, still need to hook it up to THPDType.

* Fix all_declared_types for missing types (it's Sparse + Half).

* Ensure cuda dtypes are created even if compiled with NO_CUDA=1.

* Fix case where dtype is provided but dispatch is via namespace.

This happens in ones_like, empty_like, randn_like.

There is some question if we should do:
1) at::ones_like(tensor).toType(dtype)
2) at::ones_like(tensor.toType(dtype))

I did the former because this matches with the numpy documentation, i.e.:
"Overrides the data type of the result." and it's easier to implement.

Note that the above causes an extra copy, either of the input or output.
Here's a better implementation:
1) Make zeros_like, ones_like native functions that take an optional type (named dtype?).
2) Match the type argument with the dtype, so we don't have two different parameters.
3) Call at::zeros_like(input, type) -> at::native::zeros_like(input, type) -> type.zeros(input.sizes())

* Don't return from maybe_initialize_cuda.

* Don't leak DType name.

* Address cpp review comments.

* Share code between sparse and non-sparse test_dtypes.

* Rewrite _like functions as native function with explicit type parameter.

* Use type 'Type' instead of 'dtype' for consistency.

* Address review comments.

* Handle arg_idx when there is requires_grad but no dtype in python_binding_arguments.
2018-02-20 11:04:14 -05:00
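Usage sketch of the dtype keyword and the legacy aliases:

```
import torch

x = torch.zeros(2, 3, dtype=torch.float64)  # numpy-style dtype keyword
print(x.dtype)                              # torch.float64
print(torch.int64 is torch.long)            # True: legacy alias for the same dtype
```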
d2ff733cb1 Make ReduceLROnPlateau serializable. (#5300)
* replace lambdas with partial

* flake8
2018-02-20 00:59:14 -05:00
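Why the swap matters, in a quick sketch: pickle rejects lambdas but round-trips functools.partial over a named callable:

```
import pickle
from functools import partial

try:
    pickle.dumps(lambda lr: lr * 0.5)         # lambdas cannot be pickled
except Exception as e:
    print(type(e).__name__)                   # PicklingError

clamp = partial(max, 0)
print(pickle.loads(pickle.dumps(clamp))(-5))  # 0
```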
596470011b minor sp, underlyhing->underlying (#5304) 2018-02-19 22:28:17 -05:00
0509f26d41 Speed-up nn.Linear for the 3d input case (#5279)
This adds at::_unsafe_view and uses it in matmul. The _unsafe_view
function is identical to view except that the output is not treated
like a view by the automatic differentiation code. This avoids in-place
modifications triggering the more expensive CopySlices/AsStridedBackward
behavior.

The _unsafe_view function is only safe to use on temporaries that will
be immediately discarded and that do not alias other tensors. Otherwise,
in-place modifications may trigger incorrect gradients. The function is
not exposed to Python.

See #5169
2018-02-19 19:47:20 -05:00
cf71385ec9 Implement torch.isnan (#5273)
* Implement torch.isnan

* Simple python implementation

* Fix typo
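
A short usage sketch (present-day torch.tensor syntax assumed):

  import torch

  x = torch.tensor([1.0, float('nan'), 2.0])
  print(torch.isnan(x))  # elementwise mask that is true only at the NaN entry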
2018-02-19 19:46:35 -05:00
fae6c67121 Configurable flushing denormal numbers on CPU (#5294)
* Configurable flushing denormal numbers on CPU

* Formatting

* Update docs

* Minor doc changes
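
A minimal sketch of the configuration call this adds:

  import torch

  # returns True only when the CPU supports flushing denormals (e.g. x86 with SSE3)
  if torch.set_flush_denormal(True):
      print('denormal numbers will be flushed to zero')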
2018-02-19 19:23:43 -05:00
6279367297 Check class index in no-reduce ClassNLLLoss kernels (#5299) 2018-02-19 17:18:52 +01:00
5eefe87d4e Emit ternary if in script compiler (#5291) 2018-02-18 09:53:13 +00:00
9193dfd185 Disable test_multi_drop on Windows (#5290) 2018-02-17 20:49:12 -08:00
c71c84ee04 Tweak 'detach' docstring. (#5292) 2018-02-17 23:35:30 -05:00
f51e284408 Fix ASAN detected global buffer overflows in autograd (#5289)
* Fix asan buffer overflow in autograd saved_variable.cpp

* Fix asan global buffer overflow in any_variable_requires_grad

* Revert change in any_variable_requires_grad
2018-02-17 19:52:45 -08:00
9c207b195a Fixes UB when using legacy python functions and mark_non_differentiable (#5275)
* Fixes UB when using legacy python functions and mark_non_differentiable

If an output of a python Function is marked as non_differentiable,
autograd won't save a gradfn for that output. During the backward
pass, this translates to an undefined tensor being passed to the
backward of the Function. The legacy python Function path checks
if *any* of the inputs to backward requires_grad.

This requires_grad check uses Variable::get(), which casts the
undefined tensor to a VariableImpl and then accesses the _requires_grad
member. This is UB because the undefined tensor is NOT a VariableImpl.

The fix here is to add a check for if the variable/tensor is defined
in the legacy python Function code path.

* s/and/&&/
2018-02-17 19:06:40 -08:00
22fe542b8e Use TORCH_EXTENSION_NAME macro to avoid mismatched module/extension name (#5277)
* Warn users about mismatched module/extension name

* Define TORCH_EXTENSION_NAME macro
2018-02-16 22:31:04 -05:00
5c93ca258b check attribute existence in SpatialFullConvolution (#5255) 2018-02-16 21:06:08 -05:00
f4f5bad901 Adding a new ReduceScatter Operator.
Summary: Integrating Gloo's ReduceScatter operation with caffe2

Reviewed By: pietern

Differential Revision: D6970344

fbshipit-source-id: 27762f940812eb0bf6c99afb4ff1a25914855b11
2018-02-16 17:27:17 -08:00
36c49c9f4a change schema's __repr__() flat output to pprint style indented output
Summary: as title. This is similar to the python pprint utility for nested JSON data structures. It can be useful for inspecting a schema while debugging.
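
A rough sketch of the pprint analogy; the schema contents below are made up for illustration:

  from pprint import pprint

  nested = {'dense': {'float_features': [1, 2]}, 'sparse': {'id_list': 3}}
  pprint(nested, indent=2)  # indented, one key per line, like the new __repr__()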

Reviewed By: kittipatv

Differential Revision: D6710767

fbshipit-source-id: e450aa5477fa1ad4f93c4573f8108a2f49956da8
2018-02-16 16:26:11 -08:00
3ffd6ffa7d while and if for experimental JIT script (#5176)
This commit adds while and if support to the experimental script frontend, following the design of ONNX.
2018-02-16 15:30:18 -08:00
fac4852ff4 - Fix unused parameter warning in pool_op.cc
Summary: We are going to enable the `-Werror=unused-parameter` flag, and I need to manually fix some files so the rest of this process can be automated with a tool called clang-tidy.

Reviewed By: yfeldblum

Differential Revision: D7012203

fbshipit-source-id: 585e9e89d916dca8894308438d0c985cb1e1b07a
2018-02-16 15:11:12 -08:00
1c2cef10e2 Make zstd position independent
Summary: Closes https://github.com/caffe2/caffe2/pull/1975

Differential Revision: D7012872

Pulled By: sf-wind

fbshipit-source-id: 31a8f787faf99894ac25508d85e1eb0f0e8e84a2
2018-02-16 14:35:37 -08:00
a4c7c88f13 Update onnx version for onnx-caffe2 test
Summary:
We recently added `onnx.optimize` from onnx-caffe2 to onnx. So we need a newer version of onnx to run the tests add in https://github.com/caffe2/caffe2/pull/1921.
Closes https://github.com/caffe2/caffe2/pull/1974

Differential Revision: D7012613

Pulled By: yinghai

fbshipit-source-id: db3476374a05ce0bc1341aab46bd27ea374fe014
2018-02-16 13:28:27 -08:00
c809d89810 Fix RowWiseSparseAdam implementation
Summary: The original implementation averaged the momentum across the embedding dimensions, which doesn't make any sense. This meant all the embedding dimensions received the same update, becoming a very memory-expensive one-dimensional embedding.

Differential Revision: D7003135

fbshipit-source-id: ed54e3427bc13895a4e949e96b4b17f6ebfb6d53
2018-02-16 13:28:26 -08:00
8bfb1aa71b Fix __syncthread in SpatialClassNLLCriterion.cu (#5276)
* remove unnecessary __syncthread in SpatialClassNLLCriterion.cu; fix reduceBlock comment

* address comments
2018-02-16 13:59:26 -05:00
a6a75621cb Fix warning in net_test
Summary:
Fixes an annoying warning when building for Android with tests enabled.
Closes https://github.com/caffe2/caffe2/pull/1970

Reviewed By: pietern

Differential Revision: D7011817

Pulled By: Maratyszcza

fbshipit-source-id: 06162d5c5b12ed939581ce9a8498fbed3eb2c47b
2018-02-16 10:56:15 -08:00
25e7f8ab28 Fix event synchronization logic
Summary: Fix logic in operator's event synchronization: Record might be called after async CPU op calls SetFinished

Reviewed By: azzolini

Differential Revision: D7003277

fbshipit-source-id: 4d77d6619c6403e71ba45fbaaf78e939982452b6
2018-02-16 10:18:45 -08:00
3975abe549 Make caffe2 handle out of bounds values correctly
Summary:
according to the new onnx standard in https://github.com/onnx/onnx/pull/513
Closes https://github.com/caffe2/caffe2/pull/1903

Reviewed By: dzhulgakov

Differential Revision: D6920004

Pulled By: smessmer

fbshipit-source-id: 95771f467499ae625ff0156418a4cdf5e5631a02
2018-02-16 09:09:47 -08:00
4157562c37 Added further automatic IBVERB lib and header check before enabling THD/Gloo IB support (#5264)
* Added further automatic IBVERB lib and header check before enabling THD/Gloo IB support

* Refactoring and addressed comments
2018-02-15 23:09:08 -08:00
60dc3ca66f Use 8-bit quantization only in cases when it makes sense.
Summary:
In some cases we were doing quantization even when we should not. This diff
prevents that from happening.

Reviewed By: rayleichen

Differential Revision: D6953547

fbshipit-source-id: 7c65baaf969e5e1bddb68ca8182f4f3b43f2431d
2018-02-15 19:33:03 -08:00
c5497a34f6 Add CPU_ONLY tag for sparse_feature_hash layer
Summary: as desc.

Differential Revision: D6997841

fbshipit-source-id: 75a33ea146224979f149a36a063a78d6f18338ee
2018-02-15 19:05:56 -08:00
e411525f2c Add a FAQ, for now just 'out of memory' advice. (#5251)
* Add a FAQ, for now just 'out of memory' advice.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Updates based on comments.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* minor copyedit
2018-02-15 17:38:55 -08:00
801d7bc906 Update gloo
Summary:
This includes ReduceScatter implementation.
Closes https://github.com/caffe2/caffe2/pull/1969

Differential Revision: D7005286

Pulled By: manojkris

fbshipit-source-id: c68508b25dbc9ff700efa1426103f932806efc07
2018-02-15 17:14:18 -08:00
1711878aac Support EQ operator for bool type
Summary: EQ op should work on bool type.

Reviewed By: ender-wieczorek

Differential Revision: D6992905

fbshipit-source-id: 9a08c8b840963c9817405c7602a7f67dc6a6caab
2018-02-15 15:24:35 -08:00
70e71391d2 Fix THCTensor_(max) and THCTensor_(min) inits (#5265)
Their cuda kernels should be initialized with (min_value, 0) and
(max_value, 0), respectively, where the second number is a default index
value. However, they were being initialized with (max, 1) and (min, 1)
instead, probably a remnant from the lua torch days.

This caused bugs in torch.max() and torch.min() when the input is at the
extreme values, and the max value (or min value) occurs at index 0. For example,

  import torch
  x = torch.ByteTensor([[0]])
  x.cuda().max(dim=0)  # returns (0, 1) but the expected result is (0, 0)
2018-02-15 14:41:19 -08:00
cac3026b35 Fix typo in DataParallel docs (#5268) 2018-02-15 23:02:26 +01:00
cb2fd39fdd Add Python frontend to the JIT (#5190) 2018-02-15 22:53:19 +01:00
5ee4794d3c - Fix unused parameter warning in math_cpu.cc
Summary: We are going to enable the `-Werror=unused-parameter` flag, and I need to manually fix some files so the rest of this process can be automated with a tool called clang-tidy.

Reviewed By: yfeldblum

Differential Revision: D7001946

fbshipit-source-id: 680d812c98703ec57a9eb952a69c6316e7415be8
2018-02-15 13:18:53 -08:00
a27f0e4daa Fix conda removal step for Windows build (#5267) 2018-02-15 15:53:34 -05:00
fe72037c68 Add CUDA support for JIT-compiling C++ extensions (#5226) 2018-02-15 15:50:01 -05:00
170b22a8f0 Run tests with -v flag, fixes #5240 (#5259)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-02-15 15:44:25 -05:00
68aed0779d add reduce=True arg to MultiLabelSoftMarginLoss (#5097)
* add reduce=True arg to MultiLabelSoftMarginLoss

* Move some tests to new_criterion_tests

* fix flake8

* fix multilabelsoftmarginloss weights test
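
A minimal sketch of the new argument (using the reduce= keyword of that era; newer releases use reduction= instead):

  import torch
  import torch.nn as nn

  loss_fn = nn.MultiLabelSoftMarginLoss(reduce=False)             # keep per-sample losses
  out = loss_fn(torch.randn(4, 3), torch.empty(4, 3).random_(2))  # (N, C) input and target
  print(out.shape)                                                # torch.Size([4])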
2018-02-15 15:29:44 -05:00
3384f56cce Fix setup.py
Summary:
There is a typo in the setup.py which will cause incomplete install. This fixes it.
Closes https://github.com/caffe2/caffe2/pull/1968

Reviewed By: bddppq

Differential Revision: D7000517

Pulled By: yinghai

fbshipit-source-id: c89e32bc5a4a77571f6ab6569297a6b6a1d1f2fc
2018-02-15 11:38:29 -08:00
3036346af6 Trying a quick patch to install protobuf 2.6
Summary: Closes https://github.com/caffe2/caffe2/pull/1964

Reviewed By: orionr

Differential Revision: D6994739

Pulled By: pjh5

fbshipit-source-id: dbef0c7e5b5ade1580effa463fe19f04b8a0a276
2018-02-15 10:59:30 -08:00
2f40c88508 downgrade docker back to 9 (#5257) 2018-02-15 12:31:45 -05:00
1fdb3929c9 Fixes for docstrings/sphinx rendering of CosineAnnealingLR and Local Response Normalization (#5254)
* Fix LaTex rendering in CosineAnnealingLR

Backslashes were interpreted by Python as escapes in the string, so \frac
turned into frac, which is not a valid LaTex command.
This could be fixed with double backslashes, but the easiest solution is to
just use a raw (r) docstring.

* Fix sphinx warnings for LRN doc headings

* Move LRN docstring from __init__ to class level

The docstring was not rendered by sphinx at
http://pytorch.org/docs/master/nn.html#torch.nn.LocalResponseNorm
because it was in the constructor.

* Remove superfluous backticks from LRN formula
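
A tiny sketch of the raw-docstring fix (class body and formula abbreviated for illustration):

  class CosineAnnealingLR(object):
      r"""Anneal the learning rate following :math:`\frac{1}{2}(1 + \cos(\pi t / T))`."""
      # the r prefix keeps the backslashes, so Sphinx sees \frac and \cos intact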
2018-02-15 10:29:02 -05:00
16cd3f4a9e Don't allow to export models where parameters are inputs/outputs
Summary:
Without this enforcement it's too easy to export a model that overrides its params in
the predictor.

Reviewed By: rayleichen

Differential Revision: D6984506

fbshipit-source-id: 9bbf375758686c6ad12ad071723f255363e98ae6
2018-02-14 23:54:42 -08:00
66131dec6f Expose Caffe2 WorkerPool from ThreadPool
Reviewed By: harouwu

Differential Revision: D6946610

fbshipit-source-id: a9fef0f1c7732b534433ee9517abddc32d0ec702
2018-02-14 21:09:15 -08:00
677030b1cb Revert "Remove unnecessary __syncthreads before reduceBlock" (#5250) 2018-02-14 22:24:40 -05:00
bd22b83d62 Fix nccl cmake files
Summary: Closes https://github.com/caffe2/caffe2/pull/1963

Differential Revision: D6994392

Pulled By: bddppq

fbshipit-source-id: 4ab6a8f7dcb4469bdd3e152559ff3474984776fc
2018-02-14 16:04:11 -08:00
8bbd376107 - Fix unused parameter warning in typeid.h
Summary: We are going to enable the `-Werror=unused-parameter` flag, and I need to manually fix some files so the rest of this process can be automated with a tool called clang-tidy.

Reviewed By: yfeldblum

Differential Revision: D6928263

fbshipit-source-id: 38ce3597b9968a2c0dba3ab21be5ee1c84a13e41
2018-02-14 15:48:22 -08:00
66a97ddfd6 Add CPU and GPU perf tests in enabled-configs.txt (#5243) 2018-02-14 15:35:51 -08:00
9dfbc120f5 Fix assertNotEqual handling of message/precision (#5246) 2018-02-14 18:08:53 -05:00
01b17b3e20 Disable support of using ninja in setup.py
Summary:
Our cmake files have some issues when using ninja as the generator to build with CUDA
Closes https://github.com/caffe2/caffe2/pull/1962

Differential Revision: D6992456

Pulled By: bddppq

fbshipit-source-id: 7aa328b16e7edfddfee33495352bfcf8cd8ce9f3
2018-02-14 14:19:04 -08:00
2078e4ed37 Check GCC version on Ubuntu (#5230)
* Check GCC version on Ubuntu

GCC 5 in Ubuntu 17.10 and newer doesn't define the macro _GLIBCXX_USE_C99
and causes std::to_string, std::isnan, std::isinf (and more) functions
not to be defined either. This fix checks if GCC 5 is used on Ubuntu 17.10
or later and shows an error message describing the problem.

Fixes #5229
2018-02-14 15:19:18 -05:00
cbb2ee66f9 Remove unnecessary _syncthreads before reduceBlock (#5242) 2018-02-14 15:18:54 -05:00
7363736c50 Fix THC multinomial stride usage; (#5238)
Improve multinomial test
2018-02-14 15:00:55 -05:00
284b3c3764 Fix Android build with binaries
Summary:
After we removed android-cmake submodule and switched to android.cmake.toolchain from Android NDK, the code that builds cpufeatures dependency is no longer valid. This commit fixes it.
Closes https://github.com/caffe2/caffe2/pull/1957

Differential Revision: D6990082

Pulled By: Maratyszcza

fbshipit-source-id: ccbe8190e30e097474a2876ed4c0b263bcb117ef
2018-02-14 11:10:52 -08:00
0a66c76a4c detailed error output for parameter sharing
Reviewed By: xianjiec

Differential Revision: D6986239

fbshipit-source-id: 5b8bb06ea2383ce64318b5322bda7a58469f3eb0
2018-02-14 11:10:51 -08:00
c784a273bc Fixing conda builds
Summary: Closes https://github.com/caffe2/caffe2/pull/1959

Reviewed By: orionr

Differential Revision: D6989330

Pulled By: pjh5

fbshipit-source-id: 721437031b3088409766c931753a774a921258be
2018-02-14 10:34:09 -08:00
52fa742c51 Revert D6893040: Implementing Pow operator (this merges existing pow with a scalar and new pow with a tensor exponent).
Summary:
This reverts commit 30f614beea6f859fee25ce4f85573142885dde45

bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
cause_a_sev_many_files

Differential Revision:
D6893040

Original commit changeset: 30f614beea6f

fbshipit-source-id: 5e98a24699088283f864efe31234874bdacbe3c3
2018-02-14 10:34:08 -08:00
fe810edc80 Consolidated dockerfile changes, updated README (#5235) 2018-02-14 11:57:23 -05:00
8910dd5a81 Fix GraphExecutor and add more AD formulas (#5215) 2018-02-14 16:59:48 +01:00
318ae2085a Include __delitem__ for Sequential (#5233) 2018-02-14 13:04:27 +01:00
6204877cd4 Allow zero-dim tensors to be bound to at::Scalar (#5142)
* Allow zero-dim tensors to be bound to at::Scalar

This relaxes THPUtils_unpackLong and THPUtils_unpackDouble to allow
values convertable to PyLong and PyFloat objects. This includes NumPy
scalars and zero-dim tensors (Variables).

This is important to maintain backwards compatibility in the Tensor
constructors once scalars are enabled and Variable and Tensor are
merged.

* Add comment and unpack PyInt as int64_t
2018-02-13 23:14:40 -08:00
c746357017 Add dependency Python packages for onnx-caffe2
Summary:
onnx-caffe2 requires some more Python packages in order to run its tests.
Closes https://github.com/caffe2/caffe2/pull/1956

Reviewed By: bddppq

Differential Revision: D6985654

Pulled By: yinghai

fbshipit-source-id: 06d4ec95729b09cdd1bc7e096ecf6680124070cd
2018-02-13 22:17:39 -08:00
4256dbe2d0 Update perf test suite (#5191)
* hard exit when test output contains warning or error

* update perf test links

* update base machine description

* update z value range

* update cpu perf test numbers

* store perf test numbers in S3 instead, for easier updating

* update mini_sequence_labeler perf test link

* fix lint

* store perf test numbers in repo

* update link to mini_sequence_labeler test
2018-02-13 20:46:50 -08:00
198958bb52 Fix for PRId64 (#5228) 2018-02-13 22:42:42 -05:00
f7cc8e8822 Implementing Pow operator (this merges existing pow with a scalar and new pow with a tensor exponent).
Summary: The old pow operator has been deleted in math_ops.cc, math_ops.cu and math_ops.h, while the new operator supporting scalar and tensor exponents has been added in pow_op.cc, pow_op.h and elementwise_op.cu.

Reviewed By: houseroad

Differential Revision: D6893040

fbshipit-source-id: 30f614beea6f859fee25ce4f85573142885dde45
2018-02-13 17:46:35 -08:00
fd28e0fa29 Add bool function to return whether a model contains loss
Summary:
Add a function to return true if the model contains a loss and return
false if the model doesn't include a loss.

Reviewed By: kittipatv

Differential Revision: D6982444

fbshipit-source-id: 1f63b7a1eaa3077841a0ad5d8d854b471d0aa84c
2018-02-13 16:38:36 -08:00
a4d0a74cee Ensure Distribution.sample() result is detached (#5086) 2018-02-14 01:32:11 +01:00
1b71e78d13 CUDA support for C++ extensions with setuptools (#5207)
This PR adds support for convenient CUDA integration in our C++ extension mechanism. This mainly involved figuring out how to get setuptools to use nvcc for CUDA files and the regular C++ compiler for C++ files. I've added a mixed C++/CUDA test case which works great.

I've also added a CUDAExtension and CppExtension function that constructs a setuptools.Extension with "usually the right" arguments, which reduces the required boilerplate to write an extension even more. Especially for CUDA, where library_dir (CUDA_HOME/lib64) and libraries (cudart) have to be specified as well.

Next step is to enable this with our "JIT" mechanism.

NOTE: I've had to write a small find_cuda_home function to find the CUDA install directory. This logic is kind of a duplicate of tools/setup_helpers/cuda.py, but that's not available in the shipped PyTorch distribution. The function is also fairly short. Let me know if it's fine to duplicate this logic.

* CUDA support for C++ extensions with setuptools

* Remove printf in CUDA test kernel

* Remove -arch flag in test/cpp_extensions/setup.py

* Put wrap_compile into BuildExtension

* Add guesses for CUDA_HOME directory

* export PATH to CUDA location in test.sh

* On Python2, sys.platform has the linux version number
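
A minimal setup.py sketch using the helpers described above (extension and source-file names are placeholders):

  from setuptools import setup
  from torch.utils.cpp_extension import BuildExtension, CUDAExtension

  setup(
      name='my_cuda_ext',
      ext_modules=[
          CUDAExtension('my_cuda_ext', ['ext.cpp', 'ext_kernel.cu']),
      ],
      cmdclass={'build_ext': BuildExtension},  # routes .cu files through nvcc
  )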
2018-02-13 15:02:50 -08:00
232ce18a41 Additional sparse Variable fixes (#5203)
* Fix mul with dense + sparse
 * Add missing hspmm and smm

Also make repeat only a function (not a method) to match Tensor
behavior.

These were discovered by running test_torch.py and test_sparse.py after
merging Variable and Tensor
2018-02-13 17:54:20 -05:00
83c494787d Allow adding to trainer_extra_schema
Summary: Sometimes we need to add some extra schema later

Reviewed By: sunnieshang

Differential Revision: D6951849

fbshipit-source-id: 564eb88f9250eae24869fd10ba3426e00a18af33
2018-02-13 14:40:36 -08:00
6f533fd8b8 Only overwrite path_prefix & path_type when not None
Summary: This breaks internal functionality

Reviewed By: aartibasant

Differential Revision: D6975222

fbshipit-source-id: ce751950b4b9217d8ea5de703690451e98642f00
2018-02-13 14:40:35 -08:00
232530cc28 Move scalar tests from common_nn to legacy_nn. (#5223) 2018-02-13 16:44:21 -05:00
9a726a0770 Skip system Python if Anaconda is used
Summary:
We don't care about a particular system Python when building Anaconda images.

Rebasing later to remove the sccache change once it is merged (#1952).
Closes https://github.com/caffe2/caffe2/pull/1953

Differential Revision: D6978409

Pulled By: pietern

fbshipit-source-id: 39762602cdd35eefd485a014011b53e3ee2e830d
2018-02-13 11:47:46 -08:00
4a377d7817 Optionally build with sccache
Summary:
Work in progress to start using sccache
Closes https://github.com/caffe2/caffe2/pull/1949

Differential Revision: D6978772

Pulled By: pietern

fbshipit-source-id: 721462d8e3470736472263337c628b287cd1a901
2018-02-13 11:35:26 -08:00
d99d28b3e6 Allow custom component tagging in DeviceOptions.node_name
Summary:
Modify detect_components to take a list of valid node_name prefixes instead of values.  Users can set node_name to e.g. `'sparse_component:0'`, `'sparse_component:1'`, etc.
and pass `'sparse_component:'` as a valid prefix.  Also add `Tags.SPARSE_COMPONENT` in addition to `Tags.SPARSE_SHARDED` and `Tags.SPARSE_DONT_SHARD` and update all calls to
`detect_device_components`.

Reviewed By: azzolini

Differential Revision: D6952599

fbshipit-source-id: e1b1e6b146a6bd053b295690016044fd5990c893
2018-02-13 11:14:41 -08:00
5fe2f3f9e5 Install sccache in base images
Summary: Closes https://github.com/caffe2/caffe2/pull/1952

Differential Revision: D6977809

Pulled By: pietern

fbshipit-source-id: 36fd3b42c4ad3b4a3415b7a270b052a973450209
2018-02-13 10:32:01 -08:00
f96f3c312d Implement symbolic for slice operation (#5204) 2018-02-13 10:12:59 -08:00
ab18aaeba7 Clarify output shapes of reduce=False losses (#5082) 2018-02-13 10:11:14 -08:00
da79697d45 make explicit about keyword-onlyness of out (#5165)
* make explicit about keyword-onlyness of `out`

fix issue 2 of https://github.com/pytorch/pytorch/issues/5156#issuecomment-364521510
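
A one-line sketch of the keyword-only usage being documented:

  import torch

  result = torch.empty(3)
  torch.add(torch.ones(3), torch.ones(3), out=result)  # `out` must be passed by keyword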
2018-02-13 09:55:36 -08:00
7c3a8eaa15 Use sccache for CPU builds. (#5208)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-02-13 09:54:49 -08:00
e958727874 Disable NCCL tests for Windows (#5129) 2018-02-13 09:44:30 -08:00
8678e8584f Update aten docs (#5197) 2018-02-13 09:38:16 -08:00
da938019da Include newer Python 3 versions in base image builder
Summary:
cc bddppq
Closes https://github.com/caffe2/caffe2/pull/1947

Differential Revision: D6970881

Pulled By: pietern

fbshipit-source-id: 3a3c97d58e079ddf9afe9ea214efa7be60b4fbe4
2018-02-13 09:32:46 -08:00
147612e64a add reduce=True arg to SoftMarginLoss (#5071)
* add reduce=True arg to SoftMarginLoss

* add reference function for SoftMarginLoss

* Rebase onto master

* Address comments

* Fix flake8

* Fix rebase error
2018-02-13 10:51:57 -05:00
b11ba65204 Experimental support for setup.py develop mode install
Summary:
`python setup.py develop` / `pip install -e .`
Closes https://github.com/caffe2/caffe2/pull/1926

Reviewed By: orionr

Differential Revision: D6951780

Pulled By: bddppq

fbshipit-source-id: 01249cbca90ec5326ea4107d4e500ae95a9dbd7b
2018-02-12 23:36:18 -08:00
2b2d56d846 Add missing async deprecated wrapper to tools/autograd/templates/python_variable_methods.cpp (#5196)
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-02-12 23:29:35 -08:00
942f04ec16 Modest refactor of .jenkins scripts (#5202)
- Create a new common.sh to put common bash stanzas in
- Create a new enabled-configs.txt file, which you can use
  to selectively disable tests when running CI
- Signal 'exited user land' via a trap, so that an early successful
  exit will still correctly print the end sigil.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-02-12 23:26:39 -08:00
86803004e3 Fix cmake function to resolve libraries correctly
Summary:
Previous behavior may fail to resolve the correct library name. A rework of https://github.com/caffe2/caffe2/pull/1935 as it was messed up in the rebase...
Closes https://github.com/caffe2/caffe2/pull/1950

Reviewed By: bddppq

Differential Revision: D6974530

Pulled By: yinghai

fbshipit-source-id: 924b653e8ac0b68c46341edfd3eb05d9cc0155f2
2018-02-12 22:22:55 -08:00
2d5fbe6e0d Improve Variable interface (#5127)
* Improve Variable interface

* Address comments from @apaszke and @colesbury

* string ::operator= is not noexcept

* Remove ir.h from tracer_state.h to improve build times

* Make Variable a struct and pack SavedVariable fields

* Implement as_variable_ref

* grad_fn_ptr() -> grad_fn_unsafe()

* Reduce hackiness of set_type hack

* Include variable.h and edge.h in tracer_state.h because it uses them

* class Variable -> struct Variable because Windows cant even

* Make Variable::output_nr uint32_t instead of int

* Add comment about tracing state

* Replaced more static_cast<Variable&> and improve docs

* Remove SavedVariable destructor and construct members in init list

* Clarify docs for Variable

* Variable::set_version -> set_version_counter
2018-02-12 23:26:26 -05:00
d79a31761e rectangle_cropping_multi_cropping_color_jittering_lighting
Summary:
Change log
- Support rectangle cropping, where height and width of clip cropping can be set separately. This is useful when most video resolution is non-square, such as 240p, 360p and 480p where width is significantly larger than height.
  - Comparisons of training on ucf101 between using 112x112 croppings and using 112x144 cropping.
  - https://fburl.com/i0rw6y1k
- Support 14 multi-cropping per video clip at testing stage to improve classification accuracy. Take left-top, central-top, right-top, left-bottom, central-bottom, right-bottom and central-central croppings as well as their mirrorings. In total, 14 croppings.
   - Comparisons on the same model trained on UCF-101. Use 1 clip per video
      - RGB. f41014306, w/o Vs f41014868, w/ multi-cropping: `0.64099 Vs 0.65796`
      - OF. f41014889, w/o Vs f41014913, w/ multi-cropping: `0.65796 Vs 0.67624`

- Support color jittering and color lighting on RGB data for training data augmentation.
  - Comparisons of training on ucf101 from scratch with and without color jittering and lighting:
  - https://fburl.com/k69zatul

Reviewed By: HengCV

Differential Revision: D6962620

fbshipit-source-id: 9b43478945874142727fea351ee04417218e6606
2018-02-12 16:39:06 -08:00
0ef10385b2 Make Python functions respect grad mode (#5184) 2018-02-13 01:27:36 +01:00
38f2cd16ee If a blob is not specified in the init net, create the blob
Summary:
In Caffe2 Benchmark, if a blob is not specified in the init net, but only specified in the predict net (e.g. input), the blob cannot be retrieved from the workspace. In some cases, this results in errors.

Create the Blob before using it if it doesn't exist.
Closes https://github.com/caffe2/caffe2/pull/1948

Reviewed By: orionr

Differential Revision: D6970316

Pulled By: sf-wind

fbshipit-source-id: 3e317403de0b5cf7568c7bda69a0ebe9d59d4a1f
2018-02-12 16:24:18 -08:00
d116e47143 Fix compiler error. (#5179) 2018-02-12 19:14:51 -05:00
4b311847b1 Fix python extension suffix
Summary:
https://www.python.org/dev/peps/pep-3149

Add the missing abi tags to our pybind_state extensions.
Closes https://github.com/caffe2/caffe2/pull/1946

Reviewed By: orionr

Differential Revision: D6966545

Pulled By: bddppq

fbshipit-source-id: cb94bd7e635a6a21517a8df436f910f102686bf3
2018-02-12 14:57:48 -08:00
849f94526b Set MAX_JOBS=3 for OS X builds. (#5199)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-02-12 17:51:22 -05:00
df0a4474c4 Allow and warn when indexing a zero-dim Variable (#5114)
This better maintains backwards compatibility when Tensors and Variables
are merged. For example:

   >>> loss = var.sum().data[0]

Currently, `var.sum().data` is 1-dim, so indexing works. Once scalars are
enabled and Variable and Tensor are merged it will be zero-dim. This
change allows that expression to continue working (with a warning). In
the future, the canonical way to compute that expression will be:

   >>> loss = float(var.sum())

Or an equivalent alternative:

   >>> loss = var.sum().item()

Also fixes a few error cases.
2018-02-12 17:50:19 -05:00
bada92ddcd Implement Variable.new(...) overloads for sparse tensors (#5117)
We were missing support for the sparse variable constructors which take
indices and values.
2018-02-12 16:56:37 -05:00
c7d95dcba5 Don't use Variable vs. Tensor type-checks for requires_grad logic (#4919)
Prior to this change, test_autograd.py used type checks that
differentiate between Tensor and Variable to determine if an argument
needs requires_grad=True. This logic breaks when Tensor and Variable are
merged.

This changes the logic for method_tests so that:

 - non_differentiable(..) marks an argument as not requiring grad
 - floating point tensors have requires_grad=True
 - integral tensors have requires_grad=False
 - Variables are disallowed (unless they're wrapped in
   non_differentiable)
2018-02-12 16:20:14 -05:00
e39e86f119 Remove deprecated references to volatile (#5193) 2018-02-12 21:08:27 +01:00
f38b6f611e Replace NULL with nullptr in autograd (#5162) 2018-02-12 12:01:52 -08:00
2fd8e596b6 CUDA 9 (#5194) 2018-02-12 14:43:28 -05:00
4ed87e3c9e Make conda install and s3 cp in Windows build more quiet (#5187)
* make conda install more quiet

* make s3 cp more quiet
2018-02-12 14:22:21 -05:00
19c2ad8834 CUDA 9.0 and cuDNN 7 (#5186) 2018-02-12 14:21:56 -05:00
07be53b57f Move EmbeddingBag into ATen (#4856)
This diff creates code related to EmbeddingBag in ATen. It also allows sparse gradients.
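
A minimal sketch of the sparse-gradient usage (present-day tensor syntax assumed):

  import torch
  import torch.nn as nn

  bag = nn.EmbeddingBag(10, 3, mode='sum', sparse=True)  # sparse=True yields sparse grads
  indices = torch.tensor([1, 2, 4, 5])
  offsets = torch.tensor([0, 2])                         # two bags: [1, 2] and [4, 5]
  print(bag(indices, offsets).shape)                     # torch.Size([2, 3])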
2018-02-12 14:20:32 -05:00
177b4509ce Fix memory corruption in im2col/vol2col based convolution kernels. (#5173)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-02-12 14:13:42 -05:00
315ee107f6 document input_names, output_names feature of onnx export (#5189) 2018-02-12 19:56:02 +01:00
43f2877b7d Pinning networkx to 2.0
Summary:
Because networkx 2.1 moved bellman_ford, and scikit-image installs the most recent networkx by default
Closes https://github.com/caffe2/caffe2/pull/1944

Reviewed By: pietern

Differential Revision: D6966299

Pulled By: pjh5

fbshipit-source-id: 71ad387cb4a2b22cde3b87e6665977da6b4c428e
2018-02-12 10:55:11 -08:00
1c005602fc Adding model_id argument to nets in predictor_container when modelInfo exists
Summary: Copying model_id from metaNetDef_->modelInfo in PredictorContainer for dper models. Since these model_id's are strings of <model_id>_<snapshot_id>, changed them to strings in net_observer

Reviewed By: salexspb

Differential Revision: D6752448

fbshipit-source-id: 93c91950b44c012e57240aaf909bc961449cfd7c
2018-02-12 10:38:58 -08:00
2f1493f1fb #4990, Makes Window build fail quicker (#5175) 2018-02-12 13:11:15 -05:00
99474d28b8 Fix compound assignment in JIT script (#5178) 2018-02-12 09:12:28 -08:00
e1a88a7e98 Expose sparse variable sspaddmm (#5017)
* Expose sparse variable sspaddmm

* Delete unnecessary sspaddmm code for binding into THC

* Address comments

* Clean up code

* address comment
2018-02-12 11:18:44 -05:00
d7b6a61a54 DDP: coalescing many little broadcasts to improve performance (#4978) 2018-02-12 16:41:33 +01:00
b608ea9178 Fix sign error in TransformedDistribution.cdf() and .icdf() (#5172) 2018-02-12 11:45:22 +01:00
a061000250 Added check and test for betas parameter in Adam optimizer (#5147)
* Added check and test for betas parameter in Adam optimizer

* Simplified test
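
A short sketch of the validated parameter (the new check rejects betas outside [0, 1)):

  import torch

  params = [torch.zeros(2, requires_grad=True)]
  opt = torch.optim.Adam(params, betas=(0.9, 0.999))  # valid
  # torch.optim.Adam(params, betas=(1.0, 0.999))      # now raises ValueError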
2018-02-11 20:24:43 -05:00
6dc41f9e63 fixed doc for cholesky potrs (#5180) 2018-02-11 20:23:10 -05:00
ba84e78144 Fix up caffe2 server build (for @mode/fbandroid/server)
Summary: This fixes issues around building on a devserver

Reviewed By: pjh5

Differential Revision: D6953242

fbshipit-source-id: 59b4d3f846971a8b5eb9c1d802a8bacef3fad696
2018-02-10 08:35:31 -08:00
78c9a35a84 GPU support for ChannelStatsOp
Summary: Step 1 of 3 in adding support for multidevice batch normalization on GPUs. Implements ChannelStatsOp for the GPU. Next steps are to port the backprop stats op and tie things together in DPM.

Reviewed By: rbgirshick

Differential Revision: D6953411

fbshipit-source-id: cd50e53d66ea84fe66021c08b978b28290d9f347
2018-02-09 19:31:31 -08:00
c718b7b62b Make shape inference work with MKLMemory
Summary: MKLMemory is not really a tensor, but we can make shape info collection work.

Reviewed By: stephenyan1231

Differential Revision: D6947770

fbshipit-source-id: 04303ea309a8a9c1ac4c5401c43934d1abb6a7c4
2018-02-09 19:03:10 -08:00
9f980b1795 Implement sparse tensor and variable norm(value) (#4882) 2018-02-09 18:45:32 -05:00
0df54f4d74 Fix typo
Summary: Closes https://github.com/caffe2/caffe2/pull/1919

Reviewed By: ppwwyyxx

Differential Revision: D6945318

Pulled By: orionr

fbshipit-source-id: 700585e56d627d17f8280fe40d81ae8d984a7f40
2018-02-09 15:31:14 -08:00
fedb7095d6 Make tree views statically typed in JIT script AST (#5145) 2018-02-09 22:18:31 +01:00
8243e898ab allow dropout in RNN ONNX export except in training mode (#5160) 2018-02-09 16:04:27 -05:00
4b8bf73729 Enable scalars. (#5158)
* Enable scalars.

* Avoid variable name shadowing in list comprehension, because it rebinds in python2, but not python3.
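
A tiny illustration of the python2/python3 difference mentioned above:

  x = 'outer'
  values = [x for x in range(3)]
  print(x)  # python3 prints 'outer'; python2 prints 2, because the loop variable leaks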
2018-02-09 15:45:41 -05:00
51267095d5 Remove enqueue_splits() from ReaderBuilder
Summary: The interface is not used anywhere AFAICT; cleaning up to make it less confusing.

Reviewed By: kuttas

Differential Revision: D6867040

fbshipit-source-id: 3e8a77df76ef09c6864c308561825777b326f76c
2018-02-09 12:20:53 -08:00
39c73556fb Update NNPACK submodule to fix build with Python >= 3.6
Summary:
The enum34 dependency of PeachPy conflicts with the built-in enum package on Python >= 3.6.
This commit brings in an NNPACK change to avoid using enum34 on Python >= 3.4.
Closes https://github.com/caffe2/caffe2/pull/1925

Differential Revision: D6951906

Pulled By: Maratyszcza

fbshipit-source-id: a698d8bbbc7b7b0c1b0b532c2c9d74fe0d2ae266
2018-02-09 11:51:03 -08:00
06f8fc3f49 extend_operator_CostInferenceFunction
Summary:
- Extend SimpleNet::TEST_Benchmark to report extra FLOP, feature map memory, parameter memory at operator-level
- Add cost inference functions for 3D conv, sum, relu, spatial_bn, fc operators.

Reviewed By: sf-wind

Differential Revision: D6909893

fbshipit-source-id: 534492ccf2e15860e86f1e7f759ff338bf57753f
2018-02-09 10:56:29 -08:00
8f1f84a6f2 Expand distributions docs (#5148)
* Expand distributions docs

* Add ref to SCG paper

* Clarify use of distributions for SCGs
2018-02-09 12:14:17 -05:00
ce5702fa80 add reduce=True arg to HingeEmbeddingLoss (#5130)
* add reduce=True arg to HingeEmbeddingLoss

* pass arg to super constructor in HingeEmbeddingLoss

* make HingeEmbeddingLoss reference fn work on legacy
2018-02-09 11:38:36 -05:00
3b63e552f9 Fix test_distributions when WITH_SCALARS. (#5121)
* Fix test_distributions when WITH_SCALARS.

* Use SCALAR_SHAPE in test, use self.scale in AffineTransform.

* Handle device correctly for scalars.

* Fix one hot categorical.

* Fix relaxed categorical.

* Add a new_tensor instance method to Variable that takes only data.

This is to work around the legacy problems of new, where e.g.
new(5) will give you an unfilled tensor rather than a scalar.

* Fix cuda scalar code path.

* Remove double return.

* Work around lack of WITH_SCALARS.

* Use tensor_new.
2018-02-09 11:01:13 -05:00
6a9b7132ec Add a new_tensor instance method to Variable that takes only data. (#5144)
* Add a new_tensor instance method to Variable that takes only data.

This is to work around the legacy problems of new, where e.g.
new(5) will give you an unfilled tensor rather than a scalar.

* Remove double return.

* Fix cuda scalar code path.

* Work around lack of WITH_SCALARS.
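
A minimal sketch contrasting new_tensor with legacy new (current semantics assumed):

  import torch

  base = torch.zeros(3)
  s = base.new_tensor(5)  # zero-dim tensor holding 5, matching base's dtype/device
  t = base.new(5)         # legacy: an *uninitialized* tensor of shape (5,)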
2018-02-09 10:59:15 -05:00
6df58dac1d Make NNApi build
Summary:
To build with tests and benchmarks
`./scripts/build_android.sh -G Ninja -DBUILD_TEST=ON -DUSE_NNAPI=ON`
To run unit test
`adb push build_android/bin/nnapi_test data/local/tmp`
`adb shell "cd data/local/tmp && ./nnapi_test"`
To run benchmark
`adb push build_android/bin/nnapi_benchmark data/local/tmp`
`adb shell "cd data/local/tmp && ./nnapi_benchmark"`
Tested on a Google Pixel 2 XL with Android 8.1
Closes https://github.com/caffe2/caffe2/pull/1918

Reviewed By: Maratyszcza

Differential Revision: D6944604

Pulled By: hlu1

fbshipit-source-id: 462f010117ae4628b23bef506c41397de3817ad4
2018-02-08 19:02:18 -08:00
cec7003190 only enable FloatToHalf test for GPU
Reviewed By: bddppq

Differential Revision: D6945312

fbshipit-source-id: 9550a9607c0daec6783ce63d3c9f082ff27b0303
2018-02-08 17:48:47 -08:00
65fb885467 Bidirectional RNN export to ONNX (Elman/LSTM/GRU) (#5120) 2018-02-08 20:30:50 -05:00
de2a708187 Rename test.cc
Reviewed By: jerryzh168

Differential Revision: D6941693

fbshipit-source-id: ced6063b1776464953b445a0bc907d18baf4b172
2018-02-08 15:48:56 -08:00
08113f922b Vendor Python dependencies of NNPACK
Summary:
Include six, enum34, and PeachPy as Caffe2 submodules, and use the versions from submodules instead of downloading them during configuration time
Closes https://github.com/caffe2/caffe2/pull/1917

Reviewed By: orionr

Differential Revision: D6938735

Pulled By: Maratyszcza

fbshipit-source-id: 841a6c47a1cd003a19f48f6c256aa4d9eb2cc6e4
2018-02-08 15:48:56 -08:00
27b9b7b15a Make TypeInference work for HalfToFloat & FloatToHalf.
Summary: add missing type mapping.

Reviewed By: kennyhorror

Differential Revision: D6940574

fbshipit-source-id: b70cea4ce2e519cb3e72d0482a38f50dbb968b4a
2018-02-08 15:33:43 -08:00
6ecaed5021 Generate a core dump when CompleteInTimeOrDie forcefully quits
Summary: CompleteInTimeOrDie was added to detect deadlocks and proactively exit. In addition, call os.abort() to generate a core dump so that the error is actionable.

Reviewed By: bmaurer

Differential Revision: D6938343

fbshipit-source-id: 8bd36da4f4bb1195bd3398f25d133a6ebf1c66ad
2018-02-08 14:08:51 -08:00
01de4e40d6 Fix a bug in nested parameter sharing logic.
Summary:
It appears that my initial implementation was not really working when one
starts doing nesting. This diff fixes it by replacing itertools with
something that is really easy to reason about.

Reviewed By: idning

Differential Revision: D6933763

fbshipit-source-id: f7a1de996d878a41bac2b2acd9d87a7c4b416778
2018-02-08 13:32:53 -08:00
873f116380 adjust stft result comparison precision to 7e-6 (#5143) 2018-02-08 15:44:18 -05:00
6e0d0f08a9 Improves Conv*d(Transposed) docs to have correct newline and formatting (#5139)
Improves CUDA matmul error message by basically copying the CPU error message
2018-02-08 15:34:30 -05:00
6aaa701c9c Adding ThresholdedRelu Op support.
Summary: Core operator and python operator changes for adding ThresholdedRelu Op support.

Reviewed By: houseroad

Differential Revision: D6900660

fbshipit-source-id: 9b17ede13ccb3264286389c7fc633ab9c1a7bbbf
2018-02-08 12:18:40 -08:00
affe742d31 Add scalar module tests for test_nn. (#5116)
* Add scalar module tests for test_nn.

* Properly return from glu.

* Guard scalar test with skipIf.
2018-02-08 13:53:24 -05:00
0629785645 Initial type hints for function_wrapper (#4947)
* Initial type hints for function_wrapper

* Don't break python 2

* Update TopEnvironment

* Add mypy check to travis

* Add .mypy_cache to .gitignore
2018-02-08 13:52:31 -05:00
696db00bcd Print Parameters like Variables (i.e. print scalars correctly). (#5119) 2018-02-08 12:33:52 -05:00
8edde3de15 Ensure Tensors have storages in resizeNd (#5115)
Follow up to #4744

This is another code-path in which storages may be null, which is not
allowed in PyTorch. The Python tensor bindings handle this in pynew, but
the ATen bindings do not.

This is caught by test_torch.py when Tensor and Variable are merged.
2018-02-08 12:23:21 -05:00
a9f3299abe Fix test_distributions to always use Variables for examples. (#5134) 2018-02-08 12:15:21 -05:00
8e9b530fd7 Fix ffi cdata for Variables. (#5128)
* Fix ffi cdata for Variables.

* Fix parameter order.
2018-02-08 10:54:41 -05:00
c4d43b4c7c Implemented RelaxedOneHotCategorical + RelaxedBernoulli distributions (#5056) 2018-02-08 14:13:38 +01:00
3108ce63ba Back out "[caffe2][PR] Vendor Python dependencies of NNPACK"
Summary:
Original commit changeset: d0c1c7681605

Reverting due to broken OSS build due to this commit

Reviewed By: bddppq

Differential Revision: D6935666

fbshipit-source-id: 955cfeb6d5a4ed265b2e099094cfb5bfe960ff95
2018-02-08 01:34:22 -08:00
5816721e35 Fix the evaluation order problem with build_lstm_body (#5124)
C++ argument evaluation order is undefined and leads to different
results on different platforms. This commit fixes build_lstm_body to
do the calculation slightly differently.

Fixes #5055
2018-02-08 00:49:16 -05:00
beb9fe6a46 remove some warning introduced by #2764 (#5104) 2018-02-08 00:09:30 -05:00
2d84cb4b04 warn that CUDA capability 3.0 and 5.0 is no longer supported (#5125) 2018-02-08 00:07:53 -05:00
9093eb1ba0 Vendor Python dependencies of NNPACK
Summary:
Include six, enum34, and PeachPy as Caffe2 submodules, and use the versions from submodules instead of downloading them during configuration time
Closes https://github.com/caffe2/caffe2/pull/1901

Differential Revision: D6930731

Pulled By: Maratyszcza

fbshipit-source-id: d0c1c7681605d957de6f51bd24fbb25afc0f282f
2018-02-07 17:48:06 -08:00
e0e124e617 Fix RNN scoping situation
Summary:
There is a long-standing scoping problem which was introduced in the original python wrappers early in H1. Basically, each RNNCell implementation has to manually scope the outputs of each of the operators. If somebody forgets, there can be weird bugs with layers etc.

The approach is the following: the user has to explicitly specify the current scope when using the apply_over_sequence function and others, if the function is going to be called several times (like for stacking layers). This way we use Caffe2's native scoping approach instead of inventing one extra API people have to use (i.e. passing the scope name as an argument to the RNNCell constructor).
Closes https://github.com/caffe2/caffe2/pull/1681

Differential Revision: D6777536

Pulled By: salexspb

fbshipit-source-id: 73d860b8d4857589e04bdea5a6fcd3080d68427c
2018-02-07 17:35:29 -08:00
8724298482 Fixing spdir copy in build script for cuda
Summary: Closes https://github.com/caffe2/caffe2/pull/1913

Reviewed By: orionr

Differential Revision: D6932577

Pulled By: pjh5

fbshipit-source-id: 49921ed345d922b9584c51c150405ba4f37e780d
2018-02-07 17:35:29 -08:00
99cdf7f91c Integrate android nn api
Summary: Integrate android nn api into Caffe2. Supported ops include averagepool, maxpool, conv, relu, and softmax

Reviewed By: Maratyszcza

Differential Revision: D6560366

fbshipit-source-id: 2879a99c01acb050e711d9d7d5bde022ef95888d
2018-02-07 16:53:58 -08:00
2c27bae802 Change Windows CI conda install path (#5126) 2018-02-07 17:56:25 -05:00
ef14590209 Support calling pack_padded_sequence with Variable lengths (#5113)
This was accidentally lost while addressing review comments on
https://github.com/pytorch/pytorch/pull/4695

pack_padded_sequence may be called either with a list or with a
Variable. If called with a list we convert to Variable internally.

I added a test to test_nn to cover the new codepath. The bug was also caught
by the onnx-fb-universe tests (which rely on passing in Variable).
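
A minimal sketch of the restored codepath (shapes are illustrative):

  import torch
  from torch.nn.utils.rnn import pack_padded_sequence

  padded = torch.zeros(4, 2, 8)    # (seq_len, batch, features)
  lengths = torch.tensor([4, 2])   # a tensor/Variable now works, not only a python list
  packed = pack_padded_sequence(padded, lengths)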
2018-02-07 17:11:33 -05:00
bf603299b6 Restore torch.mm behavior for sparse variables (#5077)
torch.mm(sparse, dense) -> dense works for tensors. This PR makes it work for variables as well.

I renamed mm to _mm in Declarations.cwrap and wrote a native mm function that wraps _mm for the dense case and addmm for the sparse case.
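
A minimal sketch of the dense result (present-day sparse constructor syntax assumed):

  import torch

  i = torch.tensor([[0, 1], [1, 0]])
  v = torch.tensor([3.0, 4.0])
  sparse = torch.sparse_coo_tensor(i, v, (2, 2))
  dense = torch.ones(2, 3)
  print(torch.mm(sparse, dense))  # sparse x dense -> dense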
2018-02-07 15:42:29 -05:00
85e22b5475 Reverts force_gpu_half changes from #3660 (#5000)
The test_cuda.py setup purports to test half tensors, but actually just
re-tests FloatTensors because the keys in type_map were str instead of
type. Testing HalfTensors is more complicated, requiring changes to
precision and requires excluding some unimplemented methods.

We should fully test half CUDA tensors. This change just deletes the
duplicate tests of FloatTensor.
2018-02-07 15:33:17 -05:00
3e85613751 Experimental jit script (#5074) 2018-02-07 20:43:45 +01:00
1de4501078 Add scalar module tests for common_nn. (#5095)
* Add scalar module tests for common_nn.

* Properly skip cuda Hardshrink tests.

* Fix flake8.
2018-02-07 14:09:24 -05:00
25e946bf78 Replace edge_type with Edge and create Variable::gradient_edge() (#5030) 2018-02-07 10:50:42 -08:00
0390587d12 Bind Tensor.random_ in ATen for CUDA (#5111)
Matches the behavior of TensorRandom.cwrap
2018-02-07 13:45:24 -05:00
c111cdfd1d Add onnx support for InstanceNorm (#4626)
* Add ONNX symbolic for instancenorm

* Fix some bugs
2018-02-07 10:54:30 -05:00
011941087a Implementation of the cumulative distribution function and its inverse (#5079) 2018-02-07 16:10:19 +01:00
e75b434ca2 fix MultiLabelMarginLoss test names (#5098) 2018-02-07 11:28:36 +01:00
e027277a57 Set of RL improvements: Fix error in quantile computation. Handle missing values in sparse_to_dense. Replace page_size with minibatch size.
Summary: Set of RL improvements: Fix error in quantile computation.  Handle missing values in sparse_to_dense.  Replace page_size with minibatch size.

Differential Revision: D6888977

fbshipit-source-id: bb84477866c64da5ff57d6c25df1c8d3b799e437
2018-02-06 20:48:00 -08:00
8f78dd7249 Refactor CPU and GPU perf tests (#5078) 2018-02-06 22:12:53 -05:00
ccb61e0da7 Check shape instead of number of elements for some losses (#5085)
* Check shape instead of number of elements

* Remove arbitrary-shape spec in MSELoss docs
2018-02-06 22:12:11 -05:00
47ee86776e Fix CPU torch.multinomial with noncontiguous prob tensor (#5093)
* fix CPU torch.multinomial not working on a noncontiguous probability distribution

* address comments

* change some tabs to spaces in THStorage.c
2018-02-06 22:11:43 -05:00
b2cfd961d3 Handle sequence lengths correctly when exporting RNNs to ONNX (#4695)
* PackedSequence: store batch_sizes as tensor

rather than converting to a list of python integers. This maintains
the invariant that module's inputs/outputs are collections of
Variables.

In particular, this causes the JIT to no longer choke when flattening
and unflattening arguments.

* Handle sequence lengths correctly when exporting RNNs to ONNX

- when uniform sequence lengths are provided, correctly omit the
  argument when constructing the ONNX graph, so as to not fix the
  graph to the batch size.

- handle PackedSequences by floating them through the graph and
  eliminating them in an optimization pass. ONNX does not have packed
  sequences, but operates on a representation equivalent to
  PaddedSequence, so we hide the representation-switching from ONNX

- as a preliminary step towards handling PackedSequences, not directly
  tied to ONNX export, change batch_sizes from being an argument to
  the RNN operators into being an argument to the forward() function
  of those RNN operators. This more closely models the reality that
  batch_sizes are effectively part of the input sequences.
2018-02-06 21:40:27 -05:00
7dafb1217e Fixed CAFFE2_API decoration for caffe2/proto when building static libraries
Summary:
This was forgotten in #1854.

cc Yangqing
Closes https://github.com/caffe2/caffe2/pull/1880

Differential Revision: D6919916

Pulled By: Yangqing

fbshipit-source-id: 1a8dbae604677bc3c3d23b4e55bd09bb87c24cfd
2018-02-06 18:11:53 -08:00
f796080781 Add assignment support for Sequential (#4931) 2018-02-07 02:22:25 +01:00
f160e552df change long to int64_t (#5094) 2018-02-06 19:35:02 -05:00
b3c8b3d132 Adding more summary output to make debugging CUDA problems easier
Summary: Closes https://github.com/caffe2/caffe2/pull/1902

Reviewed By: orionr

Differential Revision: D6917525

Pulled By: pjh5

fbshipit-source-id: af8c3d1adcd528a49bcd2885207304e199a06f6f
2018-02-06 16:04:50 -08:00
7af433deeb Add scalar criterion tests (#5087)
* Add criterion scalar tests.

This exposed an issue in MarginRankingLoss with scalars, but the cleanest way to fix is to wait
until forward runs on Variables (so we don't have to wait for the backward to check if something
is a scalar).

* Fix flake8.

* Add error message for margin_ranking_loss with scalars.
2018-02-06 18:40:37 -05:00
3cd825d25e Check that indices and values are on the same device (#5089)
We perform this check in the generic/SparseTensor.cpp (the Python binding),
but the ATen bindings don't use that code path

Fixes test_broadcast_coalesced with sparse tensors
2018-02-06 18:23:26 -05:00
a68e224219 Fix ONNX While test for CUDA
Summary: We should not be trying to instantiate this op on GPU at this point

Reviewed By: pietern

Differential Revision: D6915576

fbshipit-source-id: 6bdbc93ad12fc67e3001fce1b506fe2895d7b0ba
2018-02-06 14:35:34 -08:00
895aebac08 Use Variable instead of Tensor in Function.forward (#4786)
The Tensor and Variable classes are being merged.
autograd.Function.forward is now called on Variables, but with "no-grad"
mode (torch.no_grad()) enabled.

One benefit is that we no longer have to explicitly track shared
storages.
2018-02-06 17:24:27 -05:00
c4d3f69053 Add Variable.item() (#5090)
Variable.item() converts one-element tensors to standard Python numbers.
This operates like float(var) or int(var) depending on
the data type of the Variable.
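
A minimal usage sketch:

  import torch

  t = torch.ones(1) * 3.5
  print(t.item())   # 3.5, a plain python float
  print(float(t))   # equivalent for a one-element tensor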
2018-02-06 17:15:53 -05:00
c1b98f0841 Add deprecated add_out overload (#5088)
We have a few calls that use this signature on Tensors. This also
updates the binding code to support deprecated xxx_out signatures.
2018-02-06 17:08:23 -05:00
e2f193c650 Installing recent setuptools version for python3
Summary: Closes https://github.com/caffe2/caffe2/pull/1900

Reviewed By: pietern

Differential Revision: D6915920

Pulled By: pjh5

fbshipit-source-id: 9bd8073a9670afd6e9fc02228cff7d8800762d5c
2018-02-06 14:05:10 -08:00
36bbaf0d85 Fixed double memory accesses of several pointwise operations. (#5068)
Because nvcc does not know that in/out pointers do not alias each other,
if we assign a value to *out and then use *in again, the kernel has to
emit a write to *out and then another read from *in.

(Affected kernels become marginally faster after the fix.)
2018-02-06 16:24:03 -05:00
4ad7fab16e Fix TH compile warnings (#5065)
* fix all TH compile warnings

* wrap __attribute__((unused)) in a macro
2018-02-06 14:47:51 -05:00
fcccd07cc0 Implement hinge_embedding_loss as a native function. (#5080) 2018-02-06 14:43:36 -05:00
78649419c4 Cuda 9.1 is cuda version 9010 not 9100 (#4861) 2018-02-06 13:40:25 -05:00
67ff50c30d Run test_nn criterion tests over Variables, add a scalar test (#5058)
* test_nn working.

* Fix some incorrect scalar assumptions.

* Don't use Variables when we don't have to.

* Use Variable Mixin.

* Fix NLLLoss reference function when WITH_SCALARS not enabled.

* Allow device to be optional in cuda().

* Fix multilabelmarginloss_reference.
2018-02-06 11:11:18 -05:00
13ef8432b6 parallelize vol2col and col2vol of Conv3D with CPU backend (#4824)
*     parallelize vol2col and col2vol of Conv3D with CPU backend

* interface test of conv3d

* replace long with int64_t

* correct pragma error in comments
2018-02-06 10:53:53 -05:00
237c27c35f Fix reduction functions not respecting the strides of output when output is correct size (#4995) 2018-02-06 10:50:28 -05:00
c028bcd466 Fix input of Reduce{Front/Back}{Sum/Mean}Gradient ops
Summary: The previous refactor of these four ops changed their input semantics, which made them backward-incompatible with old models. This diff fixes the problem by checking the input and defining the follow-up behavior case by case, so that old models can be accommodated.

Reviewed By: dzhulgakov

Differential Revision: D6905840

fbshipit-source-id: fc37baec407fd5eae64fc9c2b61aba3c492a90f3
2018-02-05 23:33:07 -08:00
f383600625 ONNX While Operator
Summary:
Special While loop operator that follows the semantics of While in ONNX: https://github.com/jamesr66a/onnx/blob/controlflow/docs/Operators.md#experimental-loop

Stuff that's missing:

- Lexical scoping enforced via child workspaces
- Double-buffering on forward

Further possible enhancements:
- Full parallelism when there are no loop-carried dependencies
- Diagonal execution
- More optimized scan_outputs shaping via static shape inference provided in ONNX (coming sometime)
- GPU support (probably just some tensor value management stuff)
- Gradient support (likely low-pri right now)
Closes https://github.com/caffe2/caffe2/pull/1848

Reviewed By: dzhulgakov

Differential Revision: D6907524

Pulled By: jamesr66a

fbshipit-source-id: 4938108733e168b8c027035091104712a18c992a
2018-02-05 21:05:52 -08:00
6a02cb2844 implement sequence length support for BasicRNN
Summary: Closes https://github.com/caffe2/caffe2/pull/1843

Differential Revision: D6839575

Pulled By: anderspapitto

fbshipit-source-id: efdf00f1c5cfb0d63f1992028a796c8277b76688
2018-02-05 21:05:51 -08:00
805639906a Broadcast output requires_grad only if the corresponding input requires_grad (#5061) 2018-02-05 23:38:35 -05:00
895987f9e9 Add clang-format style file to caffe2
Summary: Closes https://github.com/caffe2/caffe2/pull/1894

Reviewed By: dzhulgakov

Differential Revision: D6908057

Pulled By: jamesr66a

fbshipit-source-id: 1f5657e7051e2ce77a30d37c1f1c40345651d0fe
2018-02-05 20:35:18 -08:00
c9ee47b0b5 Fix topk work size computation (#5053)
* fix grid computation for topk kernel

* backslash alignment, no change in code
2018-02-05 23:34:31 -05:00
fc856f0036 Fix type and disable warning about cpu arch
Reviewed By: Yangqing

Differential Revision: D6908986

fbshipit-source-id: 42bba410ea772999717353b749c411bd8484af6b
2018-02-05 19:36:01 -08:00
a83c240644 Fix maxpool3d / avgpool3d crashes (#5052)
* Replace downcastOuter with newFoldBatchDim

* Fix double free

* Address comments
2018-02-05 21:14:25 -05:00
28f42cc8e7 separating set_params and init() for checkpoint managers.
Summary: separating set_params and init() for checkpoint managers.

Reviewed By: anshulverma

Differential Revision: D6852255

fbshipit-source-id: 061f16ce0c49953ca8a5fe9546af5c9945a3be48
2018-02-05 18:03:21 -08:00
1d044dc459 Changing sed call in CUDA conda-builds to keep friendly package name
Summary: Closes https://github.com/caffe2/caffe2/pull/1893

Reviewed By: orionr

Differential Revision: D6903915

Pulled By: pjh5

fbshipit-source-id: 4cdd98f7cc0be68f6aa9a455c4d4d8478c4e8869
2018-02-05 17:36:37 -08:00
61ad0e486b cmake: python packages now install to the canonical directory
Summary:
Addresses issue #1676

Now when `make install` is run, the `caffe2` (and `caffe`) python modules will be installed into the correct site-packages directory (relative to the prefix) instead of directly in the prefix.
Closes https://github.com/caffe2/caffe2/pull/1677

Reviewed By: pietern

Differential Revision: D6710247

Pulled By: bddppq

fbshipit-source-id: b49167d48fd94d87f7b7c1ebf0f187ec6a203470
2018-02-05 17:05:34 -08:00
7c7e09fe2d Adding the Percentile op & UT
Reviewed By: MisterTea

Differential Revision: D6879507

fbshipit-source-id: 7ca4165a42c073e384d3a6138ef033ca384afd49
2018-02-05 16:08:00 -08:00
239d3b2461 Add formulas for LSTM ops to JIT AD (#4916) 2018-02-06 00:01:02 +01:00
3f0a99dc90 Update FXdiv submodule
Summary:
This brings an option to disable inline assembly in FXdiv via CMake configuration option `-DFXDIV_USE_INLINE_ASSEMBLY=OFF`
Inline assembly in FXdiv apparently triggers a bug in some gcc versions
Closes https://github.com/caffe2/caffe2/pull/1892

Differential Revision: D6904507

Pulled By: Maratyszcza

fbshipit-source-id: 2ef24b277cbaa2634c69e2d53cef21415b05195f
2018-02-05 14:33:20 -08:00
885c874167 Fix refcycles in DataParallel scatter and gather (#4988)
* Eliminate reference cycles in scatter_gather

* Test for refcycles

* Better fix

* Add comments
2018-02-05 17:19:36 -05:00
cfb536937c fix Android typeid_test.cc build error
Summary:
Fix typeid_test when running android C2 tests

Previously it says:
    Build failed: Command failed with exit code 1.
    stderr: caffe2/caffe2/core/typeid_test.cc: In member function 'virtual void caffe2::{anonymous}::TypeMetaTest_Names_Test::TestBody()':
    caffe2/caffe2/core/typeid_test.cc:49:12: error: variable 'string_meta' set but not used [-Werror=unused-but-set-variable]
       TypeMeta string_meta = TypeMeta::Make<string>();

Reviewed By: Yangqing

Differential Revision: D6869192

fbshipit-source-id: ccbc30d53d04a8ece98de0a99598c176e6aaf4dc
2018-02-05 13:51:31 -08:00
b08101e281 Bring back Tensor::data<__half>() and remove base Tensor::data() template (#5035) 2018-02-05 16:42:58 -05:00
d8748a9d53 GRU sequence lengths: allow unspecified sequence lengths
Summary:
modeled after the earlier change for LSTM
Closes https://github.com/caffe2/caffe2/pull/1841

Differential Revision: D6837461

Pulled By: anderspapitto

fbshipit-source-id: de4e787019fa30f813a4b29f14b7000ce9d22d8e
2018-02-05 13:20:05 -08:00
019c1c4ca5 Removing some default dependencies of CUDA conda builds
Summary: Closes https://github.com/caffe2/caffe2/pull/1847

Reviewed By: orionr

Differential Revision: D6900615

Pulled By: pjh5

fbshipit-source-id: 5c9fec941b13bcb1007e0a29801e8e70ec042840
2018-02-05 11:20:11 -08:00
e4eaf67ec9 Fix torch.diag backward with non-square matrix (#4538)
* Fix torch.diag backward with non-square matrix

* Addressed comments
2018-02-05 13:59:29 -05:00
91efc30bfa fix #5047 (#5048) 2018-02-05 13:58:29 -05:00
1eaa10b32e Update torch.distributions documentation (#5050)
* Add a small paragraph for pathwise estimator

* Add differentiability as well

* Add small snippet and clear some grammatical errors

* Update documentation to reflect has_rsample

* Add a fix for ExponentialFamily docs

* Update __init__.py
2018-02-05 13:57:38 -05:00
7bd2db997e Port cuDNN RNN bindings to ATen (#4881)
* Add transpose() to TensorGeometry.

This code is dead; I briefly used it in my RNN patchset but
eventually rewrote it to not be necessary.  However, it seemed
like a useful gadget so I kept it.  In general, it seems that it
would be useful for TensorGeometry to support all operations that
Tensor does, but it only computes the changes to sizes/strides
instead of actually doing the computation.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Turn on wrap_dim behavior for TensorGeometry

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Support for hard-coded differentiable outputs.

Some outputs of functions are nondifferentiable, and should always
be returned with requires_grad=False.  Traditionally, we have used
the presence of 'grad' to signal that only the first output is
differentiable, and the rest are not, but cudnn_rnn (to be
implemented) breaks this pattern; its first three outputs are differentiable,
but its last output is a buffer that is just consumed by backwards.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* TensorGeometry constructor from just sizes

The sizes are assumed to form a contiguous tensor, and we compute
the strides we would get in that case.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
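
For illustration, a minimal sketch of that stride computation (assuming
row-major layout; this is not the actual TensorGeometry code):

  #include <cstdint>
  #include <vector>

  // Innermost dimension gets stride 1; each outer stride is the product
  // of all inner sizes.
  std::vector<int64_t> contiguous_strides(const std::vector<int64_t>& sizes) {
    std::vector<int64_t> strides(sizes.size());
    int64_t stride = 1;
    for (int64_t i = static_cast<int64_t>(sizes.size()) - 1; i >= 0; --i) {
      strides[i] = stride;
      stride *= sizes[i];
    }
    return strides;
  }
  // e.g. sizes {2, 3, 4} -> strides {12, 4, 1}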

* Support saving TensorList for backwards.

There is some back story here.  Saved TensorList in backwards will
be used by cudnn_rnn, and it is worth asking, why is it necessary to
save a list of tensors?  Indeed, *technically* speaking a list of
tensors is not necessary, we only need to save the sizes of each
of the weight tensors.  (We need the sizes because cuDNN is only
going to blast the derivative of weights into a flat buffer, but
we need to match the sizes of the views into the buffer when we
eventually return the derivatives.)

However, it was surprisingly awful trying to implement passing just
sizes, because as non-Tensor arguments, the JIT interpreter generation
code is expected to handle all non-Tensor arguments as attributes in the
trace, and our attributes struct doesn't actually know how to do
arrays of arrays.  Saved TensorList code was much easier to get working,
so that's what this patch does.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* MatrixRef - an ArrayRef with a stride, making it a 2D ArrayRef.

Like ArrayRef, this class does not own the underlying data, it is expected
to be used in situations where the data resides in some other buffer.
This is intended to be trivially copyable, so it should be passed by
value.

For now, 2D only (so the copies are actually cheap, without having
to write a SmallVector class) and contiguous only (so we can
return non-strided ArrayRef on index).

The intended use-case (not in this commit) is to make it easier to
work with RNN weights, which are num_weights x num_layers matrix of
parameters.

P.S. dimension 0 indexes rows, dimension 1 indexes columns

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
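
A minimal sketch of the idea (class and member names here are illustrative,
not the actual MatrixRef API, which returns ArrayRef on index):

  #include <cassert>
  #include <cstddef>

  // Non-owning 2D view over rows * cols contiguous elements.
  template <typename T>
  class MatrixRef2D {
   public:
    MatrixRef2D(T* data, size_t rows, size_t cols)
        : data_(data), rows_(rows), cols_(cols) {}
    // Dimension 0 indexes rows; indexing yields a pointer to a contiguous
    // row of length cols() (a non-strided 1D view).
    T* operator[](size_t row) const {
      assert(row < rows_);
      return data_ + row * cols_;
    }
    size_t rows() const { return rows_; }
    size_t cols() const { return cols_; }
   private:
    T* data_;
    size_t rows_;
    size_t cols_;
  };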

* Generalize getDataType in Descriptors.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Change copy_range to take Tensor, and change cat_tensors_backward accordingly

Should a backward function return a Variable or a Tensor?  For the most
part, all of our backward functions return Tensor, except cat_tensors_backward,
which returns a variable_list (which is really the only thing that matters,
because Tensor and Variable are interconvertible).  But this is kind of weird,
because it means that you can't implement a backwards in ATen that returns
a std::vector<Tensor>, and then hook it up transparently with the derivatives
code.  So I switched it over.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Support 5-ary return Tensor tuple.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Support code generation with mixed Tensor/TensorList in output.

I don't think I ended up using this in cudnn_rnn, but this seems
it might be useful for someone else later.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Support 4-ary boolean array

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add support for retain_variables in tools/autograd/derivatives.yaml

'retain_variables', a bool which is true if a user has specified
that saved variables should be retained in case the backwards is
run again later.  This allows an optimization where we can
destroy saved buffers if we know variables are not going to be retained,
e.g., it is (will be) used by _cudnn_rnn

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Lazily initialize cuDNN descriptors

Previously, cuDNN descriptors were eagerly allocated as soon
as a FooDescriptor object was created.  However, in some uses
of TensorDescriptor, this is problematic: some tensors are optional
and cuDNN's API expects to be given a nullptr TensorDescriptor
in this case, not an uninitialized (but allocated) descriptor.

Lazily initializing the descriptors makes it less likely for
us to use uninitialized memory and matches the usual semantics of
unique_ptr.  It's good sense!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Port cuDNN RNNs to ATen.

This brings three new functions:
  - _cudnn_rnn_flatten_weight: flatten a matrix of weight tensors into
    a single contiguous weight buffer as required by cuDNN
  - _cudnn_rnn: run RNN forwards
  - _cudnn_rnn_backward: run RNN backwards

RNNs have a lot of parameters, so we restructured what was previously
a single 'fn' object that recorded all the parameters into three
objects: RNNDescriptorParams, TensorDescriptorListParams and
DropoutDescriptorParams.

We make use of MatrixRef to organize the weight tensors (which are
weight/bias x number of layers), but I did not teach the codegen
how to pass these as arguments/return values natively, so instead
a MatrixRef is passed as its constituent ArrayRef and int64_t stride0.

cudnn_rnn has three differentiable outputs and one nondifferentiable
one, so it makes use of the support for hard-coded differentiable outputs.

I haven't deleted all of the descriptor code from Python, because dropout
initialization still goes through this codepath, that should be fixed soon
but I don't see it as essential for this PR.

This commit also removes the last use of NestedIOFunction from PyTorch.

There are some shenanigans with cuDNN dropout descriptor initialization,
see below:

Note [cuDNN dropout descriptor initialization]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In most cases, setting descriptors in cuDNN is cheap (e.g.,
cudnnSetTensorNdDescriptor).  However, this is not the case for
cudnnSetDropoutDescriptor: in cuDNN 6/7 (and possibly others) it does an
expensive precomputation to initialize the random number generator states.  In
cuDNN 6, this is the ONLY official mechanism to initialize a dropout descriptor,
which means that law-abiding clients were expected to generate a dropout
descriptor once and cache it.  However, our ATen interface is (1) stateless (so
we can't cache the descriptors) and (2) does not accept arbitrary user types in
its interface (so we can't pass the descriptor in).  This puts us in a pickle.

In cuDNN 7, a new function, cudnnRestoreDropoutDescriptor was added, which
forgoes the expensive initialization process, and can initialize the
descriptor with a pre-initialized state CUDA tensor.  This is great, because
it means we can simply pass in the state tensor and then initialize the
descriptor internally.  Unfortunately, this function is not available in
cuDNN 6.

To work around this, we break the cuDNN abstraction barrier and hard-code
the struct layout of the underlying dropout descriptor.  With this struct,
we can reimplement cudnnRestoreDropoutDescriptor from scratch. Great!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
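
A rough sketch of that workaround (the mirrored field names below are
hypothetical; the real code must match the exact struct layout of the
cuDNN version being targeted):

  #include <cudnn.h>
  #include <cstddef>

  // Hypothetical mirror of cuDNN's opaque dropout descriptor layout.
  struct DropoutStructMirror {
    void* states;                // device buffer of RNG states
    size_t stateSizeInBytes;
    float dropout;
    unsigned long long seed;
  };

  // Stand-in for cudnnRestoreDropoutDescriptor on cuDNN 6: write the
  // pre-initialized state directly, skipping the expensive RNG setup.
  void restoreDropoutDescriptor(cudnnDropoutDescriptor_t desc, float dropout,
                                void* states, size_t stateSizeInBytes,
                                unsigned long long seed) {
    auto* raw = reinterpret_cast<DropoutStructMirror*>(desc);
    raw->dropout = dropout;
    raw->states = states;
    raw->stateSizeInBytes = stateSizeInBytes;
    raw->seed = seed;
  }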

* Fix cuDNN 7 behavior.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete some unused, controversial methods from MatrixRef.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add missing filter_dim_a slice

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Replace nested for-loop with itertools.chain.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* CR comment on mut_desc()

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Refactor DropoutDescriptor API.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Use cached CurrentDeviceProperties from Context.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Document _cudnn_rnn outputs.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Improve fmap docs, convert some functions to use it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Move IndexRange to autograd/function.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Elaborate on CUDNN_STATUS_INVALID_VALUE return some more.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add an all-in-one setter for RNNDescriptorParams.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Print what the unrecognized RNN mode was

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* RNN TensorDescriptor improvements

- Have an explicit size/stride overload for set TensorDescriptor,
  so you don't have to create a goofy view to feed in.

- Change the padding to 3D rather than 5D, which is all you actually
  need (it's just 2D that is not supported by cuDNN API.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Fix implementation of cudnnRestoreDropoutDescriptor, plus test.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Better comments about input layout.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add comment about no-DropoutDescriptor argument RNNDescriptor function.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Rename vocab_size back to input_size.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Don't use backslash in comment.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Bugfix for contiguous TensorGeometry calculation.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Don't allocate a dummy tensor when setting TensorDescriptor for flatten_weight.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Make contiguity errors more user-friendly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* s/fn.dropout.train/fn_train/

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* s/_cudnn_rnn_backward_grad/_cudnn_rnn_backward_input/

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Make dcx properly undefined when not required.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Remove old TODO.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add state size check in cudnnRestoreDropoutDescriptor

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Explicitly narrow int64_t to size_t

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Restore copyParams comment.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Update benchmark numbers, and slight engineering improvements.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Typofix.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-02-05 13:54:11 -05:00
28f056fed2 add reduce=True argument to MultiLabelMarginLoss (#4924)
* add reduce=True argument to MultiLabelMarginLoss

* Fix lint

* Addressed comments

* Remove unneeded syncthreads calls
2018-02-05 12:28:51 -05:00
d3ea7e260b Allow for all of the names we have in our model zoo.
Summary:
* We now allow subdirectories as well as numbers in the name.
* Also fixed an error case.
Closes https://github.com/caffe2/caffe2/pull/1875

Reviewed By: pjh5

Differential Revision: D6894401

Pulled By: orionr

fbshipit-source-id: 6a9938bc7d2ba6b8f094ed7b8a02664120a10626
2018-02-05 08:52:55 -08:00
ba61eee074 Expose sparse variable addmm, addmm_ (#5016)
sspaddmm and mm for sparse tensors will come in another PR; they're a little more involved.
2018-02-05 11:40:53 -05:00
76ae03d5f1 Operate on Variables in torch.nn.init (#4964)
Once Variable and Tensor are merged the existing Variable test would
cause an infinite recursion. Instead, modify the Variables directly
inside a `no_grad()` block.
2018-02-05 11:34:05 -05:00
f4a2b0e446 Don't allow scalars where vectors are required in mv, addmv, ger, addr. (#5003)
* Don't allow scalars where vectors are required in mv, addmv, ger, addr.

* Fix scalar_tensor_test for ger.

* Address review comments.

* Fix merge.
2018-02-05 11:09:18 -05:00
b044c95129 Use blocks machinery to simplify bookkeeping in autodiff (#5036)
* Remove addValues and use WithInsertPoint

* Use blocks to simplify differentiate

Using @ezyang's suggestion, this change uses a block rather than
staging annotations to represent the reverse pass. This allows us
to reuse the machinery to copy graphs/blocks to extract the
reverse pass concisely.

This also changes the input order of Gradient's df to:
   [output vjps][temporary vjps][captures]

In addition to being simpler to generate in this order, it also
will allow ExecutionPlan to append the captures onto the already-
existing input list of vjps that are given by the autograd,
rather than have to prepend them, which should be slightly cheaper.

* Enforce that input captures are before outputs

This changes the Gradient struct to enforce that input
captures appear before output captures in the capture list,
which makes it easier to use in ExecutionPlan.
2018-02-05 10:43:50 -05:00
c65bd6660e Move the cudnn include path before system include path (#5026)
In some cases when there are two different versions of cudnn installed,
one under /usr/local/cuda and other under a virtual env such as conda or
under the main system path /usr/include, the compiler would pickup the
cudnn.h from the virtual env/system path first. This is because cmake
generates C_INCLUDES and CXX_INCLUDES flags with system include path
first. All this may lead to linking problems as described in Issue #4869

Fixes #4869
2018-02-04 10:36:22 -05:00
85a7e0fc41 Addition of ExponentialFamily (#4876) 2018-02-04 12:18:28 +01:00
3acce3e4a7 assert global_constant name as string
Reviewed By: kennyhorror

Differential Revision: D6895157

fbshipit-source-id: 9844ab6176d22c6d05a5a0f83b731f734ef9853d
2018-02-04 01:02:30 -08:00
95626737d0 enforce global_constant name should be a string
Reviewed By: kennyhorror

Differential Revision: D6880114

fbshipit-source-id: 2c9bd27b01cedb469f19843163b04a613fda5904
2018-02-04 01:02:27 -08:00
9c7ac85050 Replace more sample_n calls in test_distributions.py (#5034) 2018-02-03 20:57:53 -05:00
61b5ea85d4 Remove FunctionFlags (#5018) 2018-02-03 20:57:39 -05:00
f8388d2aea Add the ability to change the insert point Graphs
In lieu of a more complicated builder object, this commit adds
an 'insert point' to Graph and a method 'insertNode' which inserts
nodes at that insert point. setInsertPoint can be used to change
the insert point on the graph to the end of a block or to any point
inside a current block. The resource guard `WithInsertPoint`
can be used to temporarily change it to, for example, insert
into the "then" branch of an If statement.

This commit also updates the resource guard for scopes. It previously
relied on return value optimization to work correctly which is
not guaranteed to be applied until C++17.
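
A self-contained toy version of the mechanism (this is an illustration,
not the real jit::Graph API):

  #include <list>
  #include <string>

  struct ToyGraph {
    std::list<std::string> nodes;
    std::list<std::string>::iterator insert_point = nodes.end();
    // Insert a node just before the current insert point.
    void insertNode(const std::string& n) { nodes.insert(insert_point, n); }
    void setInsertPoint(std::list<std::string>::iterator it) {
      insert_point = it;
    }
  };

  // Resource guard: temporarily redirect insertion, restore on scope exit.
  struct WithInsertPointGuard {
    WithInsertPointGuard(ToyGraph& g, std::list<std::string>::iterator it)
        : g_(g), prev_(g.insert_point) {
      g_.setInsertPoint(it);
    }
    ~WithInsertPointGuard() { g_.setInsertPoint(prev_); }
    ToyGraph& g_;
    std::list<std::string>::iterator prev_;
  };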
2018-02-03 12:09:40 -08:00
423677bacc Add KL-divergence for Categorical and OneHotCategorical and stronger tests (#4961) 2018-02-03 12:47:13 +01:00
99ce581155 Add support for ::copy and ::createClone with blocks 2018-02-02 23:24:49 -08:00
0d748fac96 Add nested Blocks in IR
This commit is getting the IR ready for representing ONNX control flow.
It adds nested blocks to the IR.

* Each node now has blocks(), addBlock(), and eraseBlock() similar to a node's
  output list.
* Blocks are a property of every node rather than an attribute because
  to make it easier to manage the lifetime of the containing nodes and because
  the behavior of cloning Blocks will likely be different from the way we clone other
  attributes.
* A block itself has a list of nodes, as well as inputs and outputs.
The meaning of the nested input/output nodes is specific to the particular
  node kind containing the block. It is safe to assume inputs to a block will be
  in scope in the block.
* Each Block has an owningNode() and each node has an owningBlock().
  The owningNode of the top-most block is null.
* Values are lexically scoped: nested blocks can use values from outer blocks
  that have been defined in previous nodes. Lint has been updated with these
  new scoping rules.
* This change preserves almost all of the pre-Block API. No attempt has been made
  to make optimizations aware of Blocks. This will need to be done on a case-by-case
  basis as we make optimizations capable of handling Blocks.
2018-02-02 23:24:49 -08:00
c308e03f3e Initial GraphExecutor Implementation. (#4982)
This adds the initial implementation of graph executor for the new JIT design. It includes a few python tests ensuring that nograd, backward, and double-backward cases work for simple examples and some corner cases. More work needs to be done to performance optimize as there are many extra copies and places where we hold onto variables longer than we should. These are noted in the comments.
2018-02-02 17:45:59 -08:00
3708914bd5 Give NetObserverReporter a virtual destructor for correct destruction
Summary:
Future-clang is stricter about some things. We need to address deletes on non-virtual destructors.

For reference, the compiler error in question can be identified by: "delete called on 'ClassName' that is abstract but has non-virtual destructor [-Werror,-Wdelete-non-virtual-dtor]" for a given ClassName.

Reviewed By: smeenai

Differential Revision: D6853479

fbshipit-source-id: a40c8e83da7c1b44da48e887cc029e98e40d6737
2018-02-02 17:32:26 -08:00
b0d09dd8d7 Cleanup operator docs for catalog generation.
Summary:
* We likely need tests for this so bad formatting can't be added in the future, but for now we're cleaning all operators so we at least have good examples.
* Formatting between our internal Facebook operator catalog and external caffe2.ai catalog are still slightly different. We'll work on this.
Closes https://github.com/caffe2/caffe2/pull/1846

Reviewed By: pjh5

Differential Revision: D6848570

Pulled By: orionr

fbshipit-source-id: b9bc0bfccb243d0440bd7b2406858cad8dc37e92
2018-02-02 16:36:05 -08:00
e816c777eb Add regularization for sparse features
Reviewed By: xianjiec

Differential Revision: D5767997

fbshipit-source-id: b9b7c47d11417fbe67d861a2a6b4daa38adbe57b
2018-02-02 16:03:32 -08:00
dabddd65f4 Add sparse normalization operator
Reviewed By: xianjiec

Differential Revision: D6735673

fbshipit-source-id: 870b38d5175cb2d2dcad43c0e9fa4746e4dd15dd
2018-02-02 15:05:59 -08:00
4ae05799fa Don't allow scalars in torch.dot for Variables. (#4972)
* Don't allow scalars in torch.dot for Variables.

There is no dot_out, so the lack of _out isn't an issue.

* Fix test for 1-d only dot.
2018-02-02 16:07:27 -05:00
56112cbafd Add .clang-format (#5019) 2018-02-02 14:34:03 -05:00
7400de3080 Fix C FFI extension after moving TH to C++ (#5005)
* fix cffi include issue

* add cffi test

* disable cffi test for python 3.7

* and new line and comment
2018-02-02 12:45:30 -05:00
f23feca681 Fix output_nr not incremented correctly (#4812)
* fix output_nr not incremented correctly

* update test_conv_double_backward to cover this case; call accGradParameters if any param (not just weight) requires grad in parse_nn.py

* update Spatial/VolumetricFull(Dilated)Convolution to support accGradParameters with only bias requiring grad

* Spatial/VolumetricConvolutionMM

* Spatial/VolumetricDilatedConvolution

* address @fmassa 's comments
2018-02-02 12:39:33 -05:00
e22095b09d Add some more builder scripts from ossci-job-dsl (#4945)
* Add some more builder scripts from ossci-job-dsl

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Relax precision requirement on test_Upsample_trilinear_scale_3d_cuda

Partially addresses #5006.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-02-02 11:14:56 -05:00
bf3655a10c make torch.set_num_threads also set MKL threads (take 2) (#5002)
* torch.set_num_threads sets MKL option too

* fix to use C prototype instead of fortran
2018-02-02 09:24:54 -05:00
86fd5fd524 Replace async with non_blocking for Python 3.7 (#4999)
* Replace async with non_blocking for Python 3.7 upgrade

* Remove trailing whitespace

* Give _cuda and _type kwargs and accept async for compatibility

* Rename async to non_blocking in all C++ code

* Add entries for async in python_variable_methods

* Friendlier backward compatibility for cuda and type
2018-02-02 09:23:51 -05:00
8e22f847ad Improve CUDA softmax performance 2018-02-02 13:23:56 +01:00
390d542db2 Modify .jenkins/test.sh to install ninja 2018-02-01 22:42:07 -08:00
733ce9529e [cpp-extensions] Implement torch.utils.cpp_extensions.load() 2018-02-01 22:42:07 -08:00
142a335b81 fix ModOp Windows build issue
Summary: It seems that std::signbit on integral types is not well supported on Windows. Bypassing it.

Reviewed By: xianjiec

Differential Revision: D6869924

fbshipit-source-id: b98a3431c4d26dcffd08e26259037083afd41114
2018-02-01 21:14:59 -08:00
39b351ecb0 Fix build with NNPACK
Summary:
- Fix path to FXdiv and FP16 dependencies
- Link cpuinfo library
- Pull NNPACK fix for PYTHONPATH handling when launching PeachPy
- Pull cpuinfo fix for cross-compiling on Linux for Android
- Pull cpuinfo fix for CPUINFO_LIBRARY_TYPE support
- Pull cpuinfo fix for iOS builds
Closes https://github.com/caffe2/caffe2/pull/1869

Differential Revision: D6881428

Pulled By: Maratyszcza

fbshipit-source-id: 7b4115daa090096dbd97303503792e7b144fbb43
2018-02-01 20:47:10 -08:00
a69110c0d7 Add size checks for sparse tensor constructor (#4113)
* Add size checks for sparse tensor constructor

* Fix tests

* Free max_indices
2018-02-01 22:08:20 -05:00
4d656842d9 enable USE_MOBILE_OPENGL by default
Summary:
iOS also depends on USE_MOBILE_OPENGL, so I think we should only disable it for Android.
Closes https://github.com/caffe2/caffe2/pull/1835

Differential Revision: D6880522

Pulled By: Maratyszcza

fbshipit-source-id: b2c2fa052ad5948bc52fa49eb22c86eb08f59a39
2018-02-01 18:57:38 -08:00
1475895c1d Use distutils.copy_tree/copy_file instead of shutil 2018-02-01 16:19:03 -08:00
9e36f979c9 Add ABI compatibility check to cpp_extensions.py 2018-02-01 16:19:03 -08:00
1262fba8e7 [cpp extensions] Create torch.h and update setup.py 2018-02-01 16:19:03 -08:00
6665a45d5e Add README.md for ATen/cudnn (#4998) 2018-02-01 18:29:31 -05:00
ce5ccaef0c Rewrite ATen native docs. (#4816)
* Rewrite ATen native docs.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Formatting fix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Some of the CR comments

* More CR comments [ci skip]

* One last CR comment
2018-02-01 16:56:14 -05:00
eb5daa9478 Make cat/cat_out native function that rejects scalar inputs. (#4992)
* Make cat/cat_out native function that rejects scalar inputs.

* Print position of scalar in error message.
2018-02-01 16:20:01 -05:00
a8bda67ff1 Only check that arguments are Variables in VariableType (#4991)
Don't check the ScalarType and Backend of arguments in VariableType.
Instead, only check that arguments are Variables of any type. The
precise type checks are handled by the base type.

Many of our functions take heterogeneous types. There isn't enough
information in Declarations.yaml to ensure the precise types of
arguments in VariableType, which makes it difficult to add new methods.

This is #4943 with a fix to the memset call
2018-02-01 14:56:56 -05:00
2ebd7f17eb Support stack_out as a native function. (#4977) 2018-02-01 13:57:26 -05:00
d183f2305f Add scalar autograd tests for functions requiring 'special' Variables… (#4953)
* Add scalar autograd tests for functions requiring 'special' Variables on LHS.

* Add index_* tests.

* Fix flake8.

* Use normal for clamp rather than uniform.

* Add tests for gather, scatter, scatter_add.

* Make sure masked_select doesn't get all zeros.

* Properly fill in make_non_contiguous data for sizes that can't be mad… (#4951)

* Properly fill in make_non_contiguous data for sizes that can't be made contiguous.

* Use clone instead of copy.

* Fix and test backward for mv, ger with scalars.

* Fix addmv.

* Use grad.type() instead of type(grad).

* Fix addr.

There are a couple of hacks here:
1) We need to squeeze the backward result because of implicit broadcast of the arguments to match behavior of ger.
2) The broadcast_dims code doesn't work for scalars; I added support for adding '.scalar' onto the end of the broadcast
   specification, but really this should just be a native function with _out support.

* Don't allow scalars in torch.dot for Variables.

There is no dot_out, so the lack of _out isn't an issue.

* Revert "Don't allow scalars in torch.dot for Variables."

This reverts commit 76c521eba8c1fb533e164f121075230209d52927.

* Revert "Fix addr."

This reverts commit afe04a0078394f94645e10cec53626f582cbc55c.

* Revert "Fix addmv."

This reverts commit 550c7ac71b3b832a3b74a809fec9ce5f5e554909.

* Revert "Use grad.type() instead of type(grad)."

This reverts commit ddcb5a424ed004fa2ee238a50177573e6d4a1b89.

* Revert "Fix and test backward for mv, ger with scalars."

This reverts commit 10b0ecad48d987774c41184ffaf11742322926ab.
2018-02-01 13:56:38 -05:00
2bf9ed8e05 Revert "Only check that arguments are Variables in VariableType (#4943)" (#4980)
Revert "Only check that arguments are Variables in VariableType (#4943)"
2018-02-01 11:59:21 -05:00
e138203d8f add sparse_to_dense_test
Summary: hypothesis_test was introduced in D4508879; add a plain test which is more straightforward.

Reviewed By: kennyhorror

Differential Revision: D6835334

fbshipit-source-id: d05a2cd199b2de56ac0cc0319f19fcd7978647d5
2018-02-01 08:14:37 -08:00
7ee286c80a Vendor NNPACK dependencies with Caffe2 2018-01-31 21:05:07 -08:00
bac898dbfa Add is_test option to CTCOp, fix OMP thread count override
Summary:
Added forward-only mode to CTCOp to compute only the costs without the grads.

Also, num_threads was set to 1, which ends up stomping over
--caffe2_omp_num_threads mid-execution (https://fburl.com/uq65xfty). Fixing
that to use the already-configured number of OMP threads.

Reviewed By: ajtulloch

Differential Revision: D6867829

fbshipit-source-id: 9ab1fec9857e00d277a9e82c4bd64caa6f4b2a62
2018-01-31 19:36:47 -08:00
f652f20f73 change ModOp to support output sign configurations
Summary: Enable ModOp to control whether the output sign follows the dividend or the divisor.
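
For reference, the two sign conventions look like this (a sketch; the real
operator also handles the tensor and zero-divisor cases):

  #include <cstdint>

  // C++'s built-in % already makes the result follow the dividend's sign.
  int64_t mod_sign_follows_dividend(int64_t a, int64_t b) { return a % b; }

  // Adjust a nonzero remainder so it follows the divisor's sign, as in
  // Python's % operator.
  int64_t mod_sign_follows_divisor(int64_t a, int64_t b) {
    int64_t r = a % b;
    if (r != 0 && ((r < 0) != (b < 0))) r += b;
    return r;
  }
  // e.g. -7 mod 3: follows dividend -> -1, follows divisor -> 2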

Reviewed By: xianjiec

Differential Revision: D6852457

fbshipit-source-id: 62dbb66cacecb8e0a0f81f63f2b7b378efbd6ee2
2018-01-31 18:03:16 -08:00
65b0474527 Fix for finding protobuf on windows
Summary:
On Windows, when using a prebuilt version of protobuf (such as the one provided by vcpkg), we need to set PROTOBUF_LIBRARIES and PROTOBUF_INCLUDE_DIRS manually.

The CAFFE2_API decoration should only be defined to dllexport when building shared libs.
Closes https://github.com/caffe2/caffe2/pull/1854

Differential Revision: D6867345

Pulled By: Yangqing

fbshipit-source-id: d4d48f709d313af9dde103fc8dfbfc217261715b
2018-01-31 18:03:15 -08:00
eee42748d9 Caffe2: serialize init for parallel workers
Summary: Caffe2: serialize init for parallel workers

Reviewed By: kevinwilfong

Differential Revision: D6862119

fbshipit-source-id: 805b2971eca4501977950420565bd9ea37dc0f6c
2018-01-31 17:50:10 -08:00
5daf4ca1c9 Remove android-cmake submodule 2018-01-31 17:27:06 -08:00
964707e9b5 temporarily disable test_segfault until we figure out why it intermittently fails on cuda CI workere (#4976) 2018-01-31 19:04:44 -05:00
3a82c41d95 Fix for glog on windows
Summary:
These changes are required to use glog on Windows.

Yangqing Please consider merging them as they were removed when PR #1793 was reverted.
Closes https://github.com/caffe2/caffe2/pull/1853

Differential Revision: D6863567

Pulled By: Yangqing

fbshipit-source-id: f6ce3a1c5855e2b39000ce989d62dc2b34cd4817
2018-01-31 15:52:22 -08:00
401eeb2007 s/sample_n(n)/sample((n,)) to silence warnings in test_distributions.py 2018-02-01 00:29:50 +01:00
65353f1342 Remove volatile section from autograd notes 2018-02-01 00:26:36 +01:00
f2fd38c53c Use TypeError in PythonArgParser (#4966)
Uses TypeError from torch/csrc/Exceptions.h in python_arg_parser.cpp so
that the exception is interpreted as a Python TypeError instead of
RuntimeError.
2018-01-31 18:21:03 -05:00
7c8843f1c0 Async_scheduling update
Reviewed By: romain-intel

Differential Revision: D6824067

fbshipit-source-id: 00c94afad53941a63971deccea8ee1fff9860764
2018-01-31 15:06:40 -08:00
1b4959e48d Type error message when RTTI is not enabled
Summary:
When RTTI is not enabled, we could previously only print an
"(RTTI not enabled ...)" type error message. This is annoying when developing
in a mobile environment. Add a type-name registry (gRegistry) so a basic
string is available for each type, making type errors easier to diagnose.

Reviewed By: Yangqing

Differential Revision: D6849614

fbshipit-source-id: d41417d72fdcfb7b8c9ddc4ded604ea598572b73
2018-01-31 15:06:39 -08:00
f2d3f20f6d Revert "torch.set_num_threads sets MKL option too" (#4967)
* Revert "Clarify grad_input_mask documentation in derivatives.yaml (#4963)"

This reverts commit 6f3266b4a195db6ade4651431595f9f22bd9e656.

* Revert "fix triu and tril for zero-strided inputs on gpu (#4962)"

This reverts commit 6c197c2f15090ab7368d183439229b768ece5efc.

* Revert "Add mutex for CPU RNG and move TH to C++ (#4041)"

This reverts commit 96239dd50e89bc2d1fd5d91cc5ee8fca95b07f90.

* Revert "Support multivariate TransformedDistributions (#4937)"

This reverts commit ca5071d0721767fcfeb226b5c695dfd5d0671072.

* Revert "Only check that arguments are Variables in VariableType (#4943)"

This reverts commit d44437968f2b136a3399dc62af66adfd3eaa249e.

* Revert "torch.set_num_threads sets MKL option too (#4949)"

This reverts commit 2aaeec0db0be0e9e9effd277c268cd224ff66ef9.
2018-01-31 15:38:49 -05:00
6f3266b4a1 Clarify grad_input_mask documentation in derivatives.yaml (#4963) 2018-01-31 14:45:25 -05:00
6c197c2f15 fix triu and tril for zero-strided inputs on gpu (#4962) 2018-01-31 14:38:49 -05:00
96239dd50e Add mutex for CPU RNG and move TH to C++ (#4041)
* Add mutex for CPU RNG

* move more things to cpp to make cuda build work

* fix mutex bug on OS X

* try to fix cuda9 half .x bug

* try to fix windows error

* create THGeneratorState as seperate field

* fix mutex issues
2018-01-31 14:26:39 -05:00
ca5071d072 Support multivariate TransformedDistributions (#4937) 2018-01-31 18:32:24 +01:00
d44437968f Only check that arguments are Variables in VariableType (#4943)
Don't check the ScalarType and Backend of arguments in VariableType.
Instead, only check that arguments are Variables of any type. The
precise type checks are handled by the base type.

Many of our functions take heterogeneous types. There isn't enough
information in Declarations.yaml to ensure the precise types of
arguments in VariableType, which makes it difficult to add new methods.
2018-01-31 12:31:11 -05:00
2aaeec0db0 torch.set_num_threads sets MKL option too (#4949) 2018-01-31 11:59:15 -05:00
3736ccb1d8 git clone from the master branch in the docker files because branch v0.8.1 does not exist
Summary: Closes https://github.com/caffe2/caffe2/pull/1520

Reviewed By: Yangqing

Differential Revision: D6853197

Pulled By: orionr

fbshipit-source-id: f0a15cd977617294dc1754e7658056ec20e15db2
2018-01-30 21:21:01 -08:00
3ac412efe9 Properly fill in make_non_contiguous data for sizes that can't be mad… (#4951)
* Properly fill in make_non_contiguous data for sizes that can't be made contiguous.

* Use clone instead of copy.
2018-01-31 00:09:12 -05:00
90a3363f29 Return an empty TaskGroup if node managers exist in MultiNodeCheckpointManager
Summary: Currently, MultiNodeCheckpointManager returns None in this case, yet in JobRunner we assume this function returns a valid task group, i.e. we call session.run(self.checkpoint_manager.init(...)) directly. This fails when we use LocalHostScheduler and reuse a MultiNodeCheckpointManager.

Reviewed By: azzolini

Differential Revision: D6843450

fbshipit-source-id: a7ec942cfe692f19e8751b0078ae6a6108f29e54
2018-01-30 19:20:50 -08:00
e776f69ddd Urgent CI fix for test.sh (#4955)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-30 21:41:02 -05:00
20fbdb9a8b Adding mean, variance, stddev to distributions (#4923) 2018-01-31 00:26:32 +01:00
ae903ca61a Fix JIT tracing in autograd codegen (#4941) 2018-01-31 00:14:36 +01:00
8f273dea09 Implement constraint registry 2018-01-31 00:13:28 +01:00
52bd369da5 Add some scalar test_autograd tests for multi-tensor functions (#4944)
* Add some scalar autograd tests for functions taking multiple tensors/variables.

* Add skipIfNoScalars.
2018-01-30 16:45:55 -05:00
004e7590ac Add suppress_warnings to test_resize. (#4942) 2018-01-30 16:03:34 -05:00
98a4c3f9b2 Enable rnn_cell_test in jenkins
Summary: Closes https://github.com/caffe2/caffe2/pull/1839

Differential Revision: D6847623

Pulled By: salexspb

fbshipit-source-id: b8a32cb39a8063b8938c89556e5d42606735238d
2018-01-30 11:48:35 -08:00
9d1721e588 Adding gflags to default dependency of conda builds
Summary: Closes https://github.com/caffe2/caffe2/pull/1842

Reviewed By: orionr

Differential Revision: D6845845

Pulled By: pjh5

fbshipit-source-id: 311af987ae94977e50069bdb1e98d652eddfb2c8
2018-01-30 11:30:29 -08:00
5b43c22f73 Add symbolic_override_first_arg_based (#4799)
* Add symbolic_override_first_arg_based

* flake fix

* comment

* remove comment (keep forgetting about this PR)
2018-01-30 16:41:43 +01:00
c011c8b5a6 Enable fixed tests again in Windows (#4928) 2018-01-30 16:33:49 +01:00
ef4cf860ac Lazy init in set device, also should not be called in getDevCount (#4918) 2018-01-30 16:24:31 +01:00
ee8bcdca79 make torch.cuda.empty_cache() a no-op when cuda is not initialized (#4936) 2018-01-30 16:22:17 +01:00
5c65466b86 Release NCCL distributed backend from experimental (#4921)
* Release NCCL distributed backend from experimental

* fix typo
2018-01-30 16:21:21 +01:00
ea0283325c fix copy/paste error in debug message 2018-01-30 12:16:59 +01:00
8e8e3eb828 Fix build failure
Summary: Fix build failure

Reviewed By: pietern

Differential Revision: D6843835

fbshipit-source-id: b57b42c6a455325801a1ca6ab9a40d2f47490b11
2018-01-29 23:08:47 -08:00
560e5c94bd Change default value of LeakyRelu's alpha from 0 to 0.01
Summary: To match the semantics in ONNX, change the default value of LeakyRelu's alpha to 0.01
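
For reference, the semantics with the new default (a sketch of the scalar
math only, not the Caffe2 kernel):

  // LeakyRelu with the ONNX-matching default slope for negative inputs.
  float leaky_relu(float x, float alpha = 0.01f) {
    return x >= 0.0f ? x : alpha * x;
  }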

Reviewed By: dzhulgakov

Differential Revision: D6840975

fbshipit-source-id: 08543f80fd86cbe96a0eee8d725ef137a5bf4ab8
2018-01-29 22:31:12 -08:00
6b1f848df6 Adds gpu implementation for FCTransposed
Summary: Adds gpu implementation for FCTransposed.

Reviewed By: salexspb

Differential Revision: D6572785

fbshipit-source-id: a7cd0f7364ace286942c46b91e0287307cbfea83
2018-01-29 19:03:24 -08:00
60f5ae05ee Add more scalar autograd tests. (#4920)
These are from auto-generated tests from existing tests with the following constraints:
1) Forward function passes with scalar self and size (1,) self
2) No Variable/Tensor arguments (besides self)
2018-01-29 20:53:56 -05:00
6f0b7bea03 Add support for requires_grad in JIT's AD (#4898) 2018-01-30 01:28:50 +01:00
712a6c6362 Deprecate out-of-place resize and resize_as on Variables. (#4886)
* Deprecate out-of-place resize and resize_as on Variables.

* Use default UserWarning instead of DeprecationWarning for Variable resize.
2018-01-29 18:02:06 -05:00
d1a3254764 Use global thread pool in async_scheduling
Summary:
Simplify async_scheduling to use the global thread pool instead of per-network
polling threads

Reviewed By: romain-intel

Differential Revision: D6814274

fbshipit-source-id: f91ac3e99d9b8cf15578a751ed7929be84840408
2018-01-29 14:54:43 -08:00
78ff996dc0 Fix some scalar issues with autograd. (#4889)
* Fix some scalar issues with autograd.

1) Better error messsages in functions that don't support scalars
2) Don't access size(dim) in the backward of a function taking a scalar because the wrap fails.

* Fix CUDA build.
2018-01-29 17:50:41 -05:00
3c952426fb Add operator attaching net observer
Summary:
Commonly, net observers attach operator observers at construction. This diff separates the logic into a base class to inherit from.
Closes https://github.com/caffe2/caffe2/pull/1806

Reviewed By: salexspb

Differential Revision: D6808623

Pulled By: mdschatz

fbshipit-source-id: 75ef0eea913ef30943541c829c0a976965f42736
2018-01-29 14:34:34 -08:00
2a6177e6de Speed-up repeat autograd tests. (#4915)
These tests were incredibly slow because they needed to compute a
jacobian matrix with 9 million elements. Reduce the sizes used in the
test cases.
2018-01-29 16:34:01 -05:00
12c6088267 Fixes to native_functions.yaml to match existing Tensor behavior (#4911)
- Add default 'p' value for bernoulli_
 - Bind expand, expand_as, and permute only as functions
2018-01-29 15:24:23 -05:00
260a246192 Move repeat autograd to C++. (#4885) 2018-01-29 15:09:59 -05:00
e93ece90a5 Add Linux Jenkins scripts to PyTorch repo. (#4910)
Putting these scripts here has a few benefits:

1. PyTorch developers can easily update the scripts without
having to ask for permissions to ossci-job-dsl

2. You can test changes in the scripts by opening a PR to
PyTorch (functionality is ossci-job-dsl is not easily testable.)

3. If you get one of our stock Docker images, you can run these scripts
to trigger a build identical to what would occur in Jenkins (not
entirely true yet, but we can make it so.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-29 15:08:06 -05:00
f0acd68536 Fixes to aten/Declarations.cwrap (#4912)
- Make transpose and as_strided work on CPU half
- Bind at::pow(float, Tensor)
- Add _dirichlet_grad from TensorRandom.cwrap
2018-01-29 14:53:37 -05:00
4f63f348ae Fix condition in inferUnsqueezeGeometry (#4909)
The bounds check was too conservative by an extra one.
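
For context, unsqueeze on an n-dimensional tensor must accept dim anywhere
in [0, n] inclusive, since the new dimension may be appended at the end; a
sketch of the corrected bound (illustrative, not the actual code):

  #include <cstdint>
  #include <stdexcept>

  // The upper bound must be dim <= ndim, not dim < ndim.
  void check_unsqueeze_dim(int64_t dim, int64_t ndim) {
    if (dim < 0 || dim > ndim) {
      throw std::out_of_range("unsqueeze: dim out of range");
    }
  }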
2018-01-29 14:53:03 -05:00
91d76f5dbd Reapply Windows fix
Summary:
The last fix was backed out due to a bug in the internal build (CAFFE2_API causing an error). This one re-applies it along with a few more fixes, most notably enabling gtest.

Earlier commit message: Basically, this should make windows {static_lib, shared_lib} * {static_runtime, shared_runtime} * {cpu, gpu} work other than gpu shared_lib, which willyd kindly pointed out a symbol limit problem. A few highlights:
(1) Updated newest protobuf.
(2) use protoc dllexport command to ensure proper symbol export for windows.
(3) various code updates to make sure that C2 symbols are properly shown
(4) cmake file changes to make build proper
(5) option to choose static runtime and shared runtime similar to protobuf
(6) revert to visual studio 2015 as current cuda and msvc 2017 do not play well together.
(7) enabled gtest and fixed testing bugs.

Earlier PR is #1793

Closes https://github.com/caffe2/caffe2/pull/1827

Differential Revision: D6832086

Pulled By: Yangqing

fbshipit-source-id: 85f86e9a992ee5c53c70b484b761c9d6aed721df
2018-01-29 10:03:28 -08:00
657214543c add back cuda auto arch detection
Summary:
This was removed in an earlier version. Anyway, I suspect this will make jenkins a bit unhappy (do we use gpu instances for building as well?) so firing a PR to test.
Closes https://github.com/caffe2/caffe2/pull/1833

Differential Revision: D6834889

Pulled By: Yangqing

fbshipit-source-id: bc501cdb9d83a32ad38d24e972c2bfec5242d767
2018-01-29 10:03:27 -08:00
f8439c241b Add some explicit scalar autograd tests. (#4888)
* Add some explicit scalar autograd tests.

* Fix flake8.

* Test negative dims more consistently.
2018-01-29 12:40:15 -05:00
7a47790c27 Add missing _lazy_init in cuda python functions 2018-01-29 18:19:03 +01:00
c3596c2dfa remove "-s" compilation flag from clang when build for Android
Summary:
Now we use **clang** to build Caffe2 for Android with the arm64-v8a ABI, but clang doesn't support the "-s" compilation flag. If we pass this flag to clang, it reports a warning:

> clang++: warning: argument unused during compilation: '-s' [-Wunused-command-line-argument]

This change checks whether gcc or clang is being used to build Caffe2 for Android.
Closes https://github.com/caffe2/caffe2/pull/1834

Differential Revision: D6833011

Pulled By: Yangqing

fbshipit-source-id: e4655d126fb3586e7af605a31a6b1c1ed66b9bcb
2018-01-29 00:03:02 -08:00
94b659034e Potential fix to net_test failure with one GPU
Summary:
* Putting up to test on Jenkins since I can't test locally on my Mac.

Might fix https://github.com/caffe2/caffe2/issues/1796 but I haven't touched these files before, so it's a guess. :)
Closes https://github.com/caffe2/caffe2/pull/1826

Reviewed By: Yangqing

Differential Revision: D6832918

Pulled By: orionr

fbshipit-source-id: 22bdeafa031dbe6457d81cb105b41a451ca3a25d
2018-01-28 23:25:51 -08:00
2d829d15af [JIT] Add simple shape analysis
This quick and dirty shape analysis just makes up fake tensors,
and runs them through ATen to do shape propagation.
2018-01-28 22:55:36 -08:00
3b38a244ab Add ArgumentSpec data structure and tests
This data-structure will be used as the key in GraphExecutor's
code cache. It supports fast creation, hashing, and equality checking
because it will run on all inputs to GraphExecutors in the hot path.
2018-01-28 22:55:36 -08:00
d481afb125 Modernizing glog. Same as gflags.
Summary:
Same as PR #1819.
Closes https://github.com/caffe2/caffe2/pull/1830

Differential Revision: D6832171

Pulled By: Yangqing

fbshipit-source-id: 462a9b807e78d60748160a0cfd24932c9003fcc3
2018-01-28 18:21:22 -08:00
64a9ecae02 Dataloader issues (#4643)
* EINTR and kill by loader fix

* addressed @apaszke 's comments

* remove EINTR handling and add test if we are in main thread before setting SIGCHLD
2018-01-29 01:18:17 +01:00
967bceb16b Implement Transforms (#4771) 2018-01-28 21:17:16 +01:00
3ecd25b065 fix indentation 2018-01-28 20:56:57 +01:00
ff3f689239 Add mote tests for Nccl backend (#4796) 2018-01-28 12:36:59 +01:00
5630bb1fcc add compress flags to NCCL 2018-01-28 05:55:41 +01:00
73ed0d5ced Modernizing the gflags dependency in cmake.
Summary:
Historically, for interface dependent libraries (glog, gflags and protobuf), exposing them in Caffe2Config.cmake is usually difficult.

New versions of glog and gflags ship with new-style cmake targets, so one does not need to use variables. New-style targets also make it easier for people to depend on them in installed config files.

This diff modernizes the gflags library, and still provides a fallback path if the installed gflags does not have cmake config files coming with it.

It does change one behavior of the build process though - when one specifies -DUSE_GFLAGS=ON but gflags cannot be found, the old script automatically turns it off but the new script crashes, forcing the user to specify USE_GFLAGS=OFF.
Closes https://github.com/caffe2/caffe2/pull/1819

Differential Revision: D6826604

Pulled By: Yangqing

fbshipit-source-id: 210f3926f291c8bfeb24eb9671e5adfcbf8cf7fe
2018-01-27 19:31:14 -08:00
94e29ba24a Fix visibility of AT_CUDA_ENABLED (#4892)
* Fix visibility of AT_CUDA_ENABLED

* link ATen with verify_api_visibility so ATen headers get generated in time

* Move CUDAHalf.* to ATen/cuda

* ATen/cuda/CUDAHalf.cpp -> ATen/cuda/CUDAHalf.cu

* Remove inline attributes from HalfFix

* Also test for AT_CUDNN_ENABLED and add clarifying comment

* Remove unnecessary static inline from HalfFix template

* Move Half::operator double() into header for windows

* Mark Half::operator() as inline
2018-01-28 02:59:30 +01:00
e58a53af6f Added Poisson self KL + Bernoulli/Poisson KL 2018-01-27 22:36:00 +01:00
a249016044 New index computation strategy in Functions.cpp (Tensor/TensorList) (#4775)
When generating autograd::Function wrappers for ATen functions, we need
to take derivative expressions in derivatives.yaml (identified by name)
and correlate them with the correct index they should take in
grad_inputs (identified positionally only).  Previously, this
computation was done *statically* in load_derivatives.py (set_up_derivatives)
and then we hard-coded indices in the generated Functions.cpp.
This is sufficient for supporting ATen operations which consist solely
of Tensor arguments, or a single TensorList argument.  However, this
strategy will not work for mixed Tensor/TensorList arguments, as the
index of any Tensor after a TensorList is not known at codegen time,
since it will vary depending on the length of the TensorList, e.g.,

  foo({x1, x2}, y)      ==>  y is index 2
  foo({x1, x2, x3}, y)  ==>  y is index 3

This commit introduces a new strategy for generating these indices which
pushes index computation to *runtime* (though any decent C++ optimizer
can re-optimize the index computation back into constants; this was
verified in Godbolt.)  Instead of hard-coding constants, a small
IndexRangeGenerator object is created and used to generate the correct
index ranges (std::pair<size_t, size_t>) for each argument.

Here is an example of mm rewritten in the new codegen format:

  variable_list MmBackward::apply(const variable_list& grads) {
    IndexRangeGenerator gen;
    auto self_ix = gen.range(1);
    auto mat2_ix = gen.range(1);
    variable_list grad_inputs(gen.size());
    auto& grad = grads[0];
    auto self = self_.unpack();
    auto mat2 = mat2_.unpack();
    if (should_compute_output({ mat2_ix })) {
      auto grad_result = mm_mat2_backward(grad, self, mat2_sizes, mat2.strides(), 1);
      copy_range(grad_inputs, mat2_ix, grad_result);
    }
    if (should_compute_output({ self_ix })) {
      auto grad_result = mm_mat1_backward(grad, mat2, self_sizes, self.strides(), 1);
      copy_range(grad_inputs, self_ix, grad_result);
    }
    return grad_inputs;
  }

Unlike before, where self_ix and mat2_ix were hardcoded as 0 and 1,
we derive them by invoking IndexRangeGenerator (which internally
is just a little counter which bumps up each invocation of 'range').
Each _ix variable actually represents a range, as can be seen here.

  variable_list CatBackward::apply(const variable_list& grads) {
    IndexRangeGenerator gen;
    auto tensors_ix = gen.range(tensors_size_);
    variable_list grad_inputs(gen.size());
    auto& grad = grads[0];
    if (should_compute_output({ tensors_ix })) {
      auto grad_result = cat_tensors_backward(grad, tensors_sizes_dim, dim);
      copy_range(grad_inputs, tensors_ix, grad_result);
    }
    return grad_inputs;
  }

The invocation of 'copy_range' reads a TensorList returned by the
backward function into the correct entries in grad_inputs.
tensors_size_ is a new member of CatBackward which is filled with
the size of the forward input tensor when cat is originally invoked.

With this new code generation strategy, we can completely eliminate
the special cases for Tensor and TensorList in index selection, and
we can smoothly support mixed Tensor/TensorList by making multiple
invocations of gen.range() with non-one arguments.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
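
Conceptually, IndexRangeGenerator is just the following counter (a sketch
matching the description above):

  #include <cstddef>
  #include <utility>

  using IndexRange = std::pair<size_t, size_t>;  // half-open [start, end)

  struct IndexRangeGenerator {
    // Hand out the next range_size consecutive grad_inputs indices.
    IndexRange range(size_t range_size) {
      i += range_size;
      return {i - range_size, i};
    }
    // Total number of grad inputs allocated so far.
    size_t size() const { return i; }
   private:
    size_t i = 0;
  };

  // foo({x1, x2, x3}, y): gen.range(3) yields [0, 3) for the TensorList and
  // gen.range(1) yields [3, 4) for y, so y's index is computed at runtime.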
2018-01-27 21:46:08 +01:00
bd9b8a384a Fix torch.pstrf on Variables (#4883)
The LAPACK function returns one-indexed pivots. We need to convert them
to zero indexed.
2018-01-27 11:13:31 -05:00
ae28411af8 Slightly improve DDP single GPU multi-process dist training performance 2018-01-27 12:15:44 +01:00
6420c6b224 Improve torch.cuda.empty_cache documentation (#4879)
* add doc about empty_cache wont increase amount of memory available

* typo
2018-01-27 04:54:25 -05:00
f8575f6d68 Breakdown Dispatcher
Summary: dispatch by Ngram breakdown

Differential Revision: D6794082

fbshipit-source-id: 7f6e8fa3a0abe0dc6d0d466c95e8c4fc865e3abb
2018-01-26 17:47:54 -08:00
33d2212751 LSTM sequence lengths: allow unspecified sequence lengths
Summary:
In this case, each sequence is treated as having a length equal to the
first dimension of the input tensor. This matches the semantics of
ONNX when the sequence length input is left out.
Closes https://github.com/caffe2/caffe2/pull/1764

Reviewed By: dzhulgakov

Differential Revision: D6751219

Pulled By: anderspapitto

fbshipit-source-id: 89e0efd12339157627494e2b8c83e952bdd8a9f8
2018-01-26 16:32:56 -08:00
4a528cefac Remove OpenGL code from benchmark
Summary:
OpenGL is no longer built by default. Even after setting the flag -DUSE_MOBILE_OPENGL, the build fails. Remove it from the benchmark code so that the benchmark can still be built.
Closes https://github.com/caffe2/caffe2/pull/1822

Reviewed By: Maratyszcza

Differential Revision: D6824777

Pulled By: sf-wind

fbshipit-source-id: 5af8b669a36adcd6a98b0a11237b9e03c146bb9d
2018-01-26 16:17:53 -08:00
e7d4bbc9dd Add CaffeEnforce in SafeDequeueOp
Summary:
Previously in SafeDequeueOp, in.dims()[0] would fail if in.ndim() == 0.
However, the error message is not informative. I added a CAFFE_ENFORCE,
which prints out the input and output blob names. This is very helpful for
future debugging as well.

Differential Revision: D6821421

fbshipit-source-id: b07e5829a2c580aaaac88b0d9ff8d05f6da11713
2018-01-26 13:50:32 -08:00
fe9121ff59 Fix a bug in BatchMM JIT pass
The `add` node has multiple overloads, including one that only takes a
single input. This wasn't checked previously and could lead to
segfaults.
2018-01-26 22:40:36 +01:00
e5958d0e67 Inherit JIT scopes when cloning only when it's correct
It's correct only when the new graph owns the same scope tree
as the original one. We can end up with dangling pointers otherwise.
2018-01-26 22:40:36 +01:00
349a1c3424 Add code for lambda lifting backward in JIT's AD 2018-01-26 22:40:36 +01:00
db0f1e806c Add Variable (value) tests for variable fill, index_fill, masked_fill. (#4875)
* Add Variable (value) tests for variable fill, index_fill, masked_fill.

* Skip scalar tests if built without scalars.

* Fix flake8.

* Remove _scalar_sum remnants.
2018-01-26 16:09:25 -05:00
b8ab7bee26 Use variadic templates instead of initializer lists and overloads. (#4772)
Suppose you are given a list of arguments, each of which may be Tensor or
TensorList.  How can you write a function that can treat these arguments
uniformly as a list of tensors?  This patch solves the problem using
variadic templates.

Why variadic templates?  Use of variadic templates means anyone working
with this code has to understand universal references, perfect
forwarding, parameter packs and some idioms of C++ template design.
However, I argue that variadic templates are the *right* tool for
supporting the implementation of functions which must take an
arbitrarily heterogenous set of inputs.  We were able to limp by
in old code because, for the most part, tensor inputs were homogenous,
but this is no longer the case for some non-primitively differentiable
functions; and with the upcoming cuDNN RNN in ATen PR, will no longer be
the case for primitively differentiable functions too.

There are two parts to the PR.

First, we add torch/csrc/utils/variadic.h, which defines a mix-in
IterArgs that takes any class which supports operator(), and augments
with a new variadic function apply() which calls operator() on each
argument passed to it.  In an original draft of the patch, I wrote the
recursion for each parameter pack from scratch for each function;
however, it turns out there are no fewer than seven instances where we
need this idiom, and the mix-in reduces the lines of code, and also
helps centralize the most important (and easy to forget) boilerplate
for perfect forwarding.

To verify that IterArgs is compiled away into an unrolled form per
call site, I inspected the assembly on some synthetic examples.

Next, we modify the following functions to make use of IterArgs:

  - compute_requires_grad
  - Function::flags (Variable and Tensor variants)
  - flatten
  - isTracing
  - count_tensors / count_variables

Finally, the tuple packer is rewritten to be variadic, although we
cannot make use of IterArgs (since we are given a tuple).  It might
make sense to refactor the code into a generic piece which invokes
a function with the arguments specified by a tuple, and then an
appropriate IterArgs, but we leave this for future work.

One thing to note: we cannot write a function with overloads for both
Tensor and Variable, because both ArrayRef<Variable> and Tensor have
implicit conversions from Variable, making such an overload ambiguous.
It may be interesting to remove the implicit conversion from ArrayRef.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
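
A self-contained sketch of the mix-in idea (the real
torch/csrc/utils/variadic.h differs in details):

  #include <cstddef>
  #include <utility>

  // CRTP mix-in: F supplies operator(); apply() invokes it on each
  // argument of a parameter pack, in order.
  template <typename F>
  struct IterArgs {
    template <typename... Args>
    F& apply(Args&&... args) {
      // Pack-expansion trick: evaluates self()(arg) left to right.
      int unused[] = {0, (self()(std::forward<Args>(args)), 0)...};
      (void)unused;
      return self();
    }
   private:
    F& self() { return *static_cast<F*>(this); }
  };

  // Example: count the int arguments in a heterogeneous argument list.
  struct CountInts : IterArgs<CountInts> {
    size_t count = 0;
    void operator()(int) { ++count; }
    template <typename T>
    void operator()(const T&) {}
  };

  // CountInts c; c.apply(1, 2.5, 3, "x");  // c.count == 2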
2018-01-26 15:56:39 -05:00
24177adc12 Make TensorDescriptor call more portable (#4878)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-26 15:31:53 -05:00
252211b001 testPairwiseDotProduct
Summary: as title.

Reviewed By: kennyhorror

Differential Revision: D6793829

fbshipit-source-id: f803e0400635ca37184f1dd5bb711bfe0e4bea21
2018-01-26 11:33:08 -08:00
51feaee007 Assorted small change to conda scripts
Summary:
More changes to be added later. I need to make a PR so that I can point jenkins to this
Closes https://github.com/caffe2/caffe2/pull/1767

Reviewed By: orionr

Differential Revision: D6817174

Pulled By: pjh5

fbshipit-source-id: 0fc73ed7d781b5972e0234f8c9864c5e57180591
2018-01-26 09:32:36 -08:00
84c6887d2a Switch cuDNN Descriptor classes to use unique_ptr. (#4850)
The primary benefit is now we have working move constructors
et al without having to write all the boilerplate.  Furthermore,
the size of the code is substantially reduced.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-26 12:28:15 -05:00
a3b8c459d4 Revamp MNIST tutorial
Summary:
Main changes:

1. Move reader creation to Brew in order to be consistent and avoid a wild use of param_init_net
2. Use optimizers for training function, avoid manual optimizer construction
3. Add MLP mode (a default)
4. Fix a bunch of too verbose comments and add a bit of new explanations
Closes https://github.com/caffe2/caffe2/pull/1760

Differential Revision: D6749059

Pulled By: salexspb

fbshipit-source-id: 9dfbbb2d9772a74a0300c2e404a92e791f7cc593
2018-01-26 09:17:31 -08:00
8c02674964 Revert D6817719: [caffe2][PR] Better support for windows
Summary:
This reverts commit d286264fccc72bf90a2fcd7da533ecca23ce557e

bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
cause_a_sev_many_files

Differential Revision: D6817719

fbshipit-source-id: 8fe0ad7aba75caaa4c3cac5e0a804ab957a1b836
2018-01-26 06:08:49 -08:00
8aa8eaabb1 Better support for windows
Summary:
Basically, this should make windows {static_lib, shared_lib} * {static_runtime, shared_runtime} * {cpu, gpu} work. A few highlights:

(1) Updated newest protobuf.
(2) use protoc dllexport command to ensure proper symbol export.
(3) various code updates to make sure that C2 symbols are properly shown
(4) cmake file changes to make build proper
(5) option to choose static runtime and shared runtime similar to protobuf
(6) revert to visual studio 2015 as current cuda and msvc 2017 do not play well together.
Closes https://github.com/caffe2/caffe2/pull/1793

Reviewed By: dzhulgakov

Differential Revision: D6817719

Pulled By: Yangqing

fbshipit-source-id: d286264fccc72bf90a2fcd7da533ecca23ce557e
2018-01-26 00:48:43 -08:00
849b0a0e0e Update SNPE readme. Indicate libgnustl_shared.so is also needed to run snpe binaries
Summary:
Indicate that libgnustl_shared.so is also needed to run snpe binaries
Closes https://github.com/caffe2/caffe2/pull/1776

Reviewed By: bwasti

Differential Revision: D6777970

Pulled By: sf-wind

fbshipit-source-id: 86a863536afadb2f22303b065e1dfcd3896f1152
2018-01-25 16:35:23 -08:00
5a5afa5c17 Properly define 'true' in test. (#4859) 2018-01-25 18:40:23 -05:00
0fd41a63a1 Integrate Fused8BitRowwise ops with DPER
Summary: Updates `sparse_lookup.py` for the new fused 8-bit rowwise quantization. Mostly just changing the same files as the original diffs (D5753626 and D5761202). I know very little about this code here so please let me know if this is safe, also in terms of migration away from the non-fused storage.

Reviewed By: kennyhorror

Differential Revision: D6710784

fbshipit-source-id: 185f147af52a094a937ba631b0351225e660d205
2018-01-25 15:02:42 -08:00
483828e25e Don't throw exceptions inside OpenMP parallel blocks (#4857)
Fixes undefined behavior: exceptions are not allowed to be thrown across
OpenMP constructs.
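
The usual fix is to catch inside the parallel region and rethrow after it
ends; a sketch of the pattern:

  #include <exception>
  #include <stdexcept>

  void parallel_work(int n) {
    std::exception_ptr eptr = nullptr;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
      try {
        // ... per-element work that may throw ...
      } catch (...) {
        #pragma omp critical
        { if (!eptr) eptr = std::current_exception(); }
      }
    }
    if (eptr) std::rethrow_exception(eptr);  // safe to propagate here
  }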
2018-01-25 17:56:19 -05:00
08dc40a5de Use case insensitive names for Doxygen docs.
Summary:
* This way we won't have issues across Linux and Mac.
* Also eliminates some weirdness where files with both capitalizations existed.
Closes https://github.com/caffe2/caffe2/pull/1813

Reviewed By: pjh5

Differential Revision: D6812141

Pulled By: orionr

fbshipit-source-id: 27f52089e2db623196349d7036aa8882e93c32fd
2018-01-25 14:33:18 -08:00
8aa3dab959 doc: update installation.md for third_party packages
Summary:
PR Description
-----------------

This commit informs developers why they have to use packages from the third_party
folder instead of packages from their Linux distribution.

By default, Caffe2 finds installed packages from the Linux distribution. If a
package cannot be found, Caffe2 falls back to the version bundled in the
third_party folder.

**Changes proposed in this PR:**
1. Added difference between Linux distro packages and third_party packages

**Self assessment:**
Checked.

Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com>
Closes https://github.com/caffe2/caffe2/pull/1724

Reviewed By: pjh5

Differential Revision: D6728185

Pulled By: orionr

fbshipit-source-id: 0c596cf56faaccf947caefc49ea3c6f0a473e9bf
2018-01-25 14:33:18 -08:00
304e607b70 Fix adam test
Reviewed By: pietern

Differential Revision: D6787780

fbshipit-source-id: a2d1428b0e028d6f3d8f7c312c90f3fa411cd0a2
2018-01-25 12:59:54 -08:00
0844b5b25c Fix deepcopy with scalars. (#4854) 2018-01-25 15:12:36 -05:00
2648428986 Various indexing fixes around scalars. (#4853)
1) Have 0-dim byte tensors behave like Py_TRUE, Py_FALSE
2) Py_TRUE now properly returns a copy from getitem
3) setitem now properly shapes the LHS consistent with the RHS (this doesn't really matter outside of error messages having the proper shape)
4) setitem supports numpy-style copy_to broadcasting (cuts off prefix 1s from src), so e.g. you can setitem (1,1,2,3) to (2,3) even though
   that doesn't follow the normal inplace broadcasting rules (see the sketch below)
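A minimal sketch of point 4; the shapes come from the example above:

```
import torch

dst = torch.zeros(2, 3)
src = torch.ones(1, 1, 2, 3)
dst[...] = src  # numpy-style copy_to: the prefix 1-dims of src are cut off
```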
2018-01-25 14:05:14 -05:00
b2cfc5ea53 add KeySplitOp
Summary:
as titled

After converting categorical to Ngram keys, use this op to extract eids

Differential Revision: D6794020

fbshipit-source-id: 4f9251a22d7a129da30b92845e312876e6510e7e
2018-01-25 10:50:53 -08:00
d695027300 Adds cuda support for LC op
Summary: Adds cuda support for LC Op

Reviewed By: QueryConnectionException

Differential Revision: D6803659

fbshipit-source-id: 538bbf6fd202c79154132fda0e90e175eb09d025
2018-01-25 10:19:48 -08:00
c046da76ef More distributions fixes for scalars. (#4849) 2018-01-25 12:33:01 -05:00
e0b0328722 add cuda9 options to nccl 2018-01-25 11:19:36 -05:00
bb3bc969ca fix binary version scheme to be PEP compliant (#4847) 2018-01-25 11:16:02 -05:00
4c29d19f53 Update pybind11, fix #4809 (#4811)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-25 10:44:16 -05:00
e4ddbeb554 Fix typo (#4846) 2018-01-25 10:33:45 -05:00
aaa0288aed Implemented Poisson in Distributions.cu and Distributions.cpp 2018-01-25 10:28:29 +01:00
90543ff13a weighted sampling reader dequeue outputs table index
Summary: Weighted sampling reader dequeue randomly chooses a hive reader to read a mini-batch. This diff allows dequeue to output the index of the randomly chosen table to a specific blob.

Reviewed By: kennyhorror

Differential Revision: D6621070

fbshipit-source-id: 754b981fc2bcfdb0146d2a0a5b677e7cfe74211b
2018-01-24 19:06:25 -08:00
c261b9ce70 Fix NGram from categorical test
Summary: Fix the flaky test for ngram from categorical test

Reviewed By: dragonxlwang

Differential Revision: D6801152

fbshipit-source-id: dcbae17b1d3737a41fb2f5c794c1146a02c542bb
2018-01-24 18:51:16 -08:00
afafe8a466 Add LC Layer
Summary: Add the 1st version of LC layer.

Reviewed By: Yangqing

Differential Revision: D6788647

fbshipit-source-id: ebee9215a1d6e1e567548a0fef771802851682a3
2018-01-24 16:51:17 -08:00
4970e73304 Add support for distributions and test_distributions when WITH_SCALARS enabled (#4834)
* Add support for distributions and test_distributions when WITH_SCALARS enabled.

* Fix flake8.
2018-01-24 19:22:05 -05:00
fc56e86c7d Introduce init API for the optional Checkpoint Metadata Handler object
Summary:
Every call to the checkpoint_metadata_handler write() API requires us to pass all params like db_prefix, db_type etc.
Introducing an init API in the checkpoint_metadata_handler so that such params can be saved and need not be passed in every API call

Reviewed By: mraway, anshulverma

Differential Revision: D6792651

fbshipit-source-id: 059fa4309e8fce1ee5ab009af3e0570573c24245
2018-01-24 15:19:55 -08:00
1b3d6ab864 Enabling Infiniband support for Gloo data channel with auto IB detection (#4795) 2018-01-24 23:18:24 +01:00
eea39dbdd9 Updated bbox_transform op to match detectron training code better.
Summary:
Updated bbox_transform op to match detectron training code better.
- Set apply_scale=False and correct_transform_coords=True to match detectron training/inference code.

Reviewed By: wat3rBro

Differential Revision: D6782894

fbshipit-source-id: 053d9847bf2b3c62a535499017a8413d78871ee0
2018-01-24 14:18:03 -08:00
278d398748 Add GPU version of math::Transpose
Summary: Add GPU version of math::Transpose

Reviewed By: Yangqing

Differential Revision: D6747958

fbshipit-source-id: 7047107609386c1ab53492381ca9bcf8bccd2924
2018-01-24 14:18:02 -08:00
3e8465bc02 Check if system has protobuf package when it already has protoc command
Summary:
When the system has the protobuf package but not protoc, cmake will succeed:

> -- ******** Summary ********
-- General:
--   CMake version         : 3.5.1
--   CMake command         : /usr/bin/cmake
--   Git version           : v0.8.1-967-g27d12d8-dirty
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler version  : 5.4.0
--   Protobuf compiler     : PROTOBUF_PROTOC_EXECUTABLE-NOTFOUND
--   Protobuf include path : /usr/include
--   Protobuf libraries    : optimized;/usr/lib/x86_64-linux-gnu/libprotobuf.so;debug;/usr/lib/x86_64-linux-gnu/libprotobuf.so;-lpthread
...

Then make will fail.
This change makes cmake check for the protobuf package only when protoc has been found.

This pull request is a clone of [1781](https://github.com/caffe2/caffe2/pull/1781); that pull request was closed by mistake.
Closes https://github.com/caffe2/caffe2/pull/1792

Differential Revision: D6800513

Pulled By: pietern

fbshipit-source-id: 79a77a139f342ae0aaa2c37fc1d9a74e28a08422
2018-01-24 13:45:30 -08:00
29a4c942fe Add support for multi-device batch normalization through an option to data_parallel_model
Summary: Stage 3 in stack of diffs for supporting multi-device batch normalization. Adds input parameter to data_parallel_model to enable multi-device batch normalization. Depends on D6699258.

Reviewed By: pietern

Differential Revision: D6700387

fbshipit-source-id: 24ed62915483fa4da9b1760eec0c1ab9a64b94f8
2018-01-24 13:24:06 -08:00
00a1092641 Add extra optional inputs to SpatialBN and SpatialBNGradient to enable multi-device batch normalization
Summary: Diff 2 in stack of diffs for multi-device batch normalization. Allows plugging of intermediate stats into SpatialBN and SpatialBNGradient to enable multi-device batch normalization. Depends on D6697336.

Reviewed By: rbgirshick

Differential Revision: D6699258

fbshipit-source-id: 1bae0b9a33d257f8de9525f8b2511bec2ec9d51e
2018-01-24 13:24:05 -08:00
9414072159 Add operators to support batch normalization across multiple devices on the same node
Summary: This is the first in a series of diffs to enable batch normalization across multiple devices on the same node with data parallel model. The diff contains the ops for computing the per-channel statistics required to obtain the mean and variance across multiple devices on the same node on the forward pass, and the gradient of the bias and scale during backpropagation. The actual modifications to SpatialBN and SpatialBNGradient to make use of these results will be in a separate diff.

Reviewed By: rbgirshick

Differential Revision: D6697336

fbshipit-source-id: 0de2750fe7e851795f238d9f625aeb4d74023dc2
2018-01-24 13:24:04 -08:00
7a232aae49 Add random seed to NGramFromCategorical test
Summary: TSIA

Reviewed By: Yangqing, Maratyszcza, dzhulgakov

Differential Revision: D6797213

fbshipit-source-id: e1132229cda09d1fbde63686aaec81b995989c03
2018-01-24 13:05:28 -08:00
2828c7a391 Moved RoIAlign to OSS.
Reviewed By: newstzpz

Differential Revision: D6775228

fbshipit-source-id: a9a6689fb5f6004f13ec03db8410fd81e2e6468e
2018-01-24 13:05:27 -08:00
09a1ef54ab Add missing cerrno include in text_file_reader_utils
Summary:
text_file_reader_utils.cc uses errno, but lacks #include <cerrno>.
This causes a build failure on Android NDK r16b.

Reviewed By: Yangqing, pietern

Differential Revision: D6706978

fbshipit-source-id: 494b2b0aa7d74d8913bfcbd75015848f16eb9cdb
2018-01-24 12:11:38 -08:00
8400c57daa remove now-unnecessary check 2018-01-24 10:37:27 -08:00
0ae5498079 [JIT] add create_autodiff_subgraphs (#4822)
This pass splits differentiable subgraphs into their own Node,
similar to a fusion group.

This initial implementation does not create optimal subgraphs, but
it works well in the case where most things are differentiable,
and has the building blocks (`mergeNodes`) to extend to the
better implementation.
2018-01-23 23:46:54 -05:00
a14abc741e Heuristic-based autograd execution order (#4746)
* heap autograd order

* --accept JIT test
2018-01-23 23:45:33 -05:00
e979b7c940 Removed redundant import re (#4826) 2018-01-23 23:43:28 -05:00
5e72d7af13 Remove setting coalesce to 0 in sparse transpose_ (#4707)
* Remove setting coalesce to 0 in sparse transpose_

* Remove setting coalesced to 0 in THCSTensor transpose_

* Add test for transpose's coalesce invariant
2018-01-23 21:57:12 -05:00
bc11511cda Restore sparse variable transpose_() and t_() (#4779)
* Restore sparse variable transpose_() and t_()

* Add dimension wrapping to transpose_, t_

* Don't expose sparse_raw_resize_ to python
2018-01-23 21:32:40 -05:00
23dc8acbc8 Fix missing import and enable test for profiler on Windows (#4522)
* Fix missing import and enable test for profiler on Windows

* Skip process when executable is not found
2018-01-23 21:30:42 -05:00
1e7d15953e Added Chi2 test for distributions (#4815) 2018-01-23 21:29:56 -05:00
82fed06535 disable qr_big cuda test on Windows (#4747) 2018-01-23 21:29:32 -05:00
e83546b686 Restore sparse variable _dimI() and _dimV() (#4785) 2018-01-23 21:13:03 -05:00
c7a2e318ed Restore cuda variable.bernoulli() (#4787) 2018-01-23 21:12:47 -05:00
29c7c682d8 add NGramFromCategorical Op
Summary: as titled

Differential Revision: D6783763

fbshipit-source-id: 78280cf15c2cdc3c308562d3f27a81b61ef8d662
2018-01-23 15:08:25 -08:00
27505e6429 Fix #4480 by tracing inputs before running function. (#4807)
* Fix #4480 by tracing inputs before running function.

The DCE trick says that if I have y = f(x), and f is internally implemented as
g, it's OK to trace both g and f. Recall the tracing algorithm is:

    enter f(x)
    compute its result y
    trace y = f(x)
    return from f

So when you run the example above, you'll do this:

    # suppose x is mapped to %1
    enter f(x)
    enter g(x)
    result of g is y
    trace y = g(x a.k.a. %1) (mapping y to %2)
    return from g
    result of f is y
    trace y = f(x a.k.a. %1) (remapping y to %3)
    return from f

and end up with a trace like this:

    %2 = g(%1)
    %3 = f(%1)

... only %3 is live, because %2 was killed from the mapping...  Subsequent DCE
will eliminate the invocation of g and you'll only see f in the final trace.

However, if f and g are inplace functions, the machinery breaks:

    # suppose x is mapped to %1
    enter f(x)
    enter g(x)
    result of g is x
    trace x = g(x a.k.a. %1) (remapping x to %2)
    return from g
    result of f is x
    trace x = f(x a.k.a. %2) (remapping x to %3)
    return from f
    resulting in:

    %2 = g(%1)
    %3 = f(%2) # OOPS

This commit changes the strategy so we instead do this:

    enter f(x)
    trace f(x)
    compute its result y
    trace y = f(x)  (computed above)
    return from f

Now we get the correct Value before it is overwritten.
Here is what the new trace code looks like:

    jit::tracer::PreTraceInfo trace_info;
    if (jit::tracer::isTracing( self, index )) {
      trace_info = jit::tracer::preRecordTrace( "index_fill", { self, index } );
      setattr(trace_info.n, jit::Symbol("dim"), dim);
      setattr(trace_info.n, jit::Symbol("value"), value);
    }
    baseType->index_fill_(self_, dim, index_, value);
    increment_version(self);
    rebase_history(self, grad_fn);
    if (trace_info.state != nullptr) {
      jit::tracer::postRecordTrace( trace_info,  { self } );
    }

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Revert "Hot patch ONNX _run_symbolic_function"

This reverts commit d1c973fee1a20da86d60d526e253ce89f5840baf.

* lintfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add missing expect file

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-23 18:06:55 -05:00
0e9b0cf779 add error msg in fc input_record
Summary: as titled

Reviewed By: xianjiec

Differential Revision: D6787879

fbshipit-source-id: 4bbdd11455480b25fa18121fa4527a9f0a03addc
2018-01-23 14:48:15 -08:00
691c38d670 Remove windows linebreaks in various distributions files. (#4817) 2018-01-23 17:15:59 -05:00
0aa1a6387e Add a seed to the gru unit test
Summary:
as it calls np.random and sometimes fails unreproducibly
Closes https://github.com/caffe2/caffe2/pull/1779

Reviewed By: pietern

Differential Revision: D6779802

Pulled By: anderspapitto

fbshipit-source-id: 2ad069f8a15f70a8110b1a6bdb06f81577c53ad4
2018-01-23 13:47:43 -08:00
24babe249f single_label_weighted_sampling
Summary: When using sample weights to do weighted sampling in the everstore loader, the proto size is increased by one. Update image_input_op to support this new use case.

Reviewed By: chenlifei

Differential Revision: D6776709

fbshipit-source-id: 6148908881ad019b6b621413f452ea1814573a00
2018-01-23 13:02:50 -08:00
9bb6d33d35 Enable scalars if compiled with WITH_SCALAR environment variable. (#4806)
* Enable scalars if compiled with WITH_SCALAR environment variable.

We are pretty close to enabling scalars (0-dimensional arrays); this allows turning them on
for development purposes and to be able to write code that works both with and without scalars enabled.

WITH_SCALARS is currently broken with distributions, but should work for test_torch, test_autograd, test_nn.

* Fix unsqueeze.

* Fix wrap dim, wrapping with Scalar.
2018-01-23 15:44:11 -05:00
e60f7e2490 Create issue template with guidelines for issue submissions (#4810) 2018-01-23 15:00:40 -05:00
a7ef4e4d46 Use android.cmake.toolchain from Android NDK
Summary:
The android.cmake.toolchain file we use from a submodule is unmaintained and not updated since 2015.
It causes numerous problems in Caffe2 build:
- Caffe2 can't be built for Android ARM64, because the gcc toolchain for ARM64 doesn't support NEON-FP16 intrinsics, and the android.cmake.toolchain we use doesn't allow us to specify clang-5.0 from NDK r15c
- Caffe2 can't be built with Android NDK r16 (the most recent NDK version)
- Caffe2 can't be built for Android with Ninja generator

This change updates the build script to use $ANDROID/build/cmake/android.cmake.toolchain instead, which is maintained by the Android team and synchronized with the Android NDK version.
As this toolchain file doesn't support the "armeabi-v7a with NEON FP16" ABI, I had to disable the mobile OpenGL backend, which requires the NEON-FP16 extension to build. With some work, it can be re-enabled in the future.
Closes https://github.com/caffe2/caffe2/pull/1740

Differential Revision: D6707099

Pulled By: Maratyszcza

fbshipit-source-id: 8488594c4225deed0323c1e54c8d71c804b328df
2018-01-23 11:32:41 -08:00
91e2e67c8b Add fallbacks for ChannelShuffle and Transpose
Summary:
Tried implementing ChannelShuffle but couldn't get it to achieve good perf.
Even with 2 groups, it takes a pretty big perf hit (D6708051).

For transpose,
https://software.intel.com/en-us/mkl-developer-reference-c-deep-neural-network-functions
mentions it's supported via conversion. Looking into dnnConversionCreate
though
(https://software.intel.com/en-us/mkl-developer-reference-c-dnnconversioncreate)
it looks like from_layout and to_layout are expected to have the same dims and
can only differ in strides. So not sure how to go about implementing this.

Differential Revision: D6754439

fbshipit-source-id: 8e3c005818be30457eac46b70867f1b52d7ed1a6
2018-01-23 11:04:07 -08:00
3a8bf0d3dc Fix crash in MKLSumOp due to layout mismatch
Summary:
MKLSumOp assumes that all inputs will have the same layout, but this needn't be
the case as different inputs are typically created by different primitives and
some of them might have a custom layout. Create a View() before executing
dnnSumCreate().

Differential Revision: D6753233

fbshipit-source-id: 62420b972898066157c9c841275ccc917b3dec59
2018-01-23 11:04:06 -08:00
76a141f016 add error msg in get_key
Summary: as title

Differential Revision: D6782896

fbshipit-source-id: bd29f6d085e56f51deb4bf6ad81771787fd85a5a
2018-01-23 11:04:05 -08:00
70f0436335 add Elman RNN export to ONNX (#4613) 2018-01-23 13:56:11 -05:00
2dd79eb53a Visualize distribution of activation functions
Summary:
This is a  first attempt at completing bootcamp task T24449916. This diff contains 3 major changes:
1) Change LayerModelHelper to allow for exposing the output and parameters of any layer to metrics
2) Added a runner that allows metrics to draw arbitrary plots to a matplotlib axes object
3) Implement a metric that aggregates distributions of values in a blob over the training, and try this out in a notebook

Reviewed By: kennyhorror

Differential Revision: D6671273

fbshipit-source-id: b8961837395e89c957edbf5c7c862bdb845ccf4b
2018-01-23 10:36:40 -08:00
5403f3bc17 Temporary fix for Issue 4752 (#4760)
* Temporary fix for half embedding.

* Call data<Half> from data<__half>
2018-01-23 13:34:09 -05:00
c6a64f1a78 Better unsqueeze_to 2018-01-23 18:02:02 +01:00
e37f02469d Favor Variables over Tensors for scalar constructors in torch.distributions (#4791)
* Favor Variables over Tensors for scalar constructors in torch.distributions.

Current behavior:
1) distribution constructors containing only python number elements will have their python numbers upcasted to Tensors.
2) Python number arguments of distribution constructors that also contain tensors and variables will be upcasted
to the first tensor/variable type.

This PR changes the above to favor Variables as follows:
1) The python numbers will now be upcasted to Variables
2) An error will be raised if the first tensor/variable type is not a Variable.

This is done in preparation for the introduction of Scalars (0-dimensional tensors), which are only available on the Variable API.
Note that we are (separately) merging Variable and Tensor, so this PR should have no real long-term effect.

Also note that the above means we don't change the behavior of constructors without python number arguments.
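A rough, era-specific sketch of the two rules above, assuming the (mean, std) signature Normal had at the time; the commented-out call shows the new error case:

```
import torch
from torch.distributions import Normal

d = Normal(0.0, 1.0)  # rule 1: plain python numbers are upcast to Variables
v = torch.autograd.Variable(torch.ones(3))
d2 = Normal(v, 1.0)   # fine: the first tensor/variable argument is a Variable
# Normal(torch.ones(3), 1.0)  # rule 2: a plain Tensor first now raises
```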

* Fix tests that require numpy.
2018-01-23 11:49:15 -05:00
c2afd590ae parallelize elementwise operation with openmp (#2764)
* parallelize discontiguous tensors' basic operations

* add comments

* remove unnecessary header file

* remove trailing whitespace

* resolve omp parallel for error(need for statement directly) in windows
2018-01-23 11:35:11 -05:00
8c69eacde6 Initialize cuda before setting cuda tensor types as default 2018-01-23 11:06:22 +01:00
154038e318 Removing NCCL clear_group_cache workaround with one more check in new_group (#4766) 2018-01-23 11:03:52 +01:00
8e0177255e Test for PositionWeighted
Summary: add Test for SparseLookup with PositionWeighted.

Reviewed By: kennyhorror

Differential Revision: D6771612

fbshipit-source-id: b4b3bfd514f366f579b4192643330ae73843d4f9
2018-01-22 19:20:46 -08:00
231d6f7b09 Add SqueezeOp in MKLDNN
Summary:
SqueezeOp support to drop drop dims of size 1. MKLMemory now supports Reshape()
if the buffer is in plain layout, in which case just the dims and layouts are
modified similar to caffe2::Tensor. SqueezeOp takes care of converting the
input to plain layout if needed via an intermediate buffer before calling
Reshape().

Differential Revision: D6735656

fbshipit-source-id: 953309498370e1b8986e8c593bc6963f38036255
2018-01-22 18:39:42 -08:00
409b1c8319 Improve wording of Sequential docs (#4790) 2018-01-22 21:18:23 -05:00
966db35dd9 Improve memory access patterns for index operations. (#4493)
Currently, index operation kernels work in "source/destination index-major
order".  (E.g., if thread count equals slice size, each thread will process
slice #0 in lockstep, and then slice #1, and so on.)

However, when elements inside each "slice" is separated by large strides (e.g.,
selecting columns of a matrix), it is better to switch to "elementInSlice-major
order".  For example, each thread can process element #0 of every slice, and
then element #1 of every slice, and so on.
2018-01-22 20:47:18 -05:00
c49f0279a6 Add kwarg-only 'requires_grad' parameter to Variable factories. (#4748)
* Add kwarg-only 'requires_grad' parameter to Variable factories.

Functions that create variables, e.g. torch.ones_like currently always return Variables with requires_grad=False;
this is less convenient than the existing Variable constructor that has a requires_grad parameter.  This commit
adds the parameter at the python binding level.
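A usage sketch of the new kwarg-only parameter (values illustrative):

```
import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 3))
y = torch.ones_like(x, requires_grad=True)  # previously always requires_grad=False
```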

* Fix flake8.

* Address review comments.

* Match set_requires_grad implementation with tensor_new version.
2018-01-22 19:15:11 -05:00
9390f7d3d6 Implement a (data-only) Variable factory (#4753)
* Implement a (data-only) Variable factory.

Implements a function, torch.autograd.variable, that is modeled after np.array.  The main difference between it and new() and
the tensor constructors is that it interprets a python number as data, i.e. as a 0-dimensional tensor (we currently don't expose
that at the pytorch level, so it will temporarily end up as a 1-dimensional tensor), rather than as a size.

The main difference currently between torch.autograd.variable and np.array is that torch.autograd.variable is stricter, e.g.
passing a PyFloat when an integral type is the default tensor type will result in an error; np.array basically lets anything
through (floating-point / integral mismatch, overflow, etc).  This is to keep it consistent with Variable.new when called with
a sequence, although we can loosen the checks later.

This will be renamed to torch.tensor once we merge Variable and tensor.
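Intended usage per the summary above, as a sketch:

```
import torch

v = torch.autograd.variable(3.0)         # data, not a size: a (temporarily 1-dim) scalar
w = torch.autograd.variable([1.0, 2.0])  # sequences behave like np.array input
```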

* Address review comments.
2018-01-22 18:14:22 -05:00
e64ad91365 Revert "Add doxygen and graphviz to Jenkins docker base."
Summary:
This reverts commit 417f1bab18b1721db5edc7ac8abaf883c1f7d3ee.

No longer needed since we'll add this within the Jenkins job itself.
Closes https://github.com/caffe2/caffe2/pull/1777

Reviewed By: pietern

Differential Revision: D6778185

Pulled By: orionr

fbshipit-source-id: d66befa76e84f83cf41eea50e54bc610db03ddd0
2018-01-22 15:00:09 -08:00
1d4e996b87 Separate parameter downloading tasks from training tasks and run them in a different group
Summary:
At the end of distributed training, the trainer needs to download the parameters back from the parameter servers to save the model. Currently, this parameter downloading happens at the end of the job's epoch task group, which creates several problems when checkpointing is enabled for distributed training:

1. When checkpointing is enabled, we run multiple training epochs. At the end of each epoch, the model download tasks will run to collect parameters, but we won't save the model until the true end of training, so this is a big waste of resources.
2. After trainer0 downloads the parameters, these parameters take a lot of memory, so trainer0 can easily run out of memory in the next epoch of training.

Our solution is to insert a parameter download task group between the job's training epoch_group and the job's exit_group.

Reviewed By: azzolini

Differential Revision: D6765393

fbshipit-source-id: 5a4f556fc3c1cd7834a7c406a3c0de3fccd50c49
2018-01-22 14:04:12 -08:00
27f4041738 Checking performance flags during init.
Summary:
Adds 2 features:
(1) In cmake, allow the use of -march=native
(2) During initialization, check if Caffe2 is built with matching cpu
features of the current machine.

This helps us guard performance claims in case the Caffe2 baseline is
built with limited computation capability.

Currently only added avx, avx2 and fma which are common.
Closes https://github.com/caffe2/caffe2/pull/1775

Reviewed By: ezyang

Differential Revision: D6772059

Pulled By: Yangqing

fbshipit-source-id: 884a3d7c7a71ed9631b7c6269ae95d842a09e1bd
2018-01-22 14:04:11 -08:00
a82b3096ef OSError will be raised in setup.py if "git" is not installed
Summary: Closes https://github.com/caffe2/caffe2/pull/1771

Reviewed By: pietern

Differential Revision: D6777503

Pulled By: bddppq

fbshipit-source-id: 7ef66c1bdd6a1c410c3938566d5e8979e3bb5b12
2018-01-22 14:04:10 -08:00
876bcc06b9 Fix squeeze() backward in edge case (#4783)
* Fix squeeze() backward in edge case

* Address comments
2018-01-22 16:36:02 -05:00
1569797b15 Use ATen infer_size implementation rather than TH. (#4781)
* Use ATen infer_size implementation rather than TH.

The only substantive difference between the two implementations is in how empty sizes are handled;
in ATen these are treated as scalars (i.e., can be expanded to anything), whereas in TH they are treated
as a special case of empty tensors (i.e., can't be expanded to anything).  Therefore, this change is
necessary to support scalars (0-dimensional tensors).  We could also take a bool parameter for determining
how we treat empty tensors but this seems unnecessary: if one tries to expand an empty tensors (as a result
of an infer_size calculation), the expansion will fail.

* Make changes for review.

* Attempt to fix windows build.

* long -> int.
2018-01-22 15:34:31 -05:00
db45dbbebf Update README.md 2018-01-22 15:16:48 -05:00
14033df3cb Fix resize_as_ on Variables containing SparseTensors (#4745)
Fix resize_as_ on Variables containing SparseTensors

Also enable Tensor::tensor(...) on sparse types
2018-01-22 14:33:42 -05:00
b7752efc1b Restore sparse variable methods for: (#4780)
- _nnz
- coalesce
- to_dense
- is_coalesced
2018-01-22 13:48:51 -05:00
d618c05174 Increase lower bound of values for values in div test
Summary:
This should translate to a 1% error margin. The gradient checker uses a .5% threshold.
Closes https://github.com/caffe2/caffe2/pull/1766

Differential Revision: D6774077

Pulled By: pietern

fbshipit-source-id: f97c7ffb2ef34fdd71d69320a7fdcf4a6a457715
2018-01-22 09:06:12 -08:00
a5440717ae Restores some sparse variable methods (#4687)
* Restores some sparse variable methods:
- transpose
- t
- zeros
- zeros_like
- sub
- sub_
- div
- div_
- mul
- mul_

* Restore sparse variable pow()
2018-01-22 10:24:39 -05:00
ad2edd8613 Check submodules only in build_deps (#4770) 2018-01-21 20:24:05 -08:00
b5d513b1f9 Add op in MKLDNN
Summary:
Just redirects to MKLSumOp. Doesn't support broadcast though since dnnSumCreate
expects identical dims.

Differential Revision: D6729788

fbshipit-source-id: 3e189465ad9d026bec4954648562ffe4e67fc393
2018-01-21 08:21:43 -08:00
dd5c195646 More documentation for CUDA stream functions. (#4756) 2018-01-21 12:58:51 +01:00
f033dd60cd Implementation of the Fisher-Snedecor Distribution (#4706) 2018-01-20 21:49:09 +01:00
8593c6f4f7 Adding better KL-Tests (#4739) 2018-01-20 21:47:11 +01:00
816d5d8ff7 Scaffolding for source-to-source AD in the JIT 2018-01-20 17:34:08 +01:00
85126ba217 Semi-automatically generate scripts out of our tutorials
Summary:
The idea is the following. We are going to automatically generate .py files using a jupyter post-save hook. Also, there is a script to generate these for all the tutorials. The script is also used from Jenkins test.sh, so if you don't run the sync, the test will complain.
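For reference, a typical jupyter post-save hook of the kind described; this is an assumption about the mechanism (placed in jupyter_notebook_config.py and shelling out to nbconvert), not the repo's exact script:

```
import subprocess

def post_save(model, os_path, contents_manager):
    # convert a just-saved notebook into a matching .py script
    if model['type'] == 'notebook':
        subprocess.check_call(['jupyter', 'nbconvert', '--to', 'script', os_path])

c.FileContentsManager.post_save_hook = post_save  # `c` is the jupyter config object
```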

In this diff I include the framework itself + .py files generated for all tutorials. They live under a separate folder.
Closes https://github.com/caffe2/caffe2/pull/1762

Differential Revision: D6749358

Pulled By: salexspb

fbshipit-source-id: d6ad28e863a0670af2d1e5af86e16909dc0dcf2c
2018-01-19 22:36:47 -08:00
91066559a8 truthy check for empty string in NameScope()
Summary:
As in name. While moving some code from Python 2 to 3, the LATTE translation team uncovered a case where a comparison between unicode and str types leads NameScope('') to prepend a separator to the beginning of blob names. This fixes it.
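A hypothetical simplification that illustrates the truthy check; the real helper lives in caffe2.python.scope:

```
_SEPARATOR = '/'

def _apply_prefix(name):
    # a truthy check treats '' and u'' alike, so NameScope('') adds no separator
    return name + _SEPARATOR if name else ''
```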

Thank you so much to dzhulgakov for tracking down the cause of this so quickly!

Reviewed By: dzhulgakov

Differential Revision: D6766866

fbshipit-source-id: fbe46cff581f425ba10e8668400915ea40baab94
2018-01-19 21:34:09 -08:00
4ce4bc5c7f Fix occasional test timeouts
Summary: Make test less computationally expensive

Reviewed By: Yangqing, dzhulgakov

Differential Revision: D6766236

fbshipit-source-id: 59e51faa1331d804b11da9f7237ee9ce0cb27df8
2018-01-19 20:08:58 -08:00
96ceb91384 Add cudnn_is_acceptable function. (#4749)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-19 22:48:30 -05:00
d38cf0e1e9 Allow assertEqual checks with mixed Tensors, Variables, numbers. (#4754)
Currently, a Variable can only be compared with a Variable, but a Tensor
can be compared with Tensors or numbers.  Relax this constraint so Variables
behave identically to Tensors.
2018-01-19 22:28:37 -05:00
1fee7cd626 Delete some dead expand code. (#4755) 2018-01-19 22:27:17 -05:00
ced2c7e2b2 Remove Set/GetDefaultGPUID and move to use current gpu id instead.
Summary:
Reason for this change:

(1) Setting/Getting default gpu id doesn't seem to be used at all.
(2) It actually is confusing compared to the CUDA_VISIBLE_DEVICES options etc.
(3) When setting cuda_gpu_id=-1 in the CUDAContext arg, it used to use the
default gpu id but probably we should use the current gpu - so that the caller
will be able to control the device placement.

One use case is for TensorRT - if we have a custom callback layer, then it would
be easier for TRT or whatever caller to set the running device.

Reviewed By: dzhulgakov

Differential Revision: D6740357

fbshipit-source-id: 2ea710e434b10220d5a198e31c93847304636863
2018-01-19 18:03:21 -08:00
69ce46a20b Moved mask-rcnn inference operators to open source caffe2.
Summary:
- Moved mask-rcnn inference operators to open source caffe2.
- Registered GeneratedProposalsOp as GenerateProposals in addition to GenerateProposalsCPP.

Reviewed By: rbgirshick

Differential Revision: D6747190

fbshipit-source-id: be98d6b56b5b53b13af46e839f5ceaf27f7fddc3
2018-01-19 16:20:14 -08:00
cded9683ad Implement fused 8bit rowwise sparse lengths reductions
Summary: Building on D6710785 (float <-> fused_8bit_rowwise conversions) and D6710843 (`FusedEmbeddingLookup`), this diff implements the new reduction operations for the fused 8-bit rowwise storage. I mostly followed the [old 8-bit quantized code](diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/lengths_reducer_rowwise_8bit_ops.h) and [full-precision code](diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/lengths_reducer_ops.h).

Reviewed By: kennyhorror

Differential Revision: D6710844

fbshipit-source-id: b9e85db7437bd32dd44d01733c3749f35c00b06e
2018-01-19 15:44:35 -08:00
d401c26d63 Add FusedEmbeddingLookup
Summary:
Updates the perfkernel codebase to implement embedding lookup for our new fused storage format, where each row in the data matrix stores the quantized values *and* the scale and bias.

msmelyan see this as my best-effort attempt at updating the perfkernel stuff for the fused storage. Let me know if any of this is grossly wrong. I also don't know if we need to update any of the prefetching operations or something like that.

Note that we have to keep the old code around for a bit until we get rid of the old operations with separate `scale_bias` storage.

Reviewed By: kennyhorror

Differential Revision: D6710843

fbshipit-source-id: b485ef2389f526c5db1260cac9d4be3fc8df0979
2018-01-19 15:44:34 -08:00
8dc0702af5 Add float32 <-> fused_rowwise_8bit conversion Caffe2 operators
Summary: This first diff adds the conversion operators that go from float to our fused 8bit rowwise quantized storage and back again. For now I've put the scale and bias in front of each row because it makes the pointer arithmetic nicer here and in the EmebddingLookup perfkernel. If benchmarks or other reasons point out that this is a bad idea we can change it easily.
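A numpy sketch of the described layout (scale and bias in front of each row); the rounding and dtype details are illustrative, not the operator's exact implementation:

```
import numpy as np

def quantize_row_fused(row):
    # 8-bit rowwise quantization with the per-row scale/bias fused into the row
    bias = row.min()
    scale = (row.max() - bias) / 255.0
    if scale == 0:
        scale = 1.0  # constant row: avoid divide-by-zero
    q = np.round((row - bias) / scale).astype(np.uint8)
    header = np.array([scale, bias], dtype=np.float32).view(np.uint8)
    return np.concatenate([header, q])  # layout: [scale, bias, q0, q1, ...]
```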

Reviewed By: kennyhorror

Differential Revision: D6710785

fbshipit-source-id: 086ab91c12d3b472564a06eff6329be6cb9e680e
2018-01-19 15:44:33 -08:00
e9dceec2c8 Fix the Macro definition for E in cpuid.h; #undef E
Summary: Changed #undef C to #undef E after the definition of Macro E in cpuid.h

Reviewed By: ot, luciang

Differential Revision: D6763664

fbshipit-source-id: beb221f0c690b5450c39577dd0a843613d802e9c
2018-01-19 15:44:32 -08:00
bf45811266 Add doxygen and graphviz to Jenkins docker base.
Summary:
* This will let us generate documentation on the Jenkins workers.
Closes https://github.com/caffe2/caffe2/pull/1772

Reviewed By: ezyang

Differential Revision: D6762731

Pulled By: orionr

fbshipit-source-id: 2e170d13055429971fc2cce66512480825030572
2018-01-19 15:05:45 -08:00
f72d86e0d3 Implement geometric distribution (#4708) 2018-01-19 21:45:14 +01:00
a0b7169b7e Ensure that Tensors always have Storages (#4744) 2018-01-19 13:26:22 -05:00
c052eb6bbb update the video input op in caffe2
Summary:
This brings the video input op in caffe2 up to date.
It adds support for:
1. optical flow and early fusion
2. different ways of sampling clips from video
3. different ways of resizing the input video

Reviewed By: dutran

Differential Revision: D6752788

fbshipit-source-id: 0cbd4d4bbbe97b0ada4cba7a55adc91a7af60d5f
2018-01-19 09:52:25 -08:00
4ea6e6a556 testSparseLookup
Summary: add basic test for SparseLookup

Reviewed By: kennyhorror

Differential Revision: D6749915

fbshipit-source-id: f97af785e4f89f36788a992843066fd1ec2b75a9
2018-01-19 09:27:20 -08:00
870ef8e95f Implement record_stream on Variable (#4728)
The function record_stream is currently only defined on Tensor in
TensorCuda.cwrap. It would be best to implement this in ATen and
automatically bind it to Python, but we're missing ATen types to
represent CUDA streams.
2018-01-19 10:58:13 -05:00
b6eb7d7ba0 Allow Python Variables to be bound to at::Tensor in pybind11 converter (#4730)
This allows _broadcast and _broadcast_coalesced to be called on
Variables. It also broadens the types accepted by some JIT methods.
2018-01-19 10:57:43 -05:00
f1c616418d Fix Python docs for broadcast and braodcast_coalesced (#4727) 2018-01-19 10:57:20 -05:00
e23acb3b08 Allow Variables in the (legacy) THNN bindings. (#4723)
The legacy NN bindings currently operate only on Tensors. We are slowly
replacing all uses of Tensor with Variable in Python code so that there
will only be one user-visible class. This changes the NN bindings
accessed through type2backend to accept either Tensors or Variables.

This does not affect the NN bindings that go through ATen.
2018-01-19 10:56:58 -05:00
b984c0b6e9 Various testing and utility improvements including torch.testing module. (#4726)
* Various testing and utility improvements including torch.testing module.

1) Remove method definition for randn_like since ones_like, zeros_like do not have methods.
2) Add an empty_like native function for creating a tensor with uninitialized values.
3) Add an is_floating_point() native function, similar to is_signed().
4) Add a torch.testing module loosely modeled after numpy.testing; currently it contains
   make_non_contiguous (moved from test_autograd) and randn_like (wrapper around the VariableFunction); see the sketch below.
5) Remove code from test_autograd and test_nn that is responsible for generating grad_outputs to use
   with gradgradcheck.  These now use gradgradcheck's own generating code.  This fixes
   test_nn.py with scalars because gradgradcheck does the right thing here already.
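A usage sketch of item 4, assuming the module contents listed above:

```
import torch
from torch import testing
from torch.autograd import Variable

x = Variable(torch.randn(3, 4))
y = testing.randn_like(x)            # wrapper around the VariableFunction
nc = testing.make_non_contiguous(x)  # moved here from test_autograd
```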

* Rename parameter.

* Fix parameter usages.
2018-01-19 10:54:41 -05:00
db6be0e1f1 Fix call to THPUtils_parseSlice (#4732)
* Fix call to THPUtils_parseSlice

THPUtils_parseSlice returns a bool

* Add Variable.__index__

* Add test
2018-01-19 09:39:26 -05:00
b997474a4f Adds Im2Col and Col2Im (#4729) 2018-01-19 09:37:53 -05:00
f7ab0cb56c Legacy Padding: correct output size with nInputDim 2018-01-19 12:45:30 +01:00
d29670db46 Make tensor cast constructor explicit
Summary:
Fixes a beautiful bug spotted by mschatz: MetaStr was super slow for TensorCUDA because it was defined for CPU tensors only. And thus the C++-friendly printing path was invoking the casting constructor, which copied the entire buffer to CPU!

I think both the copy constructor and the cast constructor should be explicit for Tensor given that copying is an expensive op. There might be more spots to fix in the code.

Original revision with MetaStr bug is 2d026cfe9c :)

Reviewed By: Yangqing

Differential Revision: D6758540

fbshipit-source-id: 7d2dffadd84c043908e16927fe02e6ffb01f750c
2018-01-19 01:39:16 -08:00
92aeca1279 update runtime dockerfile (#4736) 2018-01-18 22:07:25 -05:00
f9fd82d893 Type fix fused/mix precision (#4734) 2018-01-18 22:05:19 -05:00
b28d5a3586 Build doxygen docs with cmake and fix catalog generation
Summary:
This updates https://github.com/caffe2/caffe2/pull/1096/ to build doxygen docs with cmake and fixes operator catalog generation. See the new README.md for details, but you can run

```
mkdir build && cd build
cmake -DBUILD_DOCS=ON .. && make
```
and

```
python caffe2/python/docs/github.py ~/c2docs/_docs/operators-catalogue.md
```

to generate docs.

There was one weird issue in `generator.py`: we sometimes receive tuples and sometimes objects. I handled this just by testing `isinstance`, but we might want to be more principled in the future.
Closes https://github.com/caffe2/caffe2/pull/1758

Reviewed By: pietern

Differential Revision: D6752127

Pulled By: orionr

fbshipit-source-id: 9ba9ad8efc920b27a57327f8a7d3050f3650d4ce
2018-01-18 18:47:59 -08:00
a9a2b9ee3e Adding a separate script for anaconda builds
Summary:
Lots of unwanted stuff here that shouldn't be in this branch. I just need to make a PR so I can test it
Closes https://github.com/caffe2/caffe2/pull/1765

Reviewed By: orionr

Differential Revision: D6752610

Pulled By: pjh5

fbshipit-source-id: cc93290773640a9eb029f350b17f520ac5f2504e
2018-01-18 16:03:45 -08:00
e855317370 Make dirichlet_grad and standard_gamma match ATen declarations (#4722)
The Python function has an underscore (_) prefix so the C++
IMPLEMENT_STATELESS call should have an underscore prefix as well.
2018-01-18 16:49:18 -05:00
93f49667d0 Allow Variables in calls to NCCL bindings. (#4725)
The Tensor and Variable classes are being merged in Python. This means
that all interfaces to C++ must accept Variables where they previously
accepted Tensors.
2018-01-18 15:25:41 -05:00
e3e6680b48 Add ElmanCell and ElmanRNN
Summary: Closes https://github.com/caffe2/caffe2/pull/1742

Reviewed By: dzhulgakov

Differential Revision: D6706809

Pulled By: anderspapitto

fbshipit-source-id: 15a05786a26aeb719ea4377f4dbbb62738d9e697
2018-01-18 12:14:02 -08:00
3249d8bf89 Allow Variables in calls to type2backend (#4724)
Use x.type() instead of type(x) when accessing type2backend to support
Variables as well as Tensors.
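The shape of the change, as a sketch (type2backend is the legacy THNN dispatch table of this era):

```
import torch
from torch._thnn import type2backend
from torch.autograd import Variable

x = Variable(torch.randn(5))
# before: type2backend[type(x)]   -- keyed on the python class, broke for Variables
backend = type2backend[x.type()]   # keyed on the type string, works for both
```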
2018-01-18 15:01:38 -05:00
158e001238 Checking for positive epoch size before running epoch
Summary: Checking for positive epoch size before running epoch

Reviewed By: pietern

Differential Revision: D6738966

fbshipit-source-id: 64e1fb461d784786b20a316999e4c037787f3a14
2018-01-18 11:48:35 -08:00
8e4f67ed72 Enable the detectron module in cmake
Summary: Closes https://github.com/caffe2/caffe2/pull/1761

Reviewed By: pietern

Differential Revision: D6749288

Pulled By: rbgirshick

fbshipit-source-id: cfdd2a6c9fe30b7e8f24b2e83e4bb0191d1893a0
2018-01-18 10:21:22 -08:00
23fc2b7e06 Define CHECK in torch/csrc/cuda/nccl.h (#4721)
The CHECK function was used but not defined in the nccl.h header file.
2018-01-18 13:08:06 -05:00
f072986733 adds reduce argument to BCEWithLogitsLoss interface (#4705)
* adds reduce arg to BCEWithLogitsLoss interface

Adds the missing 'reduce' argument for the BCEWithLogitsLoss module
so that it matches the functional interface.
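A usage sketch of the now-matching module interface (values illustrative):

```
import torch
from torch import nn

logits = torch.randn(4)
target = torch.ones(4)
loss_fn = nn.BCEWithLogitsLoss(reduce=False)  # kwarg now on the module too
per_element = loss_fn(logits, target)         # shape (4,) instead of a scalar
```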

* fix indentation and add additional test

fixes the indentation used to update the BCEWithLogitsLoss module
and adds a unittest to sanity check its usage with `reduce=False`
2018-01-18 10:54:18 -05:00
79d15c52cb Improve the engine support for functional graph execution (#4690)
Previously the side-effect free grad calculation was performed
using callbacks that could also override the decision to run a
function. However this had a few problems e.g. it forced us to iterate
over pretty much all functions in the graph and drop their buffers.

This patch improves the mechanism, by adding explicit support for this
kind of evaluation in execute(). It's safer, and the algorithm used to
decide which nodes have to be evaluated was replaced with a faster one.
2018-01-18 11:20:30 +01:00
d1c4065f0d Support copy on sparse tensors in at::Type 2018-01-18 11:16:45 +01:00
1061d7970d Move broadcast and broadcast_coalesced to C++ 2018-01-18 11:16:45 +01:00
de5f7b725e Base for pure C++ NCCL interface 2018-01-18 11:16:45 +01:00
2da43bf6f1 Make Symbol a true struct (#4717)
Previous Symbol was just a uint32_t and we converts symbolToString and
stringToSymbol. Now Symbol is a struct with a toString method, and
constructors from either BuiltinSymbols enums (e.g. kParam) or strings.

Symbol is convertible to a uint32_t to ensure it can still be used in
switch statement BuiltinSymbol case branches.
2018-01-17 21:49:28 -08:00
d7e7e794f5 Fix display of test failure number in test_distributions. (#4713)
* Fix display of test failure number in test_distributions.

Previously, if e.g. the last example of 3 failed, it would say example 2/3.

* Fix other instances of enumerate pattern.
2018-01-17 20:57:34 -05:00
57549b7e44 Bind functions with out= arguments in VariableType (#4565)
This adds overrides in VariableType for the xxx_out ATen functions and
implements Python bindings. There is no support for automatic
differentiation. If any of the inputs (or outputs) requires grad, then the
function will throw an exception unless it's running in "no-grad" mode.

The bindings for calling torch.xxx functions on Variables are moved to a
different object. Previously, they were static method on VariableBase.
This change prevents users from accidentally calling static methods as if
they were instance methods.
2018-01-17 18:27:42 -05:00
6f0bb28afb Stop running RowWiseSparseAdam test on GPU
Reviewed By: pietern

Differential Revision: D6739194

fbshipit-source-id: 0892cdc6a575a84147f86984c67e7b4bf605a197
2018-01-17 15:05:21 -08:00
a8bdce38fe Replace PowConstant (#4711) 2018-01-17 17:30:56 -05:00
720c7b1e2c Move repeat to torch/_utils.py (#4712)
This moves the implementation of repeat to _utils so that the autograd
function can call it directly instead of relying on forward being called
on tensors.

This also removes _range, which was previously necessary because we
shadowed the built-in range() function.
2018-01-17 17:30:43 -05:00
b37aa2bf0e Ensure lazy evaluation for probs and logits (#4691) 2018-01-17 22:36:40 +01:00
539b1ed4b9 Add proper scalar checks to functions bound by nn.yaml. (#4696)
* Add proper scalar checks to functions bound by nn.yaml.

By default, the forward functions use the default ATen scalar checks and the backward functions
use x_->isScalar() for grad_x (with grad_input mapping to self).

These can also be overridden by specifying a dict of arg_name -> scalar_check.

If the argument is not overridden and the default mapping cannot work (because x for grad_x is not
passed to the backward), an error is raised and the scalar_check must be explicitly specified.

* Fix scalar checks for loss functions with a reduce parameter.
2018-01-17 16:17:04 -05:00
1a02d3ae86 Implement MM fusion (MM with add reduction tree) (#4615)
Implement MM fusion (MM with add reduction tree)

A tree where leaves are matrix multiplies and inner
vertices are adds can be computed as a single mm.
Such subgraph often appear in backward if a single weight
is reused multiple times (e.g. in RNNs).

NOTE: this seems to be slightly slower on the GPU than the
naive implementation, but it's a huge win on the CPU
(think 100x lower overhead)
2018-01-17 21:36:21 +01:00
db7f5dae77 Test_autograd support for 0-dim input/outputs. (#4647)
* Test_autograd support for 0-dim input/outputs.

This uses the 'fake' _scalar_sum function to test scalar (0-dimensional) inputs and output in test_autograd.

Main changes:
1) Introduces a randn_like function (this is really just for convience but it comes up often in testing.
2) Because the Tensor and Variable API are different wrt sizes, we take care to not exit the Variable API when
constructing Variables based on other Variables.  This is pretty straightforward, but there is sometimes an extra
line of code for setting requires_grad.  Should we have the 'like' functions maintain requires_grad?  Or bind all
factory functions with an additional 'requires_grad' parameter?

* Fix flake8.

* Get rid of _scalar_sum tests.

* Use zeros_like instead of more complicated constructs.

Also remove _scalar_sum native function / derivative definitions.
2018-01-17 13:55:10 -05:00
d6423d9895 Import Detectron ops 2018-01-17 10:31:30 -08:00
8c2d35c754 Refactor distributions (#4688) 2018-01-17 11:58:08 +01:00
61356cbadc RowWiseSparseAdam operator
Summary: Added the RowWise functionality for SparseAdam, which saves roughly 2/3 memory usage by only keeping one first and second moment term for each row of the parameter tensor, rather than one for each individual parameter.

Differential Revision: D6679342

fbshipit-source-id: ce6fb27e35ce41a890c66f6089cd2748d10e7a44
2018-01-16 19:39:31 -08:00
05ebd15207 Fix cuDNN batch norm overload in VariableType for half precision (#4693)
cuDNN batch norm uses mixed half/float precision in batch norm. This
changes the overload to only check that the arguments are of
VariableType and does not check their concrete type (scalar/backend).
2018-01-16 18:19:23 -05:00
d6b48c1571 [ASAN] fix more load_real deletes (#4694) 2018-01-16 18:07:58 -05:00
6ba96952a6 Fix Eigen failure with conda build conda on Mac.
Summary:
* I saw this fail on my Mac, but it's a general problem. More details at https://github.com/ryanrhymes/eigen/issues/2
Closes https://github.com/caffe2/caffe2/pull/1756

Reviewed By: pietern

Differential Revision: D6729609

Pulled By: orionr

fbshipit-source-id: e03cce1d6a6b68b131bae9b84d28636c06f85615
2018-01-16 14:38:47 -08:00
cb83474a57 Fix embedding with sparse=True (#4686)
Fixes #4666
2018-01-16 16:19:20 -05:00
559380c9ea Install CMake 3.6.3 in base image for Android build
Summary:
This is needed for #1740.

Verified that `./build.sh py2-android-ubuntu16.04` builds an Android base image with CMake 3.6.3.
Closes https://github.com/caffe2/caffe2/pull/1747

Differential Revision: D6729823

Pulled By: pietern

fbshipit-source-id: f7c888b4fba14ff6ea703cc269175b327b49f6b8
2018-01-16 12:59:35 -08:00
3254eca8c8 Implement binomial distribution (#4658) 2018-01-16 21:39:05 +01:00
1fd05df738 Add no_prefetch option to prefetch_op.
Summary:
We may not want to run the operator in a prefetching manner if we don't need any prefetching.
The option allows running any operator in a normal fashion without modifying it.

Differential Revision: D6717720

fbshipit-source-id: 10114d68edd95258b823603d8532360120421649
2018-01-16 11:07:50 -08:00
d452291a72 updated documentation for Embedding layer. Fixes #4682 (#4684) 2018-01-16 13:18:30 -05:00
ddb767f214 Add printing support for sparse variables (#4683) 2018-01-16 13:18:10 -05:00
d6ec05d0e3 doc: update installation.md for Ubuntu 14.04/16.04
Summary:
PR Description
----------------

This commit updates the instructions for installing Caffe2 on Ubuntu.
The existing instructions were written as an installation guide for generic Ubuntu
distributions; let's update the manual with more detail.

**Changes proposed in this PR:**
1. Added Ubuntu 14.04 section with existing contents.
2. Added Ubuntu 16.04 section

**Self evaluation:**
Tested (compilation in Ubuntu 16.04 x64 LTS)

Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com>
Closes https://github.com/caffe2/caffe2/pull/1723

Reviewed By: pietern

Differential Revision: D6692998

Pulled By: orionr

fbshipit-source-id: 8da9250ff27dbeb41f12364cdd531b2fb416c31f
2018-01-16 09:42:14 -08:00
97fc06ac22 Use restat to reduce ninja rebuilding when running codegen. (#4635)
* Use restat to reduce ninja rebuilding when running codegen.

Usually, you're only working on one codegen file at a time, but
in our old behavior, editing one would induce a rebuild of everything
that depended on ANY generated file.  We fix this in two steps:

- Don't write the file (updating the timestamp) when the contents
  are unchanged; a generic version of this step is sketched below.  (I had
  to update three separate places; shared Python library for build tools when?!)

- Use the 'restat' ninja feature to avoid rebuilding when the timestamp
  doesn't change.
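The first step as a generic helper, a minimal sketch (the function name is ours, not the codegen's):

```
def write_if_changed(path, contents):
    # skip the write when nothing changed, so the mtime stays put and
    # ninja's restat rule can prune the downstream rebuild
    try:
        with open(path, 'r') as f:
            if f.read() == contents:
                return
    except IOError:
        pass
    with open(path, 'w') as f:
        f.write(contents)
```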

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* lintfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* lintfix2

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-16 12:32:22 -05:00
c4edb56b45 Print full type of Variable tensor
Previously, it printed [Variable]; now it prints [Variable CPUDoubleTensor].
I'm not altogether sure why toString on Variable returns the uninformative
thing, but that might be worth fixing too.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-16 12:24:55 -05:00
c3b7baecea Fix #4422, use grad for cudnn_batch_norm derivative / don't use toTensor()
This commit fixes double-backwards on batch norm.  There were two
bugs:

- Returned buffers from batchnorm backwards were being marked as differentiable
  when they shouldn't be.  The fix for this is "easy": use 'grad' instead of
  'grads[0]' in cudnn_batch_norm's backward definition.  (More on this below.)

- I was using toTensor on a Scalar, which gives me a Tensor of the wrong
  type when I'm in CUDA world.  Using the Scalar add() overload directly
  solves the problem.

The differentiability of returned buffers was annoyingly subtle and I nearly
went off and implemented a big pile of infrastructure to "tell" the codegen how
to distinguish between differentiable and non-differentiable outputs before
realizing that there must be a way we do this legitimately, because it works for
THNN.  I documented this in derivatives.yaml, and also added tests for the
problem in load_derivatives.py to catch the various ways you could "get it
wrong".  Hope this helps someone else.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-16 12:24:55 -05:00
7449b467d9 fix deallocation and accesses from ASAN detection (#4678) 2018-01-16 12:21:21 -05:00
9f893dda5f Add LocalResponseNorm to docs (#4681) 2018-01-16 11:12:50 -05:00
2260649fb6 Local Response Normalization (#4667)
* Local Response Normalization

* Add 1D and 3D LRN

* Generalise LRN to higher dims

* Use mean instead of sum

Specify 'across-channels'
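A usage sketch of the new module (sizes illustrative):

```
import torch
from torch import nn

lrn = nn.LocalResponseNorm(size=2)  # normalize over 2 neighboring channels
y = lrn(torch.randn(1, 4, 8, 8))    # the generalised op accepts (N, C, *) inputs
```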
2018-01-15 22:23:51 -05:00
522276759d GPU detection fails when CUDA compilation requires CUDA_HOST_COMPILER to be set (#4676) 2018-01-15 14:29:09 -05:00
67494cee9d Fix cast direction in THCBlas (#4670) 2018-01-15 11:15:33 -05:00
017893e21b Fix batch norm JIT dispatch 2018-01-14 23:37:44 +01:00
86fe793948 Addition of KL-Divergences for torch.distributions (#4638) 2018-01-14 22:52:28 +01:00
27d7182d6c replace full stop by comma
From (batch. hidden_size) to (batch, hidden_size)
2018-01-14 20:34:27 +01:00
bdb05c2243 Add tests for distribution .entropy() methods (#4657) 2018-01-14 13:56:38 +01:00
188ee3ff0b Fix wrong learning rate evaluation in CosineAnnealingLR in Python 2 (#4656) 2018-01-14 13:10:41 +01:00
05908e8243 current code works with dim = 3, so I added it to dim checks 2018-01-13 12:58:08 +01:00
9b6441ecbc Implement Multinomial distribution (#4624) 2018-01-13 11:26:14 +01:00
cb7350fc8d Add vulkanSymbolWrapperReset function
Reviewed By: Maratyszcza

Differential Revision: D6707702

fbshipit-source-id: 140c4be7884a307953684a13202c668cb2c1a927
2018-01-12 21:18:06 -08:00
4db89e6890 Check for result in queue only after background process is terminated
Summary:
Gloo test was waiting only 10 sec for processes
to terminate, causing tests to be flaky.

Reviewed By: pietern

Differential Revision: D6672990

fbshipit-source-id: c58ba512396a0e45fa6ea4d14534ab0ccd54f2a9
2018-01-12 18:06:47 -08:00
e79eea2c11 Use protoc RPATH to figure out its install prefix
Summary:
[x] Have to rebase
[x] Have to ensure this works on macOS + Anaconda
Closes https://github.com/caffe2/caffe2/pull/1741

Differential Revision: D6714172

Pulled By: pietern

fbshipit-source-id: 43a16d99a6ddf821a35b512c780cdfa35a721219
2018-01-12 17:39:11 -08:00
81898e5d47 Fix for wrong newline in caffe_translator.py (Crop layer translation)
Summary:
- fixed the spurious newline at the initialization of the crop layer translation, which caused the exceptions described in issue #1215
Closes https://github.com/caffe2/caffe2/pull/1746

Differential Revision: D6716228

Pulled By: Yangqing

fbshipit-source-id: dd93b06b3b903f96505d6e6f8e67caeb6981fe66
2018-01-12 16:17:53 -08:00
db6777eaf4 fix gru_cell bug
Summary:
the fc needs to be in the output_gate_t scope so it can find its input
weights correctly
Closes https://github.com/caffe2/caffe2/pull/1739

Reviewed By: dzhulgakov

Differential Revision: D6705443

Pulled By: anderspapitto

fbshipit-source-id: 139e83ac77589a203ffe404fedab98eea5b1a51c
2018-01-12 15:34:23 -08:00
8eded5aece Fused fp16 lstm backward math fix. (#4611) 2018-01-12 23:11:05 +01:00
a3b098dcf9 Adding is process_group initialized support (#4618) 2018-01-12 22:56:54 +01:00
5343b71a62 More strict shape check on Conv operators. (#4637)
* More strict shape check on Conv operators.

Signed-off-by: HE, Tao <sighingnow@gmail.com>

* Test case for conv's shape check.

Signed-off-by: HE, Tao <sighingnow@gmail.com>

* Fix lint.

Signed-off-by: HE, Tao <sighingnow@gmail.com>
2018-01-12 15:32:45 -05:00
841ce42daf Fix flake8. (#4644) 2018-01-12 14:38:14 -05:00
8ef26185d6 Add missing torch declarations to derivatives.yaml. (#4617)
1) Zero-dim tensors to the fill functions that weren't bound (they couldn't be called successfully
because we haven't enabled scalars), and needed derivatives for their value arguments.

2) ne_ was missing a Scalar overload.
2018-01-12 14:28:17 -05:00
eb857ec367 Introduce a (non-public) autograd scalar method and improve printing (#4586)
* Specialize Variable pinting and always print device for GPU tensors/Variables.

* Introduce a (non-public) _scalar_sum() method for autograd scalar testing.
2018-01-12 14:26:38 -05:00
a14dd69be8 [ATen] Have any()/all() return a Tensor in preparation for dim/keepdim parameters. (#4639) 2018-01-12 14:25:21 -05:00
b7a0d0efb5 fix heap-use-after-free in THStorage.c 2018-01-12 11:11:41 -08:00
b42f163835 [ONNX] export sum, prod, sqrt improve log_softmax. (#4579)
* ONNX: export sum, prod, sqrt improve log_softmax and fix a typo in doc.

Signed-off-by: HE, Tao <sighingnow@gmail.com>

* Add new exported op to doc.

Signed-off-by: HE, Tao <sighingnow@gmail.com>

* Double quotes.

Signed-off-by: HE, Tao <sighingnow@gmail.com>

* Update trace log of log_softmax.

Signed-off-by: HE, Tao <sighingnow@gmail.com>

* Improve export when dim is None and axes_i should be a list of ints.

Signed-off-by: HE, Tao <sighingnow@gmail.com>

* Fix prod when no dim given.

Signed-off-by: HE, Tao <sighingnow@gmail.com>

* Update line ends in test expected file.

Signed-off-by: HE, Tao <sighingnow@gmail.com>
2018-01-12 07:44:56 -05:00
7e3da98734 Clean up error checking in THPTensor_(_convertToTensorIndexers) 2018-01-12 12:44:00 +01:00
736190fc78 Allow broadcasting of value x params in Categorical (#4614) 2018-01-12 12:16:19 +01:00
d5e0aa3d53 Remove gmock from batch_matmul_gpu_test
Summary: Remove gmock from batch_matmul_gpu_test

Reviewed By: pietern

Differential Revision: D6706787

fbshipit-source-id: 3dceac2b0097202c5a03d5bb472f77e0223ea1f1
2018-01-11 15:45:18 -08:00
b2964a92d9 Add MKLConcatOp
Summary:
MKLConcatOp along the channel dim of NCHW tensors. Spec:
https://software.intel.com/en-us/mkl-developer-reference-c-dnnconcatcreate

Reviewed By: ajtulloch

Differential Revision: D6689716

fbshipit-source-id: 492bc440474f8ce37caa85509789496659b03e79
2018-01-11 14:19:22 -08:00
dda33ca53a enable setting model initialization seed
Summary: This diff enables setting the model initialization seed, instead of a random seed, when reproducible results are desired.

Reviewed By: xianjiec

Differential Revision: D6642971

fbshipit-source-id: 387b1ee2ecef4f8f66570c882498fb97d7007e17
2018-01-11 14:04:03 -08:00
9760329014 fix possible divide by zero 2018-01-11 13:32:27 -08:00
995eafec84 Remove gmock dependency
Summary: Remove gmock dependency

Reviewed By: pietern

Differential Revision: D6704808

fbshipit-source-id: 8067e382061ef00f9536a7588064bcbb73a598c2
2018-01-11 13:24:43 -08:00
1d0426a9f5 Prepare test_autograd.py for introduction of scalars (#4599)
* Distinguish between scalar tests and pyscalar tests.

* Distinguish between scalars and no arguments.

* Add NoArgsClass so NO_ARGS is iterable.

* Fix iterator specification in python3.

* Now fix for python 2.

* Fix flake8.
2018-01-11 16:23:41 -05:00
7b31d33e80 Fix use after free (#4559)
In `THPTensor_(_convertToTensorIndexers)`, a `vector<THPIndexTensor>` is
created by constructing `THPTensor`s from sequences/tensors/etc. Each
`THPIndexTensor` is then freed with the following:

```
for (auto& idx : indexers) {
  THIndexTensor_(free)(LIBRARY_STATE idx->cdata);
  Py_DECREF(idx);
}
```

This is a problem because `Py_DECREF(idx)` will turn `idx->ob_refcnt` to 0 since this function
created the relevant `THPIndexTensor`s and owns them, causing `THPTensor_(dealloc)` to be
called. `THPTensor_(dealloc)` already has a line that calls
`THIndexTensor_(free)(LIBRARY_STATE idx->cdata)`.

So `THIndexTensor_(free)(LIBRARY_STATE idx->cdata)` gets called twice on the same
`cdata`. After the first call frees `cdata`, the second attempts to access flags/members of `cdata` to
determine if it should free it.
2018-01-11 16:21:35 -05:00
4d62cf499c fix out-of-bounds access in THTensor.c caught by asan 2018-01-11 13:17:26 -08:00
77523df413 Add more check on softmax ONNX exporting logic (#4592)
* Add more check on softmax exporting logic

* Add more comments about axis and dim
2018-01-11 15:14:33 -05:00
4357dee097 Adapting conda build to work for ubuntu and adding a flag to control precedence of Anaconda include dirs
Summary:
This should fix Protobuf version problems on all Anaconda builds by putting include directories under Anaconda before all other include directories.
Closes https://github.com/caffe2/caffe2/pull/1728

Reviewed By: orionr

Differential Revision: D6698435

Pulled By: pjh5

fbshipit-source-id: f73f4a5ebb4ca91db14770a88a704ace69d37ba4
2018-01-11 12:01:04 -08:00
224493d9ce NNPACK: Use new bindings and custom thread pool
Summary:
This change should dramatically (~10X) improve performance of convolution with NNPACK engine
Closes https://github.com/caffe2/caffe2/pull/1730

Reviewed By: sf-wind

Differential Revision: D6695895

Pulled By: Maratyszcza

fbshipit-source-id: 26291916811ef4cb819a59aec848c4e23668e568
2018-01-11 10:48:12 -08:00
d3b6c5e556 Support output_padding in ConvTranspose while doing ONNX exporting (#4583) 2018-01-11 12:31:06 -05:00
2b2a7dc2ad small fix on MaxPool2d __repr__ (#4591) 2018-01-11 12:29:43 -05:00
71b1120ba8 Fix bug in Dirichlet.rsample(); add tests (#4602)
* Fix bug in Dirichlet.rsample(); add tests

* Address review comments
2018-01-11 12:29:10 -05:00
19a8a3fc35 updating gloo to latest master (#4608) 2018-01-11 12:28:49 -05:00
94f439c07c Fixed setup.py to handle CUDNN_LIBRARY envvar with aten (#4597)
* Fixed setup.py to handle CUDNN_LIBRARY envvar with aten

* undo changes

* Added CUDNN_LIBRARY to bat file
2018-01-11 07:24:17 -05:00
0988e328c9 Fix errors in travis config 2018-01-11 12:10:23 +01:00
059299b74d fix compile errors (#4600) 2018-01-10 21:50:44 -05:00
8cff8e93d2 Add torch.distributions.utils._finfo for numerical stability (#4572)
* Add torch.distributions.utils.finfo

* Make _finfo private

* Address review comments

* Simplify _finfo() to key on Storage type
2018-01-10 21:42:47 -05:00
c1d5e71e7c removing Local.cwrap entry in build_libs (#4595) 2018-01-10 21:26:23 -05:00
868e77a3d2 Ignore clang compilation database in git (#4601) 2018-01-10 21:26:02 -05:00
0a8a18ca01 Fix GemmBatched
Summary: Fix GemmBatched

Reviewed By: Yangqing

Differential Revision: D6678168

fbshipit-source-id: 132117633573600d4e31c1959a0ccbe34416e1f1
2018-01-10 18:16:52 -08:00
9eeb342bf9 Cut the ScopeGuard alias now that we have auto
Summary:
[Folly] Cut the `ScopeGuard` alias now that we have `auto`.

This form works because of hidden lifetime extension:
```lang=c++
folly::ScopeGuard guard = folly::makeGuard([] { /*...*/ });
//  ...
//  guard falls out of scope
```
But this form would not work correctly:
```lang=c++
folly::ScopeGuard guard = folly::makeGuard([] { /*...*/ });
std::async(std::launch::async, [guard = std::move(guard)] {});
```
Because `folly::ScopeGuard` is an rvalue-reference-to-base.
We have `auto`, so just remove `folly::ScopeGuard`. This form works correctly:
```lang=c++
auto guard = folly::makeGuard([] { /*...*/ });
std::async(std::launch::async, [guard = std::move(guard)] {});
```

Reviewed By: igorsugak

Differential Revision: D6690070

fbshipit-source-id: 54e32b300d36fce4eb95a59f1828819afe312ec0
2018-01-10 18:06:32 -08:00
b1de1f6a5e Move ScopeGuardImpl and ScopeGuardImplBase into the detail namespace
Summary:
[Folly] Move `ScopeGuardImpl` and `ScopeGuardImplBase` into the `detail` namespace.

Let them be marked as private implementation details.

Reviewed By: andrewjcg

Differential Revision: D6665317

fbshipit-source-id: 03e8fee6a16338395ec92c582613b053bd9f74ec
2018-01-10 18:06:31 -08:00
90db3fbad2 Include CMake version in configuration summary
Summary: Closes https://github.com/caffe2/caffe2/pull/1731

Reviewed By: Yangqing

Differential Revision: D6699495

Pulled By: pietern

fbshipit-source-id: 4c30ea595f8ea3b0c7bffac15e80c7412b516a16
2018-01-10 17:17:10 -08:00
ab638020f8 Backport FindCUDA functionalities from CMake
Summary:
This is in principle similar to #1612 and is tested on Windows 2017. CMake passes, although there are still bugs in the MSVC compiler that prevent CUDA from compiling properly.

The difference between this and #1612 is that this diff explicitly puts the CMake files into a separate folder and uses a MiscCheck.cmake chunk of code to test whether we need to include them. See README.txt for more details.
Closes https://github.com/caffe2/caffe2/pull/1727

Reviewed By: pietern

Differential Revision: D6693656

Pulled By: Yangqing

fbshipit-source-id: a74b0a1fde436d7bb2002a56affbc7bbb41ec621
2018-01-10 16:36:03 -08:00
f94f5723e7 fixed spelling (#4598) 2018-01-10 18:48:14 -05:00
0ac58d53b8 ATen conv param expansion; InstanceNorm use_running_stats fix (#4544)
* fix instancenorm and aten conv param expansion

* addressed colesbury 's comments

* improve conv input shape check
2018-01-10 17:36:26 -05:00
bc7a41af7d Ensure convolution weights are contiguous, fixes #4500 (#4543)
* Ensure convolution weights are contiguous, fixes #4500

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-10 17:33:31 -05:00
03a6b5ecea add include 2018-01-10 16:42:20 -05:00
47de6f47f3 Use GCC to compile Android Caffe2
Summary:
It seems GCC performs better. Always use it to compile.
Closes https://github.com/caffe2/caffe2/pull/1725

Reviewed By: Yangqing

Differential Revision: D6690581

Pulled By: sf-wind

fbshipit-source-id: 3fceb25fc081bd4f875e914a0465b959c7fd5eda
2018-01-10 13:09:14 -08:00
a20ac05c8b Added method cuda to PackedSequence. (#4430) 2018-01-10 21:42:37 +01:00
33d734fcf1 Generalize construction of db_name in checkpoint manager
Summary:
Instead of constructing db_name as a member of checkpoint_manager, generalize
this function

Reviewed By: anshulverma

Differential Revision: D6671088

fbshipit-source-id: c528538def66933619f2fdf67820bca5d13571ea
2018-01-10 11:49:17 -08:00
944f9aa826 Move Android.mk 2018-01-10 11:32:34 -08:00
c9bb811d6a Remove accumulate_grad version_counter check. (#4566)
* Remove accumulate_grad version_counter check.

* Fix spelling.
2018-01-10 14:22:20 -05:00
2435d22782 Move NNPACK integration to share/contrib/nnpack
Summary:
we are going to deprecate NNPACK bindings in caffe2/contrib/nnpack.
The first step is to move modern NNPACK bindings from caffe2/mobile/contrib/ios/ to
caffe2/share/contrib/nnpack/, and is implemented in this diff.

Reviewed By: sf-wind

Differential Revision: D6687454

fbshipit-source-id: 458614bade92ab5ba5d2ab7f0691071043198b57
2018-01-09 17:22:24 -08:00
cd3e90c16f Fix failed test due to D6665466
Summary: Tests in Jenkins fail because test_global_pooling_3d filtered too many tests. We made use of the inferred value of global_pooling (pad and stride will be constant) to reduce the number of test samples generated.

Reviewed By: pietern

Differential Revision: D6686840

fbshipit-source-id: d316c0e9f9070b12770170ab9f36e33de68a9ab9
2018-01-09 16:40:35 -08:00
040336f5dc Further fix to tracing scope (#4558)
* Set missing temporary scope in callPySymbolicMethod

* Use expected traces in all scope tests
2018-01-09 15:57:40 -05:00
cd9d0f4561 Link cpuinfo when using external NNPACK
Summary:
Close #1685
Closes https://github.com/caffe2/caffe2/pull/1722

Differential Revision: D6686071

Pulled By: Maratyszcza

fbshipit-source-id: bbe86bfd479376bc7cdfdd0bad3896f1c2356216
2018-01-09 12:50:52 -08:00
5918243b0c Methods for checking CUDA memory usage (#4511)
* gpu mem allocated

* add test

* addressed some of @apaszke 's comments

* cache stats

* add more comments about test
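Since the bullet list above is terse, here is a minimal usage sketch of the memory-inspection API this PR introduces, using the function names as they exist in current PyTorch (a sketch, assuming a CUDA device is available):

```
import torch

x = torch.cuda.FloatTensor(1024, 1024)        # allocate ~4 MB on the GPU
print(torch.cuda.memory_allocated())          # bytes currently held by tensors
print(torch.cuda.max_memory_allocated())      # peak usage since program start
```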
2018-01-09 11:47:48 -05:00
a9afecdfc8 Update installation.md to roughly match caffe2.ai
Summary:
* Also remove build status, since it isn't relevant here.

I'm tempted to just reference https://caffe2.ai/docs/getting-started.html and remove all of this, but seemed like it might be worth having a standalone installation.md doc.
Closes https://github.com/caffe2/caffe2/pull/1706

Reviewed By: Yangqing

Differential Revision: D6666561

Pulled By: orionr

fbshipit-source-id: 640f8100a5e4f8d6b2eee2266dd634bd25d0e58e
2018-01-09 08:46:57 -08:00
f4a75deccf Fix the inconsistency of polygamma on Tensor and Variable, for issue #4466 (#4527)
* Fix the inconsistency of `polygamma` on Tensor and Variable.

Signed-off-by: HE, Tao <sighingnow@gmail.com>

* Regression test for #4466, polygamma works on variables.

Signed-off-by: HE, Tao <sighingnow@gmail.com>

* Add macro IMPLEMENT_STATELESS_SWAP to dispatch stateless methods on Variables correctly.

When call stateless methods with more than one arguments and the `self` comes second,
the `self` argument needs to be swapped to the first position before dispatching.

The macro `IMPLEMENT_STATELESS_ADDXX` is still reserved for deprecated `add**`
methods.

Signed-off-by: HE, Tao <sighingnow@gmail.com>
2018-01-09 10:39:09 -05:00
a0ab48575e Implement backward pass for pack_padded_sequence 2018-01-09 12:33:15 +01:00
f99c7d9429 Padding_idx in Embedding supports negative indexing (#4496) 2018-01-09 12:04:11 +01:00
82198831e7 Fix pool op custom path issue 2, wrongful routing to global pooling
Summary:
In D5681122, when routing to global max pool and average pool, the condition is not correct;
see T24876217 for discussion

Reviewed By: Yangqing

Differential Revision: D6665466

fbshipit-source-id: dcb5b4686249e6ee8e1e976ab66b003ef09b32fd
2018-01-09 00:54:45 -08:00
3a335427b0 Start framework for kl_divergence(-,-) in torch.distributions (#4525) 2018-01-09 09:44:59 +01:00
b3710a2e01 Fix a missing AutoGPU (#4545)
ATen dispatch in the JIT interpreter needs to switch the current gpu,
but it is not handled in ATen itself, and no higher-level pathway
ensures the device is set correctly.

This also improves debugging information for cross-device issues.
2018-01-08 19:57:59 -05:00
a3f4fa254c support GRU export to ONNX (#4390) 2018-01-08 19:56:29 -05:00
a59bb97868 [ATen] Support wrapping dimensions over scalars. (#4536)
This follows the behavior of numpy in that you can wrap dimensions over a scalar (0-dimensional
tensor) in the range [-1, 0].  I.e. scalarTensor.prod(0) and scalarTensor.prod(-1) work, but
scalarTensor.prod(2) does not.

The only current exception to this is with size(dim) and stride(dim);
there are no numpy equivalents of these (they are attributes), so it seems cleaner to just have
these as (dimensional wrapping) sugar for sizes()[dim] and strides()[dim]; otherwise there are
subtle differences in semantics, e.g. you have to use size(dim) when you want it to directly
apply to scalars, if the default value (1?) makes sense in that case.  Simpler to just not have
that difference.

Note that this change can cause problems if code assumed that maybe_wrap_dim would throw an
exception in this case and then called sizes()[dim] or size(dim) without checking; I went
through the code and only found this case in squeeze/squeeze_.
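A minimal sketch of the numpy-style wrapping described above, written with the modern 0-dim tensor API (0-dim tensors were not yet exposed in Python at the time of this commit):

```
import torch

scalar = torch.tensor(3.0)   # 0-dimensional tensor
print(scalar.prod(0))        # OK: dim 0 wraps over the scalar
print(scalar.prod(-1))       # OK: dim -1 wraps as well
# scalar.prod(2)             # IndexError: dim out of range [-1, 0]
```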
2018-01-08 19:54:26 -05:00
674ddf6b91 Fix multi-gpu fuser bug
cuModuleLoad is only valid for a single device so we need to
compile for the particular device that the fusion group will run on.
CompiledFunction already specializes different traces for tensors,
so we just need to have fusion_compiler produce the cuFunction on
the right device.
2018-01-08 15:04:22 -08:00
e3bafb884a Link NNPACK even when CUDA is not available. (#4541)
Fixes #4526.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-08 17:25:33 -05:00
5d6a5cf3a7 Implementation of Gumbel Distribution (#4517) 2018-01-08 23:21:27 +01:00
8fe3d287b2 Fix return type for Bernoulli enumerate_support (#4529) 2018-01-08 23:17:43 +01:00
12309f4aa6 GRU cell: add linear_before_reset boolean parameter
Summary:
This matches the semantics of cudnn (and others, like pytorch)
Closes https://github.com/caffe2/caffe2/pull/1695

Reviewed By: dzhulgakov

Differential Revision: D6658208

Pulled By: anderspapitto

fbshipit-source-id: 00e1716fba47b0ac296d1e9e0131165f4997ac7d
2018-01-08 13:22:56 -08:00
41bb662d96 add dense regularization
Reviewed By: xianjiec

Differential Revision: D5617571

fbshipit-source-id: 875d7c8753bdb3b6847d5e3f47ad8568cdf172f8
2018-01-08 13:03:17 -08:00
073312eade Updates to MKL conversion script
Summary: Handling some special cases.

Reviewed By: ajtulloch

Differential Revision: D6647011

fbshipit-source-id: 6a434442da5e0a63d355242cb8df9418885c6fb4
2018-01-08 12:25:23 -08:00
04ad23252a Refactor gen_variable_type (#4487)
The gen_variable_type.py script now is only responsible for generating
VariableType.h/cpp. The parent script, "gen_autograd.py", delegates to
gen_autograd_functions.py, gen_variable_type.py, and
gen_python_functions.py.

I've removed "fallthrough" functions. It's replaced by
DONT_RECORD_TRACE, DONT_PROFILE, and DONT_REQUIRE_DERIVATIVE.

In preparation for binding the _out variants, I changed some static
types to Tensor (from Variable) and we now unpack and name tuple return
values.
2018-01-08 13:43:09 -05:00
3f974d6ffe [ATen] Improve ASSERT test infra. (#4505)
1) Separates ASSERT_THROWS and ASSERT_THROWSM for checking messages vs not.
2) ADDS TRY_CATCH_ELSE for python-style error checking
3) Uses ASSERT_THROWS and TRY_CATCH_ELSE more generally

The previous, more ad-hoc constructions were often wrong, i.e. an assert could
pass if the logical else threw an exception that then passed the assert in the catch.
2018-01-08 12:22:44 -05:00
7d25a41251 Fix #4492, make it impossible to forget to reset cudnn flags (#4503)
Three-stage plan so there are no more stupidly weird "why isn't cuDNN enabled"
bugs:

- Add torch.backends.cudnn.disable_global_flags(), which, as its name suggests,
  disables global flag setting in cuDNN, so that you are not allowed to
  make changes to this state.  However, the flags() context
  manager continues to work, since those are non-global changes (see the sketch below).

- Call disable_global_flags() in test/common.py

- Switch all of the manual flag setting/unsetting in test/test_nn.py
  to use the context manager.
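A minimal sketch of the flags() context manager (torch.backends.cudnn.flags exists in current PyTorch; `run_model` is a hypothetical workload):

```
import torch.backends.cudnn as cudnn

# scoped, non-global changes keep working even after
# disable_global_flags() has been called:
with cudnn.flags(enabled=True, benchmark=True, deterministic=False):
    run_model()
# the previous flag values are restored on exit
```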

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-08 12:21:09 -05:00
d3612a5914 Fix tracking of tracing scopes during ONNX pass (#4524)
* Fix tracking of tracing scopes during ONNX pass

* Use ResourceGuard to manage setting a temporary current scope in Graph

* Add tests for ONNX pass scopes

* Remove unused num_classes argument
2018-01-08 12:20:52 -05:00
c650c73cbc Extract the finish check for profiler (#4519)
* Extract the finish check for profiler

Delete unused import and rearrange the import order.

* Add imports for win support
2018-01-08 07:54:55 -05:00
c9bc6c2bc3 Implement Student's t-distribution (#4510) 2018-01-08 10:23:48 +01:00
5c641cc14f Fix abs specialization for uint8_t type. (#4521)
Signed-off-by: HE, Tao <sighingnow@gmail.com>
2018-01-07 08:38:26 -05:00
e5f25421ae Implement demangle in Windows (#4515) 2018-01-07 05:35:10 -05:00
2dd7039b6b Fix multiprocessing and dataloader tests on Windows (#4453) 2018-01-06 17:41:36 +01:00
21d48be2dc Delete redundant isContiguous check from THCUNN SpatialDilatedConvolution
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-06 10:58:05 -05:00
0f8ece5657 Actually test CUDA double-backwards codepath.
Previously, we only tested CPU double-backwards, which is bad!
This would have caught #4422 (still not fixed, so those tests
are manually disabled) and also uncovered #4500 (not yet diagnosed.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-06 10:58:05 -05:00
4e3a4bd688 Check for out of bounds grads access in derivatives.yaml
This test would have caught the OOB in thnn_conv_depthwise2d_backward

Fixes #4457

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-06 10:58:05 -05:00
6a266f5832 s/uses_grad/uses_single_grad/ for more clarity.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-06 10:58:05 -05:00
ed33dc1d4f Fix 'invalid argument 4: weight tensor has to be contiguous'
Weight can be non-contiguous due to double backwards, where
we transpose the weight.  I'm not very happy with this fix
but it seems to make the tests pass.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-06 10:58:05 -05:00
1f8a8cc941 Fix two bugs in thnn_conv_depthwise2d_backward gradient.
- Out of bounds grads[2] access (thnn_conv_depthwise2d_backward
  doesn't compute bias gradient)

- Groups was not set appropriately for depthwise convolution

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-06 10:58:05 -05:00
123f49badb Add Slicing capabilities for Sequential, ModuleList and ParameterList (#4491) 2018-01-06 13:01:17 +01:00
9c2561e60c Fixes #4475, Add debug flag for Windows (#4508) 2018-01-06 12:49:36 +01:00
1dd441ba32 Use auto for scope-guard locals vs. folly::ScopeGuard
Summary: Use `auto` for scope-guard locals vs. `folly::ScopeGuard`.

Reviewed By: igorsugak, meyering

Differential Revision: D6664915

fbshipit-source-id: ea239b712f3f9dc7ef81105aaf82f4b36bc07db5
2018-01-05 23:01:11 -08:00
c1d9694f42 Backed out changeset 6f532bad5824
Summary: D6636282 caused a regression test failure of the NMT model used in prod; see 24949620 for bisect history.

Reviewed By: pietern

Differential Revision: D6671602

fbshipit-source-id: d863013964666727cf488a6ac5b01f5216f149d9
2018-01-05 19:34:38 -08:00
ad45d1bfb5 Added Caffe2 operator CPU binding for Gloo Allgather
Summary:
Added Caffe2 operator binding for Gloo Allgather algorithm.
Added new test to verify the binding. Binding is supported only for
CPU device with these changes.

Reviewed By: pietern

Differential Revision: D6610074

fbshipit-source-id: b21df9b5e71befbdb6841d6b146727bb4c83d753
2018-01-05 12:42:39 -08:00
027407f224 [ATen] have ger handle scalars like np.outer. (#4489)
Basically, scalars are implicitly unsqueezed.
2018-01-05 15:38:17 -05:00
6d32e36682 Caffe2 Operator: GPU implementation of Swish Activation
Summary: GPU (CUDA) implementation of the Swish activation function in Caffe2.

Reviewed By: Yangqing, xianjiec

Differential Revision: D6656907

fbshipit-source-id: f5f2c667055abf679728d2b5d43998895ddec708
2018-01-05 12:04:25 -08:00
b8fd57a0cc Fix handling of empty indices in CUDA Tensor.put_ (#4486)
Fixes #4386
2018-01-05 12:58:27 -05:00
408c84de7c Supporting logits as parameters in Bernoulli and Categorical (#4448)
* Supporting logits as parameters in Bernoulli and Categorical

* address comments

* fix lint

* modify binary_cross_entropy_with_logits

* address comments

* add descriptor for lazy attributes

* address comments
2018-01-05 03:45:05 -05:00
0afcc8ebb9 Fix typo in fusion compiler (#4488)
This mismatched paren causes a syntax error in generated code. I'm guessing the parentheses are necessary, since there was one in there before, but I don't actually know whether the compiler can produce things like a - (b - c) that would make them required.
2018-01-05 02:16:42 -05:00
2cda295244 Adds cpu version of transpose util function in math.
Summary: Adds transpose CPU version to prepare for LC layer.

Reviewed By: Yangqing

Differential Revision: D6641358

fbshipit-source-id: 1825b4c270dea2c0049ba334303abcbf50b22ee7
2018-01-04 23:05:40 -08:00
a43fd6ae52 Bump gloo
Summary:
This includes a fix for caffe2/caffe2#1146.
Closes https://github.com/caffe2/caffe2/pull/1609

Differential Revision: D6664351

Pulled By: pietern

fbshipit-source-id: 21a206fa0cfcefa95d91a1c279220444854ca5f4
2018-01-04 17:49:21 -08:00
3725d8ea97 Disable the python op test numba import in asan
Summary:
Some installations of numba seem to be incompatible with asan, so we
will disable its import.

Reviewed By: dzhulgakov

Differential Revision: D6664055

fbshipit-source-id: 311774667e54bdbf328ef280ab2a52ecba1361f2
2018-01-04 17:49:21 -08:00
64b0039ef9 rnn_cell_test: make it deterministic and speed it up
Summary:
In this PR I do the following:

1. split lstm_test_main into several tests for LSTM, MiLSTM and various Norm based versions
2. instead of looping over various gradient / optimization parameters, they are now random inputs generated through hypothesis.
3. These changes make the tests faster and we can avoid limiting the number of examples
4. Fix a minor bug with the gradient checker in the RNN unroll test running twice
5. Generate the numpy seed in hypothesis. This makes hypothesis avoid flaky tests

Also note that the Norm tests sometimes fail. I haven't looked into it much; it could just be precision issues. The new test split should help identify these issues.
Closes https://github.com/caffe2/caffe2/pull/1678

Reviewed By: pietern

Differential Revision: D6657076

Pulled By: salexspb

fbshipit-source-id: 9f59c71ccd2c818156e9d2424c3423d450b8c8e2
2018-01-04 15:00:42 -08:00
a3e91515de Declare constraints for distribution parameters and support (#4450) 2018-01-04 23:58:26 +01:00
c6adee0807 disable CUDA HalfTensor tests in test_cuda for Windows (#4482) 2018-01-04 22:58:13 +01:00
1e76ade9dc Implementation of Pareto Distribution (#4459) 2018-01-04 22:57:47 +01:00
73bdb661fe Fix BCELoss test precision (#4484)
BCELoss's outputs and gradInput computations are accurate to around 1e-6 on float types (as a relative value, not absolute), which is reasonable. However, the tests use absolute thresholds: the accumulation of 5 gradInputs has to have error less than 0.0002.

The worst case for BCELoss's gradInput for each element may be described as 1 / ((1-x) * x). Previously, the input to the test was restricted to [0.02, 1 - 0.02], giving a worst-case largest gradInput of about 50; accumulating 5 such gradInputs gives 250, for an error of 250 * 1e-6 = 0.00025, which was too big.

By restricting x to [0.028, 1 - 0.028] we get a worst case of 36.74, for a total accumulated grad of 184, which is less than the 200 needed to keep the error below 0.0002.
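A quick sketch checking the arithmetic above:

```
# worst-case per-element BCELoss gradInput is 1 / ((1 - x) * x)
def worst_grad(x):
    return 1.0 / ((1.0 - x) * x)

print(worst_grad(0.02))    # ~51: 5 elements -> ~255 * 1e-6 = 0.000255 > 0.0002
print(worst_grad(0.028))   # ~36.7: 5 elements -> ~184 * 1e-6 = 0.000184 < 0.0002
```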
2018-01-04 16:54:51 -05:00
dfde42a94c Remove THPGenerator default code for random functions in Declarations.cwrap. (#4479)
The specification and logic aren't necessary anymore, it's fine to specify the default as nullptr.
2018-01-04 16:53:52 -05:00
58f6008f76 Improvements around torch.cat on empty Variables (#3602)
* Add test for empty Variable cat (forward only).

* Test for empty cat (no grad/gradgrad checks)

* Support gradcheck on empty inputs, check it for cat with an empty Variable.

* Fix lint.
2018-01-04 14:47:10 -05:00
d1c973fee1 Hot patch ONNX _run_symbolic_function 2018-01-04 13:17:21 -05:00
35c4d73bdb Deprecate nn.NLLLoss2d (#4238)
* Deprecate nn.NLLLoss2d

* Fix legacy tests

* Fix tests

* Remove NLLLoss2d from docs, add deprecation warning instead of error

* fix lint

* Add more to docs
2018-01-04 12:38:04 -05:00
fe70823f8e Fix StepLR docs (#4478) 2018-01-04 12:37:26 -05:00
fc0d940c5e add gumbel_softmax, based on Eric Jang's implementation (#3341)
* add gumbel_softmax, based on Eric Jang's implementation

* Make gumbel_softmax CUDA friendly

* gumbel_softmax tweaks
2018-01-04 12:23:21 -05:00
2d68956005 Add Tensor::print() for gdb use.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-04 12:12:28 -05:00
b98740b8ec Remove request for proposal link from README.md
Summary:
* The request has finished. We might do others in the future, but removing for now.
Closes https://github.com/caffe2/caffe2/pull/1700

Reviewed By: Yangqing

Differential Revision: D6659664

Pulled By: orionr

fbshipit-source-id: cd49d41bdde3c07b5acbcd4724aaa359f69e4752
2018-01-04 09:11:05 -08:00
7c729e6321 - added size_splits to functional (#3837) 2018-01-04 09:52:47 -05:00
dc76db349e Delete a pile of dead code (#4295)
* Delete obsolete basic ops.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* More deletion.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete some unused utilities.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete dead apply_fn

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete CppFunction symbolic support.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete ForwardFunction

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Batchnorm is 'working'

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-04 09:21:54 -05:00
af8b64aadc Fix template type for std::array size (#4473) 2018-01-04 08:45:39 -05:00
d7d396b14b Modify derivatives for efficiency and change destination to result for consistency (#4415)
* make derivative changes and change destination --> result

* fix typo

* add changes for addcdiv also

* modify rsqrt derivative

* revert the derivative for addcdiv

* revert the derivative for div

* fix typo, sorry
2018-01-04 08:45:09 -05:00
2e279eb260 Make the dash nicer 2018-01-04 13:51:01 +01:00
a246d56150 Update build matrix badges in the README 2018-01-04 13:48:29 +01:00
b062769940 instance norm fix running stats settings (#4444) 2018-01-04 07:17:55 -05:00
d7da50473e Add check for slice shape match in index_copy_ and index_add_. (#4342)
Emits a warning if slices have the same size but different shapes.  (It
shouldn't be allowed, but it was, so some code might be unknowingly depending on
the behavior.)

Also refactored argument checking code, including index_fill_.
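For reference, a minimal sketch of a valid index_copy_ call (the per-slice shapes along the indexed dim must match; same-size-but-different-shape sources now emit the warning described above):

```
import torch

x = torch.zeros(5, 3)
src = torch.ones(2, 3)
index = torch.tensor([0, 4])
x.index_copy_(0, index, src)   # slices along dim 0 match: both have shape (3,)
```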
2018-01-04 07:17:02 -05:00
5b91b240d2 adds missing argument (#4446) 2018-01-04 01:51:47 -05:00
68726df0ac Fix GemmBatchedOp
Summary: Fix GemmBatchedOp to prepare for LC Layer.

Reviewed By: Yangqing

Differential Revision: D6636282

fbshipit-source-id: 6f532bad582442ebf3da843e973eb85405371c02
2018-01-03 21:16:18 -08:00
c43b120d43 Improve float precision stability of linspace op, fix 4419. (#4470)
Signed-off-by: HE, Tao <sighingnow@gmail.com>
2018-01-03 22:45:26 -05:00
cc9dc3f343 add lock for SynchronizedSeedDataset; add additional os level close stderr for tests that launch failing process (#4463) 2018-01-03 22:45:05 -05:00
28eea8b032 Adding commandline flags to disable implicit engine preferences.
Summary:
During debugging I found that our recently added automatic engine preference actually makes debugging a bit harder - it implicitly routes computation to e.g. CUDNN when we actually want to test out the default GPU implementations.

This diff adds a commandline flag that disables it.
Closes https://github.com/caffe2/caffe2/pull/1696

Reviewed By: pietern

Differential Revision: D6658765

Pulled By: Yangqing

fbshipit-source-id: ef56a16e778eeea6ecdd4dc6002421236e15371a
2018-01-03 18:31:49 -08:00
cc70a33e74 Windows fix for #4312 2018-01-03 21:28:31 -05:00
e9f0761460 Fix pool op custom path issue
Summary:
This was introduced in D5681122 - it causes a pretty serious numerical issue
that broke the pooling test.

Specifically, if threadIdx.x > sz, max is initialized with an out-of-bounds index
and the max is incorrectly computed.

Reviewed By: pietern

Differential Revision: D6658945

fbshipit-source-id: 487222d26050921ff9c7764fe46076e31a99bb86
2018-01-03 18:14:38 -08:00
a7cc653139 cmake: handle CUDA 9.1 in GCC version check
Summary:
GCC version check is currently being skipped when using the
newly released CUDA 9.1.

This will also handle other CUDA 9.x minor releases if any,
reducing our work if there are such releases like 9.2. This
assumes that the next major CUDA version will be 10.0,
needing adjustment only after such major version is
released.
Closes https://github.com/caffe2/caffe2/pull/1658

Differential Revision: D6659000

Pulled By: pietern

fbshipit-source-id: 79291b5da9d4e8b4f2c7ac82fe2b1e7939438bc9
2018-01-03 17:42:55 -08:00
3329f36f1a Move load_save_test.py from caffe2/python/ to caffe2/python/operator_test/
Summary: Move load_save_test.py from caffe2/python to caffe2/python/operator_test/

Reviewed By: boryiingsu

Differential Revision: D6657724

fbshipit-source-id: 030942316444ec93c3bc2970902d7b3980e60cfc
2018-01-03 17:42:55 -08:00
48436ac124 Fix compile warning "implicit declaration of function" (#4467) 2018-01-03 20:37:20 -05:00
f90feac38b Adding conda specific script to macos builds
Summary: Closes https://github.com/caffe2/caffe2/pull/1640

Reviewed By: pietern

Differential Revision: D6624013

Pulled By: pjh5

fbshipit-source-id: 0e980f020bce7bca1cb0845114a6071a004443af
2018-01-03 17:14:27 -08:00
6a0c636d4e Don't special case NN functions in gen_variable_type.py (#4395)
This modifies NN binding in ATen so that the xxx_forward functions now
return buffers instead of taking them as inputs. The NN functions with
no suffix are implemented in Type.cpp. They call the xxx_forward
variants and discard any returned buffers.

This simplifies derivatives for NN functions. The derivatives are now
defined on the xxx_forward functions and buffers are treated as any
other input.
2018-01-03 19:22:50 -05:00
e2ccd6e7ab Rename native/TensorGeometry to native/TensorShape since there is already an ATen (non-native) TensorGeometry. (#4465) 2018-01-03 18:30:47 -05:00
73fbad0bfc Fix some scalar checks (#4462)
* Run scalar_tensor_test on CUDA if available.

* Fix take scalar_check.

* Make ger scalar check always false.
2018-01-03 17:58:19 -05:00
d80669fce8 Guard PyArray_Check with WITH_NUMPY 2018-01-03 22:33:21 +01:00
77c792ec27 Vectorize normal_ (#4312) 2018-01-03 22:30:55 +01:00
e426020c87 Move prod, cumprod backwards to C++ (#4394)
* Add view_as as a native_function.

* Move prod, cumprod backwards to C++.

* Update for review requets.

* Review comments.

* Reorder slice parameters so dim is first.

* Update test_slice.

* Update test_autograd.

* Fix flake8.
2018-01-03 16:27:50 -05:00
9835ca9bac Ensure indices list in sparse optimizer tests is unique
Summary:
There were no dimensionality constraints on the generated indices
array, causing many examples to be generated and then filtered out. Instead,
we should ensure the probability of unique indices is high.

There is a better fix for this by using the `unique` keyword argument
to `hypothesis.extra.numpy.arrays`, but this is available only in
hypothesis version 3.28.0 and later.

This is related to #1536 and #1599.

Once this change has proven to be OK, we can modify the other tests
that now have health check suppression enabled as well.
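For reference, a minimal sketch of the `unique` keyword argument mentioned above (requires hypothesis >= 3.28.0):

```
import numpy as np
from hypothesis import given
from hypothesis.extra.numpy import arrays
from hypothesis.strategies import integers

@given(arrays(np.int64, (10,), elements=integers(0, 1000), unique=True))
def test_indices_are_unique(indices):
    assert len(set(indices)) == len(indices)
```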
Closes https://github.com/caffe2/caffe2/pull/1686

Reviewed By: Yangqing

Differential Revision: D6651789

Pulled By: pietern

fbshipit-source-id: d80886c9ccf0a7a842a7580a279f33a2d6cca97c
2018-01-03 12:19:14 -08:00
f321f61b9a Improve dropout
Previously it would unnecessarily clone the input in eval mode.
2018-01-03 13:44:49 -05:00
17148f891f Fix a leak in JIT interpreter 2018-01-03 13:44:49 -05:00
2d2b157d25 Handle repeated outputs in the tracer 2018-01-03 17:29:27 +01:00
e6cbe84bf6 Handle repeated inputs in JIT tracer 2018-01-03 17:29:27 +01:00
f05ca657dd added fix for #4408 and a test (#4452)
* added fix for #4408 and a test

* forgot import

* moved test to onnxbot/onnx-fb-universe
2018-01-03 10:23:50 -05:00
387b4234ea Provide CMake support for detectron ops
Reviewed By: Yangqing

Differential Revision: D6637258

fbshipit-source-id: 72b2bf55a5f8ca8e322c8b65f62977416319ed9e
2018-01-03 06:23:14 -08:00
82e995e0b9 Windows fix for #4322 (#4455)
* fix DISPATCH_ALL_FLOATING_TYPES

* fix precision issue
2018-01-03 06:07:53 -05:00
bf37548ccc Properly include the generate proposal headers.
The header files will be committed separately from fbcode.
2018-01-02 21:05:19 -08:00
3cdcbd5986 re-apply D6652354
Summary: TSIA - trying to address internal build errors.

Reviewed By: Maratyszcza

Differential Revision: D6654287

fbshipit-source-id: dfb77797d2bb449831418a7161587fa724985053
2018-01-02 21:02:18 -08:00
56508566a1 Enhance Caffe2 Load op to support loading blobs from multiple files.
Summary: The current Load op can only load blobs from one file. We need to make the Load op support loading blobs from a list of dbs.

Reviewed By: boryiingsu

Differential Revision: D6596034

fbshipit-source-id: 906fa48b0ad61c83e247d497b6b079c04fed499f
2018-01-02 18:02:19 -08:00
2f23ab0bfe Revert D6652354: [caffe2] Move proposal generation headers to oss.
Summary:
This reverts commit fd291f662e3793b6d11a7e02e1acc741c027a1fd

bypass-lint

Differential Revision: D6652354

fbshipit-source-id: 108bf97e5c2e27dd73954ef5d2b7c16c434e4597
2018-01-02 17:21:43 -08:00
2af506cb6c Move proposal generation headers to oss.
Summary: TSIA - it used to cause build errors.

Reviewed By: pietern

Differential Revision: D6652354

fbshipit-source-id: fd291f662e3793b6d11a7e02e1acc741c027a1fd
2018-01-02 16:33:56 -08:00
8fd3888c4c Provide CMake support for contrib/prof
Summary:
`contrib/prof` provides functionality for profiling (eg. `prof_dag`) but no CMake.
Hence, provide CMake support for building it.

Reviewed By: Yangqing

Differential Revision: D6640488

fbshipit-source-id: 9ed8095b10d7c0337db061206daf2a66f41f4713
2018-01-02 16:02:32 -08:00
77484ecc45 Manually applying cudnn5 pull request.
Summary: TSIA. Closes #1631

Reviewed By: pietern, Maratyszcza

Differential Revision: D6626887

fbshipit-source-id: 1a2dc7c47bc6ce794fdf598fbd547c04029edce4
2018-01-02 15:31:33 -08:00
48492b02cd disable travis webhook as we are moving to jenkins as CI
Summary:
cc pietern
Closes https://github.com/caffe2/caffe2/pull/1687

Differential Revision: D6652179

Pulled By: Yangqing

fbshipit-source-id: 2ff0efe85970d1e48abd3afac694b76251d45d28
2018-01-02 14:42:15 -08:00
33bb849a73 Remove assign_(Scalar). (#4445) 2018-01-02 16:32:11 -05:00
bc50510016 use numerically stable version of BatchLRLoss
Summary: change all use cases of BatchLRLoss to the numerically stable version. This includes the uses of the function build_loss defined in fbcode/caffe2/caffe2/fb/dper/layer_models/loss.py and the class BatchLRLoss defined in fbcode/caffe2/caffe2/python/layers/batch_lr_loss.py.
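For context, a sketch of the standard numerically stable formulation of the logistic loss (not necessarily the exact Caffe2 implementation): for a logit x and label z, the naive z*(-log(sigmoid(x))) + (1-z)*(-log(1-sigmoid(x))) overflows for large |x|, so it is rewritten as:

```
import numpy as np

def stable_logistic_loss(x, z):
    # identity: max(x, 0) - x*z + log(1 + exp(-|x|))
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))
```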

Reviewed By: xianjiec

Differential Revision: D6643074

fbshipit-source-id: b5678556b03cbdd380cab8a875974a87c33d7f12
2018-01-02 13:18:36 -08:00
20b5e82155 Implement embedding in ATen (#4322)
Implements nn.Embedding (lookup table) in ATen.

Breaking change: new optional argument padding_idx in F.embedding to
match nn.Embedding.

Note that there are a few bugs in Embedding that are inherited from the
previous code:

 - CUDA renorm has race conditions if index contains duplicate entries
 - sparse gradient doesn't work with scale_grad_by_freq
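A brief illustration of the new padding_idx argument in F.embedding mentioned above:

```
import torch
import torch.nn.functional as F

weight = torch.randn(10, 3)
idx = torch.tensor([0, 2, 0])
out = F.embedding(idx, weight, padding_idx=0)
# positions where idx == padding_idx produce no gradient for `weight`
```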
2018-01-02 15:44:46 -05:00
43ab911182 Improve precision of dirichlet_grad() approximation (#4421) 2018-01-02 20:53:47 +01:00
bb04034bf7 Adding a time limit reader
Summary: ReaderWithTimeLimit() class to stop after a certain amount of time

Reviewed By: boryiingsu

Differential Revision: D6477623

fbshipit-source-id: 165874c9344b0c9c7e0b33e12e72e24c46669cb2
2018-01-02 11:33:53 -08:00
bf1c7d96c8 turn off unsupported multiprocessing methods for Windows 2018-01-02 20:03:31 +01:00
2060f355a6 Fix python gc race condition with THPVariable_traverse (#4437) 2018-01-02 19:57:21 +01:00
18a866aedd Add random_split to torch.utils.data.dataset (#4435) 2018-01-02 18:56:49 +01:00
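A minimal usage sketch of the new helper, written against the current torch.utils.data API:

```
import torch
from torch.utils.data import TensorDataset, random_split

ds = TensorDataset(torch.arange(10))
train, val = random_split(ds, [8, 2])   # random, non-overlapping splits
```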
57f9db9c3c Two NNPACK build fixes. (#4439)
1. master NNPACK now uses cpuinfo library, so we detect it and
add it to the list of libraries.

2. If a user builds nnpack with --inference-only, there won't
actually be enough symbols to successfully link against NNPACK.
This won't manifest until quite late in the build process.
So we now explicitly test that the gradient functions are
available in the library.

Upstream bug: https://github.com/Maratyszcza/NNPACK/issues/123

Fixes #4336

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-02 12:42:40 -05:00
98e5f2c808 nllloss doc (#4438) 2018-01-02 12:39:21 -05:00
2cee02cc86 [ATen] Get rid of assign_(Tensor), use copy_ instead. (#4397)
* [ATen] Get rid of assign_(Tensor), use copy_ instead.

Note: assign_(Scalar) still exists.

* Get rid of test that is no longer valid.
2018-01-02 11:35:35 -05:00
4d4b7782cd Fixes for Windows build on master (#4432) 2018-01-02 12:58:37 +01:00
35abc4efa2 Add low-precision digamma() and polygamma() functions (#4399) 2018-01-02 11:53:23 +01:00
7592e96503 More detailed documentation. (#4428)
* More detailed documentation.

* More detailed documentation.

* Fixed W291

* minor bug fixes
2018-01-01 22:14:41 -05:00
02e7eba309 Implement Chi2 distribution (#4425)
* add chi2

* add tests for chi2

* add randomized test comments
2018-01-01 19:41:18 -05:00
98c02c20b1 fixes #4403 (#4407) 2018-01-01 23:44:03 +01:00
0b328874c6 Pick up NO_NNPACK for ATen build (#4423) 2018-01-01 16:00:56 +09:00
b7c64249cb Add quotes and fix ninja on Windows (#4416) 2017-12-31 18:21:21 +09:00
2f25b9d052 fix build error for unix (#4414) 2017-12-31 14:11:32 +09:00
4cf13cf417 Fix crash due to copying empty tensors into MKLMemory
Summary:
Ran into a scenario where if the CPU op in MKLFallbackOp outputs an empty
tensor, attempting to copy the output to MKLMemory (https://fburl.com/www2mtt4)
crashes. Modify MKLMemory to gracefully handle this. This is done at the
MKLMemory level because we want to make sure that its members such as dims and
layout are Reset() correctly.

Interestingly, MKL calls fail at different points for dims {0} and dims {0,N} despite
the buffer size being empty for both - the former in dnnAllocateBuffer and
the latter in dnnConversionExecute (likely due to some difference in
layout?).

Also fixed CopyTo in addition to CopyFrom and tested all scenarios.

Reviewed By: ajtulloch

Differential Revision: D6646320

fbshipit-source-id: 61df585f610a949f312f05308baf310241dc9cb2
2017-12-30 15:36:48 -08:00
859a173502 fix AMSGrad for SparseAdam (#4314) 2017-12-30 13:00:17 +01:00
b78a37a058 Enable ninja during python build process for MSVC (#3993) 2017-12-30 12:58:32 +01:00
fec3d4a079 RNN support has been implemented (#4409)
* RNN support has been implemented

4447b80b5e was merged in and RNN is now supported
2017-12-30 09:26:36 +09:00
240d448a9c Split cuda native functions into components; fix mistake with conv_tbc cpp move. (#4398) 2017-12-29 18:01:24 -05:00
6185b27cc6 Improve precision of standard_gamma_grad() (#4369) 2017-12-29 12:11:04 +01:00
fa8de6b4f3 Adding the Cauchy distribution to torch.distributions 2017-12-29 11:57:21 +01:00
99068d2e52 fix nn.init.constant example 2017-12-29 19:14:53 +09:00
8c9a22a88e Support NO_NNPACK environment variable (#4401)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-29 16:33:01 +09:00
4238f5e604 Extract some utility operators to their own source files to reduce build size.
Summary: Extract some operators from utility_ops and normalize_op to reduce build size impact of depending on these files.

Reviewed By: Maratyszcza

Differential Revision: D6616741

fbshipit-source-id: 1757b6b8a3ce4e2a248deee61322344e5095e940
2017-12-28 20:35:44 -08:00
b132187014 Add vulkan stub
Summary:
Imported and modified from https://github.com/ARM-software/vulkan-sdk
I changed libvulkan-stub.cpp to libvulkan-stub.c

Reviewed By: Maratyszcza

Differential Revision: D6641092

fbshipit-source-id: 1a7fbf745d58b6111a06a983910c583912365357
2017-12-28 17:37:07 -08:00
ff9d1aeab5 removes duplicate variable reference crash from pad_sequences (#4383) 2017-12-29 08:34:53 +09:00
98f71912b0 Fix type signature of in-place NN functions (#4389)
This is a step towards removing the special casing of NN functions in gen_variable_type.py. It fixes the signature of in-place NN functions so that they return Tensor & instead of Tensor.
2017-12-28 16:50:09 -05:00
af3bffb638 Update derivative of expm1 2017-12-28 20:41:20 +01:00
ab80c27b47 Fix undefined FileNotFoundError (#4384) 2017-12-28 20:32:49 +01:00
89acc10f85 Adding description for Optimizers (#4371) 2017-12-28 16:55:52 +01:00
5c33400dd3 Implement OneHotCategorical distribution (#4357) 2017-12-28 16:54:55 +01:00
3a169780e9 fix some typos (#4379) 2017-12-28 22:23:31 +09:00
e519ef5337 Adding torch.expm1() and its inplace function (#4350) 2017-12-28 18:56:03 +09:00
d859c3c7cc Fix creating tensors with np.longlong array 2017-12-28 09:15:03 +09:00
f8a4b1a266 Split off load_derivatives and gen_autograd_functions from gen_variable_type (#4370) 2017-12-27 18:59:41 -05:00
410fd58b4f support RNN export (#4163)
Currently 1-layer RNN is supported
2017-12-27 18:10:53 -05:00
15b657af84 Support ATen GPU pointwise apply and torch.where. (#4304)
* Support ATen GPU pointwise apply and torch.where.

Like the CPU version, this implements an apply template that is almost identical to the
apply template already in THC, but using the ATen API.  Much of this involves stripping out
the TensorUtils code (which is basically templated ATen-style), although a couple of functions
remain that are apply specific (and thus don't seem worth porting to ATen), namely
overlappingIndices, canUse32BitIndexMath, and getTensorInfo.  We can make those generally
available if there's a need.

* Use int64_t instead of ptrdiff_t.

* Use snake case for _copyIgnoringOverlaps_.
2017-12-27 16:36:50 -05:00
5bcacb21d5 add bias term to linear __repr__ functions, fix spacing
Adds a missing bias term to the __repr__ functions of the
Linear and Bilinear modules. Fixes the spacing in the Conv2d
__repr__ to make it consistent with other modules.
2017-12-27 22:08:17 +01:00
cd23994dbb Improve matmul native test tolerance. (#4365)
* Improve matmul native test tolerance.

Because we don't directly use bmm in one case of matmul, a comparison to bmm doesn't make sense;
instead, we compare to the double result.

* Fix spelling.
2017-12-27 15:33:44 -05:00
a76ac19955 VariableType clean-up (#4366)
- as_variable no longer needs to be an instance function
 - mark functions as static
2017-12-27 15:07:00 -05:00
4453a5402f allow_inf on test_beta_log_prob (#4354)
* allow_inf on test_beta_log_prob
* Support allow_inf on assertAlmostEqual

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-27 09:25:12 -05:00
1b608eea4e Fix distribution tests due to merge order (#4351) 2017-12-26 16:37:15 -05:00
ffa7fab67f Minor changes to test utils to catch type errors (#4270) 2017-12-26 10:08:33 +01:00
15163a3273 Improved documentation of several index operations. 2017-12-26 06:08:44 +08:00
efa7c895f6 Misc Windows lint
Summary: Closes https://github.com/caffe2/caffe2/pull/1656

Differential Revision: D6633052

Pulled By: Yangqing

fbshipit-source-id: 5eeb3912fc769cfd06d252f3ed1d8d5f2a207cfc
2017-12-23 20:07:27 -08:00
26168e22cd fix NameError in torch/nn/rnn.py 2017-12-24 00:26:02 +01:00
6646c3e542 remove CPU builds from Travis, as they are now covered by Jenkins 2017-12-24 06:27:03 +08:00
9a48f8d7c3 add tests for btrifact_with_info and doc for btriunpack 2017-12-24 03:08:28 +08:00
658d4c7ea8 allow optional int tensor 2017-12-24 03:08:28 +08:00
a51a094200 fix MaxPool2d __repr__ missing ceil_mode summary (#4335) 2017-12-24 03:07:22 +08:00
0c4b3f4271 Adding Uniform distribution to PyTorch (#4328) 2017-12-23 15:14:44 +01:00
e9bfe8ca92 Make expect file directory search more robust.
Previously, we assumed that __main__ was the test file
being run, which is not true if you are using pytest.  New
algorithm uses __module__ of the test class, which is a bit
more robust.
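A minimal sketch of the idea (hypothetical helper name; the real logic lives in the test harness):

```
import os
import sys

def expect_file_dir(test_case):
    # resolve the directory of the module that defines the test class,
    # rather than assuming the test file is __main__
    module = sys.modules[type(test_case).__module__]
    return os.path.dirname(os.path.abspath(module.__file__))
```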

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-23 11:44:14 +01:00
1a0eefd5fc Parallelize batcher
Summary: Still WIP, but works for the universal encoder. The other ones are currently broken.

Differential Revision: D6492786

fbshipit-source-id: 232e0058eb3a0c036de3adf0295db5efd624cca7
2017-12-22 20:23:26 -08:00
3304185c6c Fix test_gamma_sample_grad. (#4327) 2017-12-22 22:01:04 -05:00
3b7fbc397e Reorder native_functions.yaml by alphabetical order. (#4326) 2017-12-22 21:23:27 -05:00
3837a962d3 Fix typo in Concat and Softmax
Reviewed By: Maratyszcza

Differential Revision: D6629260

fbshipit-source-id: 06fff59a770312b6948b3b5e1c04db6f539ea268
2017-12-22 17:49:16 -08:00
5fb5e7b01d Split NativeFunctions.cpp into functional components. (#4325) 2017-12-22 20:08:22 -05:00
3264ef95f0 Test CUDA types in native_test and make sure THCTensorRandom launches are valid. (#4323)
When debugging related issues, cuda-gdb was complaining about 0-sized launches from
THCTensorRandom, so these now only launch when the size is valid.
2017-12-22 17:13:06 -05:00
6c4e97220a disable test_gamma_sample_grad until it's fixed (#4324)
* disable test_gamma_sample_grad until we have SciPy in xenial builds

* skip test_gamma_sample_grad until it's fixed
2017-12-22 17:11:15 -05:00
e6ad0ea27f disable test_distributed for windows (#4317) 2017-12-23 04:13:39 +08:00
de28e754b2 Make Variable.is_sparse an attribute (#4308)
This matches Tensor.is_sparse, which makes it easier to replace Tensor
with Variable.
2017-12-22 12:46:28 -05:00
7f6ca8efa5 Fixed unused return value from write 2017-12-22 17:08:05 +01:00
89de9a494a Generate grad_input_mask only if it's actually used 2017-12-22 17:08:05 +01:00
d4fd9a3fd4 Remove unused functions 2017-12-22 17:08:05 +01:00
9488eeb308 Fix signed compare + redefined macro in libs 2017-12-22 17:08:05 +01:00
fb46836fc6 Fixed unused result warnings in THD 2017-12-22 17:08:05 +01:00
492e26fbcd Pad sequences and Pack sequences (#3875) 2017-12-22 16:14:09 +01:00
5d3fc364aa Fix OSS build
Summary: Add missing .cc file into CMakeLists for pybind

Reviewed By: pjh5, houseroad

Differential Revision: D6625894

fbshipit-source-id: 900f10bf7d9abd1e2a1b8cdf56f098664a575889
2017-12-21 19:04:25 -08:00
5d6dacaafe Enable building operator QuantDecompZstd
Summary:
Make operator QuantDecompZstd buildable in open source. The operator is not built by default. Need to specify -DBUILD_SHARE_DIR=ON -DUSE_ZSTD=ON to build it.

Test plan: Built Android Caffe2 with the change without issue; ran a model with the operator successfully.
Closes https://github.com/caffe2/caffe2/pull/1613

Reviewed By: Yangqing

Differential Revision: D6556723

Pulled By: sf-wind

fbshipit-source-id: 453a7d787a55928f2dea1ed2b99f2df011aa8d26
2017-12-21 17:47:05 -08:00
a7ac591d3b Support for DLPack in Python op
Summary: Adding support for DLPack tensors to Python op

Reviewed By: Yangqing

Differential Revision: D6577702

fbshipit-source-id: e14ef213fcdb2930ffe164667971a92aa8db503c
2017-12-21 17:02:16 -08:00
b231efbdc1 Install jupyter in all Jenkins images
Summary: Closes https://github.com/caffe2/caffe2/pull/1650

Differential Revision: D6624661

Pulled By: pietern

fbshipit-source-id: afb659181defa61a74b3dd4139495fce19691710
2017-12-21 16:38:15 -08:00
1632ab2979 Fix default device for Variable.new() (#4307)
Variable.new() should default to the device of "self" if no device is
specified. Previously, we were using the current device. This now
matches Tensor.new().
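Illustrating the new behavior (a sketch assuming at least two GPUs):

```
import torch

x = torch.randn(2, 2).cuda(1)      # lives on device 1
with torch.cuda.device(0):
    y = x.new(2, 2)                # allocated on x's device (1), not device 0
assert y.get_device() == 1
```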
2017-12-21 18:35:35 -05:00
f5de5a84be Throw exception in checkBackend, improve standard_gamma_grad error messages. (#4306) 2017-12-21 18:21:55 -05:00
60bbccc8e7 Add factory Type::sparse_coo_tensor(indices, values) (#4303) 2017-12-21 17:08:36 -05:00
4dba674324 Move fractional max pooling to ATen (#4290) 2017-12-21 17:07:46 -05:00
a076731066 Use where rather than _s_where in _s_where backwards so where is traced. (#4301) 2017-12-21 16:08:16 -05:00
41c9959ef7 Enable functional torch.where. (#4298) 2017-12-21 13:55:57 -05:00
e0ebd9a14e Check GCC version on Ubuntu
Summary:
Thanks to feldim2425 we know that GCC 5 in Ubuntu 17.04 and later
doesn't define the macro _GLIBCXX_USE_C99 and by extension the
std::to_string, std::stoi, and std::stod functions (and probably
more). Instead of avoiding using these functions, we simply recommend
people to use GCC 6 or higher on the newer Ubuntu versions where GCC 5
doesn't work.

As a side note, CUDA 8.0 is compatible with GCC up to version 5. This
implies that compiling Caffe2 with CUDA on Ubuntu >= 17.10 implies
using CUDA >= 9.0. If you need to compile with CUDA 8.0 and are on
Ubuntu, you are stuck on version 16.04 or lower.

I verified this fix by running cmake on Ubuntu 17.10 with
-DCMAKE_CXX_COMPILER=/usr/bin/g++5 and observing the fatal error.

This closes #1633.
Closes https://github.com/caffe2/caffe2/pull/1645

Differential Revision: D6620812

Pulled By: pietern

fbshipit-source-id: 29af88cad9bede4fd952084c404c85db05baa9c4
2017-12-21 10:51:50 -08:00
8af9f0da99 Saving checkpoint failure should not cause job failure
Summary:
If we encounter failures while writing a checkpoint, ensure that the job does
not fail.
A job can make progress even if writing a checkpoint fails

Reviewed By: anshulverma, boryiingsu

Differential Revision: D6615163

fbshipit-source-id: 01f790422e1a81bab1fe73f86750eaf75a72bb77
2017-12-21 10:32:55 -08:00
5f7c5502b8 Further improvements to ATen convolution (#4287)
- Rename THNN convolution to have thnn_ prefix.
- Propagate CuDNN benchmark and deterministic to at::Context
- Add 'convolution', 'convNd' and 'conv_transposeNd' native wrappers, with defaults
  The conv_transposeNd wrappers are updated to have the same argument
  order as Python.
- torch.nn.functional directly dispatches to the native wrappers
- Make it possible to turn off tracing for some native wrappers, so I don't
  have to write symbolics for all the functions above
- Spectral ops can now make use of CuDNN convolution if possible
- Better commentary on cudnn_batch_norm
- Turn on DCE for all JIT tests.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-21 13:03:43 -05:00
46054ddb5c Run MiscCheck.cmake earlier in CMake process
Summary:
This means warnings and errors fire sooner rather than later.

This requires a fix for an issue where CMAKE_REQUIRED_FLAGS propagates
to some unrelated check, which then fails, because the Android
compiler doesn't support -mavx2.
Closes https://github.com/caffe2/caffe2/pull/1646

Differential Revision: D6620129

Pulled By: pietern

fbshipit-source-id: 4d1185406ebee3a523d39811bca6783bee82c898
2017-12-21 09:17:26 -08:00
5b8fe5cbb5 Batchnorm in ATen (#4285)
* Batchnorm in ATen

This commit moves BatchNorm derivatives into ATen, eliminating
torch/csrc/autograd/functions/batch_normalization.cpp

Some refactoring along the way:

- Functions got renamed to remove _forward from their names
- CuDNN batchnorm forward was modified to return save_mean/save_std instead of
  taking them as parameters. To avoid returning undefined Variables, these return
  (small) uninitialized tensors when they are not used.
- THNN batch normalization takes care of resizing save_mean and save_std on
  forward.
- There are some shenanigans re batchnorm backwards in eval mode. I'm tracking
  that in #4284
- I decided not to introduce buffers as a proper concept in ATen, which means
  that tensors like running_mean/running_var are variables in ATen.  This meant
  there needed to be some adjustments to how we *trace* such variables; the
  new strategy is if we can't find a Value for a variable, we look and see
  if we have a Value for the buffer pointed to by the variable, before
  finally falling back on constant.
- This PR finally reliably triggered OOM on Travis builds; I fixed this by reducing
  the number of parallel jobs.
- Stop using std::string when it's not necessary.
- Remove training parameter from cudnn_batch_norm_backward, because it
  doesn't make sense; cuDNN doesn't implement the math for evaluation mode
  batchnorm backwards.
- batchnorm_double_backward is now in an anonymous namespace, as it
  no longer needs to be called from torch/csrc

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-21 11:38:31 -05:00
d7e6ede784 Implement Laplace distribution (#4289) 2017-12-21 17:03:03 +01:00
cf1878814f Fix typo in operator_schema.h
Reviewed By: salexspb

Differential Revision: D6609091

fbshipit-source-id: e251ec5b98aa00cb7557baa6cf8aeb731ebf78d8
2017-12-21 01:38:35 -08:00
e996157a5c Check if dlopen() return handle is NULL in open_libopencl_so()
Reviewed By: Maratyszcza

Differential Revision: D6616616

fbshipit-source-id: 36aab05ec38ca1b843b05f36433dcd90ca476122
2017-12-20 17:36:19 -08:00
32a4a523d5 cache OpenMP_FOUND in cmake (#4252) 2017-12-20 19:48:57 -05:00
54e11639f9 Fix broken test_beta_log_prob in Python 3.6 (#4261) 2017-12-20 19:41:03 -05:00
a53e04a63e Document some autograd invariants (#4272) 2017-12-20 19:40:00 -05:00
674d7d1f8e Allow compiled functions to call compiled functions. (#4286) 2017-12-20 19:02:59 -05:00
fab5885df6 Add Min and MinGradient Op in Caffe2
Summary: Add Min and MinGradient Op

Reviewed By: jamesr66a

Differential Revision: D6608668

fbshipit-source-id: 7e1f8fa7a42a94f26152da0109d597e5deeb21c0
2017-12-20 14:49:55 -08:00
b6a30f7ede Move SELU to ATen (#4269)
Fuse scale multiplication into ELU
2017-12-20 16:32:21 -05:00
dad4b2d6cc Move adaptive avg/max pool1d to ATen (#4266) 2017-12-20 15:50:17 -05:00
689ef9cba3 Move upsampling to ATen (#4264) 2017-12-20 15:12:07 -05:00
efb6feb242 Make the JIT interpreter handle unused inputs correctly 2017-12-20 11:27:40 -08:00
6daf34ce7b Don't mark index as traceable, and other improvements (#4249)
* Improve 'untraced variable' message, add failing test.
* Make index traceable.
2017-12-20 11:25:59 -08:00
a88a8ec827 Convolution derivatives in ATen (#4116)
* Convolution derivatives in ATen

This PR introduces ATen implementation of convolution, which dispatches to
THNN/CuDNN/nnpack based on input parameters. The general strategy is to compose
this function out of the various forward-backward pairs of specific
implementations, rather than write a monolithic function with backwards (which
is what we did before because the boilerplate of doing it otherwise would have
been very high.) The new API provides the following functions:

  - _convolution, which is a fully generic, native convolution implementation
    that dispatches to various other convolution implementations depending on
    input characteristics. This is prefixed with an underscore because it
    explicitly takes benchmark, deterministic and cudnn_enabled which are
    implementation details for CuDNN. The intent is to eventually provide a
    convolution that reads these parameters out of the context using #4104.
  - _convolution_nogroup is a convolution implementation for non-CuDNN
    algorithms which don't support group convolution natively.
  - _convolution_double_backward is the generic double-backwards implementation
    for convolution.

In more detail:

- Most functionality from torch/csrc/autograd/functions/convolution.cpp has been
  moved into aten/src/ATen/native/Convolution.cpp
- We continue to make use of ConvParams, but we now construct the parameters
  upon entry to a function from the function signature (which does not use
  ConvParams; having convolution take ConvParams directly would require teaching
  the code generator how to accept these as parameters, complicating ATen's API
  model) and destruct them when making subprocedure calls.
- I introduce a new idiom, input_r, which represents a const Tensor& reference,
  which will subsequently be assigned to a local Tensor input. This is helpful
  because a lot of the existing algorithms relied on being able to assign to
  locals, which is not permitted with a const reference.
- The native argument parser now supports std::array<bool,2> inputs (NB: there
  MUST NOT be a space; this is the same hack as is applied to derivatives.yaml)
- Native parser now supports Tensor? arguments, which indicates a nullable
  tensor. Previously this function was only used by NN methods.
- Documentation updates on THNN library
- I added an extra fgradInput argument to VolumetricConvolutionMM_updateOutput
  and VolumetricConvolutionMM_accGradParameters so that its buffer list lines up
  with the backward argument list. This makes it possible to write derivative
  for conv3d which previously was not supported (commented out in
  derivatives.yaml)
- Extra double_backward declarations for all convolution backwards functions was
  added.
- You can now use the syntax Tensor? in native_functions.yaml to indicate that a
  tensor argument is nullable.  There are adjustments to propagate this to the
  Python argument parser.
- NNPACK was ported to ATen, and ATen now builds and links against NNPACK if
  possible. New AT_NNPACK_ENABLED macro.  The nnpack functions are
  nnpack_spatial_convolution.
- Some modest CuDNN convolution refactoring to remove _forward from names.
- There's a new cudnn_convolution_backward function to deal with the fact that
  CuDNN convolution double backward requires you to have computed all gradients
  in one go.
- Variable set_flags now checks if the tensor is undefined, fixing a silent memory
  corruption.
- checkSameType updated to not raise an exception if called with Variable arguments
- "no ATen declaration found for" error message is improved to say what available declarations are
- make_variable now accepts undefined tensors, and returns an undefined tensor in this case.
2017-12-20 14:19:27 -05:00
63ac3633f5 Implement torch.where(condition, x, y) CPU Variable. (#4259)
* Implement torch.where(condition, x, y) CPU Variable.

* Get rid of IMPLEMENT_STATELESS for where.
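
For reference, a quick usage sketch (written with current tensor syntax rather than the Variable API of the time):

```
import torch

cond = torch.tensor([True, False, True])
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([10.0, 20.0, 30.0])
print(torch.where(cond, x, y))  # tensor([ 1., 20.,  3.])
```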
2017-12-20 13:08:42 -05:00
456b5b1642 Implement _values() and _indices() methods for sparse variables in python (and sparse tensors in aten) (#4058)
This is a part of making sparse tensors work with dataloader (#3898)

This exposes `_values()` and `_indices()` for sparse variables in python (and sparse tensors in ATen).

To do this, I added THDenseTensor* and THDenseIndexTensor* return value functionality to Declarations.cwrap. These should always mean "the dense equivalent of THTensor*" and "the dense equivalent of THIndexTensor*" respectively.
 
cc @zdevito for the THDenseTensor in cwrap addition

### Test Plan
Run the following:
```
import torch
from torch.autograd import Variable
v = torch.FloatTensor([3, 4, 5])
i = torch.LongTensor([[0, 1, 1], [2, 0, 2]])
x = Variable(torch.sparse.FloatTensor(i, v, torch.Size([2,3])))

x._indices()
x.data._indices()

x._values()
x.data._values()
```
2017-12-20 12:25:26 -05:00
d400305eb9 fix typo in grad_mode 2017-12-20 17:13:39 +01:00
766312b7f2 Further relax VariableFlags, ... and fix bugs (#4244)
* Further relax VariableFlags

* Allow a requires_grad=True trace to be used for a requires_grad=False
  input by computing the gradient but then not connecting it to the
  input.
* Enable CSE to de-duplicate WLM backwards pass code which calls sum twice.
* Fix a bug in the interpreter that frees a register too early when
  it appears twice in a use list.

* [fuser] Follow all outputs to check if fusion is safe

This bug was introduced when we allowed fusion groups
to fuse together. Previously producers were forced to have a single
output, but now producers that are fusion groups can have multiple outputs.
So now we check the uses of all the outputs of a producer.

* [JIT] Fix handling of undefined inputs

It is not legal to call .data() on variable objects whose tensors
are undefined.
2017-12-20 10:36:22 -05:00
77ea2f26d8 Add build support for Python 2.7 using MSVC (#4226) 2017-12-20 15:07:25 +01:00
b11db95478 Fix compilation warnings (#4248) 2017-12-20 15:07:13 +01:00
0bc1505f34 Implement .entropy() methods for all distributions (#4268) 2017-12-20 14:06:01 +01:00
cf2e088c9a Translate None to zeros for old-style autograd functions (#4242) 2017-12-20 14:03:56 +01:00
1681d07199 Disable tests and fix issues with Windows CUDA build (#4251) 2017-12-20 11:30:21 +01:00
69265ea5bc Ensure gamma samples are positive (#4262) 2017-12-20 10:17:31 +01:00
257a9e5279 add hill learning rate scheduling
Summary:
hill: the learning rate changes according to the following 3 stages:
1) linear warmup (increasing) at first num_iter steps from start_multiplier
2) inverse shrink (decreasing) afterwards (gamma, power)
3) lower bounded by end_multiplier
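
A hedged Python sketch of the multiplier this schedule computes; the parameter names follow the summary above and are not necessarily those of the actual Caffe2 implementation:

```
def hill_multiplier(step, num_iter, start_multiplier, gamma, power,
                    end_multiplier):
    if step < num_iter:
        # 1) linear warmup from start_multiplier toward 1.0
        return start_multiplier + (1.0 - start_multiplier) * step / num_iter
    # 2) inverse shrink after warmup
    decayed = (1.0 + gamma * (step - num_iter)) ** (-power)
    # 3) lower bounded by end_multiplier
    return max(decayed, end_multiplier)
```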

Differential Revision: D6565379

fbshipit-source-id: 9c0e51fc825ba6a7765803a1f09479497057a9d9
2017-12-19 23:35:44 -08:00
4c56ce0958 Remove unused thnn/loss.py (#4267) 2017-12-19 22:40:28 -05:00
ce2a0aa4d8 Add slice and gather syntax
Summary:
Implemented syntactic sugar for the following constructs:

- `x.Gather(y)` can now be written as `x[y]`
- `x.Slice(start, end)` can now be written as `x[start:end]`

For slicing, `start` and/or `end` can be omitted iff `x` is one-dimensional (i.e. a vector). That is, `vector[start:]`, `vector[:end]` and `vector[:]` will work. Doesn't work for higher-dimensional tensors because to emit the start/end indices we need to know the rank of the tensor (since `Slice` requires one entry per dimension of the tensor).

Also added a `getProto()` function so that I could test that the generated code is as expected (i.e. that the syntactic sugar does not affect the structure of the output).

Reviewed By: zdevito

Differential Revision: D6605864

fbshipit-source-id: 786359713a13314c24be2fc07e01486c507404ef
2017-12-19 19:17:01 -08:00
b476d10c64 Move max_pool1d to ATen (#4257) 2017-12-19 20:10:11 -05:00
8c8114801b Fix onnx export of replication pad (#4263) 2017-12-19 20:07:01 -05:00
9495595520 Move reflection/replication padding to ATen (#4258) 2017-12-19 18:57:14 -05:00
97c33a22a6 GPU fallback for LengthsRangeFill Op
Summary: Simple fallback implementation to support LengthsRangeFill; we can add a native CUDA implementation later

Reviewed By: pietern

Differential Revision: D6594031

fbshipit-source-id: b705234a591a61e8d1ee5f7524aceec3f4581f9c
2017-12-19 15:42:13 -08:00
c470055319 Remove template_scalar, implement is_signed using dispatch. (#4255)
This also fixes is_signed for Half, which was using the default std::is_signed,
which returns false.
2017-12-19 18:17:22 -05:00
227ef1fb60 Move adaptive avg pooling 2d/3d to ATen (#4254)
Move adaptive avg pooling 2d/3d to ATen

Also use ATen for softshrink
2017-12-19 15:45:33 -05:00
168271f1b8 add struct get method
Summary: as titled, to improve the schema usage

Differential Revision: D6565050

fbshipit-source-id: a551fb4f3089410e9cd468ee58e756de6a8ed66e
2017-12-19 12:35:56 -08:00
96007ec6c0 fix an out of bounds hypothetical (#4240) 2017-12-19 08:37:42 -05:00
bc6bd62bd6 Fix distributed dataloader so it pins memory to current GPU not GPU 0. 2017-12-19 13:39:06 +01:00
7315a19bc9 add maybe_add_global_constant
Summary:
In layer model helper, add a method `maybe_add_global_constant` to ensure
that when two global constants are added with the same name, we check if they
are actually the same (by initializer) and only add it once.

Reviewed By: kennyhorror

Differential Revision: D6537532

fbshipit-source-id: 37aa3860a2e40d81161ccdea0c50a316248be2e2
2017-12-18 22:14:00 -08:00
cb4f6c3148 conv_tbc (#3730)
attempt to rebase

skip conv_tbc in preprocess_nn_functions

Add conv_tbc symbolic

Fix backward issue with dBias

ConvTBC nn wrapper and unit test
2017-12-18 23:52:36 -05:00
019db89cb2 Fix documentation for ResizeNearest op
Summary:
Undoes fake news in `ResizeNearest` documentation.
Closes https://github.com/caffe2/caffe2/pull/1630

Reviewed By: Yangqing

Differential Revision: D6584224

Pulled By: goldsborough

fbshipit-source-id: ec5b8ffe611a042dd3031e94aff4552d01f4f5e8
2017-12-18 18:33:14 -08:00
d28720b90a Backpropagation for While op
Summary: Adds support for backprop to While op, fixes gradient computation for Pow

Reviewed By: azzolini

Differential Revision: D6456875

fbshipit-source-id: 9f660317ad6f3898ff7d8ce43098f85c3426409b
2017-12-18 16:03:45 -08:00
52600f8607 Record workflow run id for inference.
Reviewed By: salexspb

Differential Revision: D6094757

fbshipit-source-id: d8761749e8eb080f50fb08a37431e8a987d0a2db
2017-12-18 15:33:19 -08:00
1b820be7e5 Fix the ATen static building issue on CUDA
Summary:
Yangqing pietern

With https://github.com/caffe2/caffe2/pull/1627, Caffe2 can be statically built with USE_ATEN=ON and USE_CUDA=OFF. But the function deleterFor defined in aten_op_template.h causes duplicated symbols in libcaffe2.a and libcaffe2_gpu.a.

I checked that we call this function in only one place, so I manually inlined it into the caller directly. Later, when we use it in other places, we can just extract it again and put the implementation in aten_op.cc.
Closes https://github.com/caffe2/caffe2/pull/1632

Reviewed By: pietern

Differential Revision: D6594063

Pulled By: houseroad

fbshipit-source-id: 2328e2b2dce819378a9f18411c449830917e0d6a
2017-12-18 15:14:30 -08:00
9bf5e40dfa Refactor cudnn code layout / make build more robust. (#4201)
* Refactor cudnn code layout / make build more robust.

When I previously moved cuDNN into ATen, I wasn't too familiar with the
ATen native function directory layout, and so I did a number of
suboptimal things.  This commit fixes those problems.

- If NO_CUDA was set but cuDNN is installed on your system, we'd incorrectly
  assume that CUDNN was enabled, to hilarious effect.

- We now distinguish between cudnn implementation files and cudnn
  native function files.  The native files now live in ATen/native/cudnn,
  and are *unconditionally compiled*, even when we are not building with cuDNN.
  This means that we can unconditionally declare cudnn functions in yaml
  and they are always available, even if they are broken.  The cuDNN specific
  files live in 'cudnn', they are *never* installed, and they are used
  purely for implementation purposes.  I had to add stub implementations of
  all ATen functions to achieve this.

- I had written headers for at::native functions manually, but codegen
  will generate them for me automatically.  So I deleted the headers.
  That lets me get rid of some header install logic as well.

- There's a new note about ATen preprocessor philosophy.
2017-12-18 16:47:57 -05:00
94ff31f54d Implement Exponential distribution (#4234)
* add exponential distribution

* add exponential tests

* fix default val of sample_shape

* lambd->rate

* updates per review

* remove notes, keep failure_rate same in exponential test
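
Usage sketch with the modern torch.distributions API (the module layout has moved since this commit):

```
import torch
from torch.distributions import Exponential

d = Exponential(rate=torch.tensor(2.0))
samples = d.sample((5,))     # sample_shape-style sampling
print(d.log_prob(samples))
```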
2017-12-18 16:44:35 -05:00
e4f46905c0 Fix numpy batch matmul index calculation
Summary: hoangmit reported an ASAN test failure on D6389022. Upon further investigation, it appeared there was a logic error in calculating shapes when either the A or B matrix is being broadcasted. This patch fixes that error.

Reviewed By: dzhulgakov

Differential Revision: D6580307

fbshipit-source-id: 2bcf9b76f668c42a463f2f0fdc82f544af3ae721
2017-12-18 13:20:10 -08:00
d605058212 Replace Variable.volatile with torch.no_grad() (#3970)
This removes volatile from Variable. The functionality is mostly
replaced by a global (thread-local) flag, which is controlled by
torch.set_grad_enabled() and the context manager torch.no_grad().

In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled()

Fixes #3627
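
Usage sketch of the replacement API (current tensor syntax; at the time the inputs were Variables):

```
import torch

x = torch.ones(2, 2, requires_grad=True)

with torch.no_grad():          # replaces volatile=True
    y = x * 2
print(y.requires_grad)         # False

torch.set_grad_enabled(True)   # the thread-local global switch
```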
2017-12-18 15:46:13 -05:00
0876bab8b7 Support CPU Apply in ATen and implement standard_gamma using it (#4161)
* Support CPU Apply directly in ATen and implement standard_gamma using it.

Main changes in this PR:
1) Added a TH_APPLY-style templatized function for CPU apply calls (currently only 2 and 3 tensor argument
versions are supported, but more are easy to add).  In fact, this is basically identical to TH_APPLY, except
it uses ATen functions and the API is a template instead of a macro.  The template takes an operation that
is performed on the data (and an indicator to signal early termination); i.e. you don't need to know that
x_data is a pointer to the current data location of x.

2) Refactors the ATen dispatch code to easily generate dispatch code for different subsets of the scalar types.
This is in preference to the template_scalar path, which requires valid specialization of each scalar type.  Valid
specializations are particularly annoying with CUDA because you most likely can't put the specializations
in a header, so you need to write some sort of for-all-scalar-type macro to get the correct specializations.
Currently, we only generate dispatch_all (all scalar types, the equivalent existed already), and
dispatch_cpu_floating_types (which is used by standard_gamma).

3) Implements standard_gamma using the above changes (this is an arbitrary choice, it was the latest
apply macro to be committed).  The forward is bound via Declarations.yaml,
the backward via the Apply template, and then they are hooked together in derivatives.yaml.  This eliminates
needing to change TH at all going forward, which means one can write idiomatic C++ instead of the TH-style macros
(e.g. TH_MATH_NAME).

* Generate Dispatch code with nicer spacing.

* Small cleanups.

* Fix typo.

* Add TODOs for changing macros, remove dead code.

* Use a lambda function.

* Get rid of early exit.

* Rename Scalar,ScalarType template parameters to CScalar.

* Reorder _standard_gamma_grad parameters.

* Add comments explaining calling convention.

* Don't generate Dispatch.h anymore.

* Get rid of backend specific checks in dispatch.

* Fix empty/scalar check.
2017-12-18 15:45:01 -05:00
bcbb36e99a Allow value broadcasting in distributions.Distribution (#4210) 2017-12-18 20:11:39 +01:00
68c0998cbe added AMSgrad optimizer to Adam and SparseAdam (#4034)
* initial AMSGrad

* added test for amsgrad

* added amsgrad to adam

* fixed tests

* added option to sparse adam

* flake8
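
Usage sketch of the new option:

```
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)
model(torch.randn(8, 4)).sum().backward()
opt.step()
```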
2017-12-18 13:24:49 -05:00
ee98e7a82e Implement Dirichlet and Beta distributions (#4117) 2017-12-18 19:11:37 +01:00
ccf4dc1525 Add reduce arg to BCELoss (#4231)
* Add reduce arg to BCELoss

* Fix test precision

* reduce keyword for BCELoss in derivatives.yaml
2017-12-18 12:28:53 -05:00
d8b2e5d091 Add python only default init expression; Implement stft, hann/hamming/bartlett window. (#4095)
* implement stft

* addressed comments; implemented window functions; added support for python only default initialization
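
Usage sketch; note that torch.stft's signature has evolved since this commit (return_complex, for example, is a later addition):

```
import torch

window = torch.hann_window(400)   # also hamming_window, bartlett_window
signal = torch.randn(1, 400)
spec = torch.stft(signal, n_fft=400, window=window, return_complex=True)
print(spec.shape)
```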
2017-12-18 12:28:23 -05:00
3b641dc805 fix include order for PRId64 macro 2017-12-18 09:32:16 -05:00
54d689253e Revert "Add reduce arg to BCELoss" (#4221)
* Revert "Add reduce arg to BCELoss (#3532)"

This reverts commit 847c56aeb5857fc4d3f5df88b9e8f937939bb8cc.
2017-12-18 03:13:09 -05:00
e9ef20eab5 Add Cosine Annealing LR Scheduler (#3311)
* Add Cosine Annealing LR Scheduler

* Update eta_min in tests to prevent numerical mistakes

* Use non-zero min_eta in test_cos_anneal_lr
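
Usage sketch:

```
import torch

params = [torch.zeros(1, requires_grad=True)]
opt = torch.optim.SGD(params, lr=0.1)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=10, eta_min=1e-5)
for _ in range(10):
    opt.step()
    sched.step()   # lr follows half a cosine from 0.1 down to eta_min
```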
2017-12-18 02:43:08 -05:00
847c56aeb5 Add reduce arg to BCELoss (#3532)
* Add reduce arg to BCELoss

* Fix test precision
2017-12-18 02:39:49 -05:00
b86dc0c8ba add reduce arg to PoissonNLLLoss (#3770)
* add reduce arg to PoissonNLLLoss

* fixed comments except reference function

* fixed unit test

* small indentation fix

* fixing last comments by richard

* lint check

* another linting issue
2017-12-18 02:32:05 -05:00
02317d9336 Enable ext build for Windows (#3935)
* Enable ext build for Windows

* Include the static libs to make the compiling of the extension easier
2017-12-18 02:23:34 -05:00
390b7afd45 Fix CUDA Multinomial checks (#4009) 2017-12-18 02:20:26 -05:00
43dd6319db Exclude attrs with invalid python variable names from __dir__ (#4011) 2017-12-18 02:19:55 -05:00
5cc26c0c90 Add default PyTorch seeding and worker_init_fn to DataLoader (#4018)
* Add default PyTorch seeding and worker_init_fn to DataLoader

* generate seed using current RNG each time

* worker_seed <- main_proc_RNG_generated_seed + worker_id
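
A small sketch of the hook (the __main__ guard matters on platforms that spawn worker processes):

```
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # Each worker is seeded with main_process_seed + worker_id by default;
    # this hook is the place for extra per-worker setup (e.g. seeding numpy).
    print("worker", worker_id, "seed", torch.initial_seed())

if __name__ == "__main__":
    ds = TensorDataset(torch.arange(10).float())
    loader = DataLoader(ds, num_workers=2, worker_init_fn=worker_init_fn)
    for _ in loader:
        pass
```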
2017-12-18 02:19:08 -05:00
30e6898808 Implement NLLLossNd (#4035)
* Implement NLLLossNd

* Fix tests and typos

* Fix tests
2017-12-18 02:16:16 -05:00
7f41149e14 handle requires_grad when creating buckets for distributed (#4044) 2017-12-18 02:13:53 -05:00
3796ce9255 assert (#4056) 2017-12-18 02:11:01 -05:00
e0d5d1b7c9 view in certain noncontig case (#4062) 2017-12-18 02:08:17 -05:00
9394e65b44 Add proper shape checking to torch.cat (#4087)
* Fix catArray in THTensor

Asserts that the inputs have the same size except in the
cat dimension or are empty (or a mix of both).

* Fix catArray for THCTensor

* Document torch.cat shape checks

* Fix types
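
The checked behavior, sketched:

```
import torch

a = torch.zeros(2, 3)
b = torch.zeros(4, 3)
torch.cat([a, b], dim=0)       # OK: sizes agree outside the cat dimension

try:
    torch.cat([a, torch.zeros(2, 5)], dim=0)   # dim 1 mismatches
except RuntimeError as e:
    print(e)                   # now a clear shape error
```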
2017-12-18 02:05:58 -05:00
2c71b679d2 Implement pin_memory() as a NativeFunction (#4094)
* Implement pin_memory() as a NativeFunction

This adds allocators as a concept in ATen that extends deleters. An
allocator is a subclass of at::Allocator that implements the virtual
methods:

  virtual void* allocate(size_t n);
  virtual void deallocate(void* ptr);

A tensor created with a custom allocator can be resized, unlike a tensor
with a custom deleter.

* Rename AllocatorContext to AllocatorRetainable
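
Python-side usage sketch (pinning requires a CUDA-capable build):

```
import torch

t = torch.randn(3)
if torch.cuda.is_available():
    p = t.pin_memory()      # backed by an at::Allocator, so it stays resizable
    print(p.is_pinned())    # True
```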
2017-12-18 02:03:28 -05:00
0257f5d19f improve performance of maxpooling backwards (#4106) 2017-12-18 01:55:38 -05:00
bec0349280 Implement Variable.cuda and Variable.type using ATen (#4139)
* Implement Variable.cuda using ATen

This adds an optional async flag to Tensor::copy_, which attempts to do
a non-blocking copy if one of the tensors is in pinned memory and
the other is a CUDA tensor.

* Perform cross-device copy in CopyBackwards

Also call torch.cuda._lazy_init() from Variable.cuda()

* Implement Variable.type via ATen

* Changes from review:

 - remove copy_out
 - remove unnecessary include
 - fix default device for .cuda()

* Combine if statements in dispatch_type
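
Usage sketch; the flag this commit added as `async` was later renamed `non_blocking`:

```
import torch

if torch.cuda.is_available():
    host = torch.randn(1024).pin_memory()
    dev = host.cuda(non_blocking=True)   # overlaps the copy when possible
```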
2017-12-18 01:54:35 -05:00
b79d74aa81 Re-initialize autograd engine in child processes (#4158)
* Re-initialize autograd engine in child processes

The autograd engine uses threads for backwards. These don't exist after
forks and they were not being re-initialized because the
Engine::start_threads_flag was already set. This re-initializes the
engine in child processes, which will cause it to re-create threads when
backwards() is called in the child process.

Note that we only attempt to handle the common case where fork() is
called while the backwards threads are idle.

Fixes #3966

* Avoid non-async-signal-safe functions in fork handler
2017-12-18 01:51:27 -05:00
5c46427f08 Rearrange dimensions for pointwise operations for better performance. (#4174)
* Rearrange dimensions for pointwise operations for better performance.

In existing code, pointwise operations on transposed tensors process data
"column by column", resulting in poor performance.  The worse case happens when
all operands are transposed tensors.

This change tries to "un-transpose" tensors in such a case, so that memory
access patterns are as sequential as possible.

* More explanation on what rearrangeDims() does.

* Fixed a very important (and stupid) typo.
2017-12-18 01:49:52 -05:00
e2c75d3732 Make import work even if 'tools' is available in Python path
sys.path is searched from first to last, which means that if there is already
a 'tools' directory in the existing python path, we will fail to find the root
directory of PyTorch. Better to put it first.
2017-12-18 01:09:32 +01:00
a6fb960b98 Expose node scopeName to python (#4200) 2017-12-16 20:00:21 -05:00
0e804ae042 [jit.compile] add a jit_debug_info method (#4205)
This method prints a bunch of useful debug information including
the traces that have been record, their shapes, and the traced
graphs associated with them.
2017-12-16 13:26:28 -05:00
cab5921227 Improve symbolic hack a bit (#4143) 2017-12-16 18:44:26 +01:00
2e08885df8 Fix for issue #4103 (remove -march=native flag for ppc64le) (#4162) 2017-12-16 15:09:08 +01:00
d4d8698581 Fix repeat non owning (#4084) 2017-12-16 14:09:02 +01:00
8307f21bf6 Allow map_location in torch.load to be a string 2017-12-16 13:04:42 +01:00
e393a4f03c fix typo (#4206) 2017-12-15 23:43:18 -05:00
038fb70455 Remove dlopen() in get_libopencl_path()
Reviewed By: Maratyszcza

Differential Revision: D6584697

fbshipit-source-id: bdf5c6c6dc75eb0d7d46b1eba9852a9814f57373
2017-12-15 19:18:17 -08:00
1766e27324 Add DepthwiseConv in iOS11+
Summary: Use MPSCNNDepthwiseConv when groups == input_channels

Reviewed By: ajtulloch

Differential Revision: D6541561

fbshipit-source-id: 7164f26b8f3a101c0ab5c3e6c02ed855397d2750
2017-12-15 16:47:36 -08:00
0a25926f4b CUDA implementation for GatherPadddingOp
Summary: AT

Reviewed By: enosair

Differential Revision: D6561996

fbshipit-source-id: ad03d6db8d4318e426ff96569bb3c93cba696926
2017-12-15 16:05:31 -08:00
db0a2ff4eb selu op
Summary: selu operator for cuda

Reviewed By: prigoyal

Differential Revision: D5703418

fbshipit-source-id: 06b16a30fe1c67c1d45505e2f5cffc6408674ef3
2017-12-15 15:38:44 -08:00
95b3c7edad Fix undefined behavior in GLFilter
Summary: Ran into some issues where these values seemed to be initialized to 0 and caused some trouble. Initializing to 1 is safe and well defined.

Reviewed By: hlu1

Differential Revision: D6582774

fbshipit-source-id: 088ec4e782d9680a1d9b4d2d42523d06cbc7dd72
2017-12-15 15:38:44 -08:00
c813ce3787 Implement Variable._sparse_mask (#4124)
* Implement Variable._sparse_mask

* Use SparseTensor as the dynamic_type
2017-12-15 17:25:20 -05:00
5fc4b66cc4 Fix timing issue in stats_test.cc
Summary:
Failure in https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-mkl-ubuntu16.04-test/135/
Closes https://github.com/caffe2/caffe2/pull/1629

Reviewed By: azzolini

Differential Revision: D6581743

Pulled By: pietern

fbshipit-source-id: 8c84f8c959015d7717785ee3f37b93c4ef146f96
2017-12-15 13:18:27 -08:00
7a5200b450 print exception in layers
Summary: as desc

Reviewed By: chocjy

Differential Revision: D6577301

fbshipit-source-id: 3c2d08a05f6fd1d6771019347e6dec4dd711a653
2017-12-15 12:12:28 -08:00
9331ecad3f Fix for installation of exported targets
Summary:
cc houseroad pietern
Closes https://github.com/caffe2/caffe2/pull/1627

Differential Revision: D6579710

Pulled By: Yangqing

fbshipit-source-id: d6457585a436c2a93d0133c491dec607cae0db7f
2017-12-15 12:12:27 -08:00
6d72c82985 Trace ATen native functions as themselves, not their implementations. (#4127)
* Trace ATen non-primitive functions as themselves, not their implementations.

Previously, if I invoked an ATen non-primitive function foo, which in turn
called subfoo, I would always see 'subfoo' in the trace (e.g., tracing
'inlines' all of these operations.)  Such inlining is bad for ONNX
(and can be bad for optimization) as it prevents high-level
optimizations from taking advantage of the structure.  It might
be right to inline, but give the optimizer a chance to work before
inlining happens!

The implementation here is surprisingly simple, because it uses
the "DCE trick".  Essentially, it doesn't matter if the constituent
calls perform tracing, because you can always trace it again, and
override the trace nodes associated with the returned variables.
The original trace becomes dead and can be DCE'd.

While implementing this, I also refactored how 'isTracing' and
'trace_outputs' works:

- isTracing was previously a single function with overloads for
  both Tensor and Variable arguments.  Unfortunately, such overloads
  are not safe, because of how C++ implicit conversions work.  You
  would think that C++ should never confuse an overload for
  Variable with ArrayRef<Tensor>, but this is exactly what can
  happen: Tensor is convertible to both Variable and ArrayRef<Tensor>,
  thus it's ambiguous and C++ doesn't like it.  The last time I ran
  into this problem, I applied initializer lists to everything and
  called it a day.  A more robust fix is to separate out the
  Variable and Tensor overloads, which I have done in this patch.

- trace_outputs was fed as an initializer list, which doesn't work
  when you have heterogenous inputs.  So instead we first feed
  everything through 'flatten', which has overloads for each of the
  argument patterns in ATen, which then goes on to the recordTrace
  (which takes an ArrayRef).  This is *no less efficient*, because
  we were allocating a vector anyway (to do the conversion from
  vector of Tensor to vector of Variable).

This fixes mean that 'index' can properly be traced... although the
JIT still does not support it.  A failing test case has been added to
this effect.

Some knock-on effects:

- The fuser now knows about chunk as well as split.  They're pretty
  similar so there is no problem.

- There is a new 'canonicalize' pass in the JIT which renumbers a graph
  so that all structurally equivalent graphs render the same.

- We run DCE before the fuser tests, to make sure dead nodes don't
  block fusion.

- There are new ONNX exports for the newly introduced higher level ATen
  operations.  This includes type_as (no-op case only), chunk, select.

Zach didn't like the extra use of 'native' in the new codegen, so
we've introduced a new concept, 'abstract'.  An abstract function
is one that is implemented in derived types (e.g., CPUDoubleType),
where as a concrete one is implemented in the base type (Type).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-15 13:50:32 -05:00
31c3766d5a Use Jenkins build status badge
Summary: Closes https://github.com/caffe2/caffe2/pull/1628

Differential Revision: D6579677

Pulled By: pietern

fbshipit-source-id: c53c0be06a342b12d1b1fcb35297ab9e792d4782
2017-12-15 10:23:34 -08:00
93c2f81f32 Fix another leak in pybind11 code. (#4185)
This time caused by an upstream pybind11 bug:

https://github.com/pybind/pybind11/pull/1216

This changes causes the code to go down a non-buggy pathway.
2017-12-15 12:57:49 -05:00
84b7daadb2 Relax verify of VariableFlags (#4191)
* Fix another leak in pybind11 code.

This time caused by an upstream pybind11 bug:

https://github.com/pybind/pybind11/pull/1216

This changes causes the code to go down a non-buggy pathway.

* Relax verify of VariableFlags

If we trace with a defined tensor, but see a run with an undefined
tensor, we now allow that run to happen, replacing the tensor with
zeros.

This also fixes a bug where stage 0 tensors were not
checked against their verify flags.

This change does _not_ handle all bad situations that can happen.
For instance, if the first thing traced has an undefined tensor but
a later tensor is defined, then it will fail because the graph itself
does not contain the trace for the derivative of the tensor.
However, it is possible to work around this latter case by
dry-running the function:

   z = Variable(...,requires_grad=True)
   x,y = f(z)
   (x.sum() + y.sum()).backward()
2017-12-15 12:57:31 -05:00
fc8ad6fde6 improve svd doc (#4155) 2017-12-15 12:57:14 -05:00
6552ea110f Make timing based test more likely to pass
Summary:
This assumed that the expect statement would run within 1us, whereas
we only care it runs in less than the 100ms to check that it got reset.
Closes https://github.com/caffe2/caffe2/pull/1606

Reviewed By: Yangqing

Differential Revision: D6572951

Pulled By: pietern

fbshipit-source-id: fd0c2854bc6459c8bf0e17fa75035eb0a4e522cd
2017-12-15 09:48:44 -08:00
dde10e1d4b Add docs talking about how to adding symbolic for unsupported ops (#3741) 2017-12-15 09:37:09 -05:00
7874f611a5 Allowing usage of GPU Direct within PyTorch for the Broadcast operation (#4183) 2017-12-15 09:35:02 -05:00
5a264b4c0c Add cublas batched gemm support. (#4151)
* Add cublas batched gemm.

* Comment cleanup batched gemm.

* Fix cuda versioning batched gemm.
2017-12-15 09:29:11 -05:00
fac711c238 Provide full support for distribution shapes (#4193) 2017-12-15 12:41:08 +01:00
db446d69ca Fix issues with Windows 7 & 10 CPU build (#4065) 2017-12-15 10:14:43 +01:00
28ea5ac069 Refactor Reduce{Front,Back}{Sum,Mean} Operators
Summary: Currently these operators are implemented in a complex meta-programming fashion. I removed the definitions and put modified CPU/CUDA implementations into reduction_front_back_ops.{cc,cu}. This will help future extension of these ops to support lengths input.

Reviewed By: asaadaldien

Differential Revision: D6506568

fbshipit-source-id: 7323baf7c8e0eca37912f3ae28c02e37ad2e1103
2017-12-14 20:02:36 -08:00
595c6dea71 Create an ONNX ATen exporting mode (#3489) 2017-12-14 22:36:53 -05:00
def4b78b6f adding index_select to symbolic.py (#4061) 2017-12-14 22:33:53 -05:00
00fe088659 Enable OpenMP in fuser (#4042)
Because it is hard to know whether -fopenmp will work on a user's machine,
we just try it, and then disable it if it doesn't work.

Fused kernels are now competitive with the stuff in TH when the kernel
is flops bound, and faster when the original kernel was memory bound.
2017-12-14 22:26:56 -05:00
d8d82d14cf Add an option to suppress download progress (#4135)
* Add an option to suppress download progress

* Add a disable option to pbar to make it a no-op

* Document progress
2017-12-14 22:19:27 -05:00
be1ef5e4a4 Added explicit tuple element-count to doc for Conv1d. (#4136)
* Added explicit tuple element-count to doc for Conv1d.
2017-12-14 22:17:46 -05:00
d8c5f2ae21 Fix a bug where from_dlpack fails if cuda is not initialized. (#4182) 2017-12-14 21:54:36 -05:00
9792acb4e0 Preprocess both inplace and non-inplace nn functions (#4184) 2017-12-14 21:51:55 -05:00
d4db1b90a1 Resuppress adagrad health checks
Summary:
Commit 479e4ce5 didn't end up solving the health checks firing and
they are likely still caused by the remaining `assume` calls.
Closes https://github.com/caffe2/caffe2/pull/1625

Differential Revision: D6573036

Pulled By: pietern

fbshipit-source-id: eeb21bdd61dca0a632eb1ba9e529177ac2569bfd
2017-12-14 16:34:41 -08:00
7f25fff2fe add reparameterization, combine sample and sample_n (#4142) 2017-12-15 00:25:39 +01:00
19c511b42f Make docker image built on Jenkins usable out of the box
Summary:
The install prefix we use in our builds is /usr/local/caffe2. This is
not standard, so in order to load caffe2 from Python, the Python
interpreter must know where to find it. In a post-build section in the
Jenkins build script we know add a symlink to Python's dist-packages
directory and instruct the loader to look in /usr/local/caffe2/lib.
Together, these tricks make it usable out of the box.
Closes https://github.com/caffe2/caffe2/pull/1617

Differential Revision: D6572322

Pulled By: pietern

fbshipit-source-id: c37b789a0d0babbb1110f991318c6b75fe351c0e
2017-12-14 15:05:52 -08:00
50360aa00a Install cmake3 (v3.6.3) in CentOS containers
Summary: Closes https://github.com/caffe2/caffe2/pull/1624

Differential Revision: D6572260

Pulled By: pietern

fbshipit-source-id: 5698d78f851108826ae68a6a41d81ee16453f666
2017-12-14 15:05:50 -08:00
c6381c6d44 Add function to explicitly initialize PyTorch CUDA state. (#4180)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-14 17:48:05 -05:00
931f5e66d9 Change MakePadding function to be private
Summary: att

Reviewed By: asaadaldien

Differential Revision: D6570051

fbshipit-source-id: 9c5eb2c1cb87c32dd19a9f096e68d521e690cf39
2017-12-14 14:47:57 -08:00
0867120f1e SortedSegmentMean/SortedSegmentLogMeanExp Gradients CUDA implementation.
Summary: AT

Reviewed By: enosair

Differential Revision: D6525541

fbshipit-source-id: dc095af1c3485d029f7744aadb66f8c51acf8ffe
2017-12-14 13:05:19 -08:00
d38a9bb4ec Fix dot processor with only one sparse feature and no dense feature
Summary:
As titled.

This will fail with the message: File "/mnt/xarfuse/uid-30088/f8742a88-seed-a26ddfbc-49aa-4c5f-9e08-91909f4775da-ns-4026532692/caffe2/python/layers/concat.py", line 52, in __init__
    "Concat expects that limited dimensions of the input tensor"

This is because the output scalar of the pairwise_dot_product layer won't contain shape information if output_dim is 1.
https://fburl.com/1m9r3ayp

This diff fixes it.

Reviewed By: xianjiec

Differential Revision: D6565930

fbshipit-source-id: 181181232065ef3fdfc825aa25d2714affbe6b8d
2017-12-14 13:05:17 -08:00
eed95f8660 Simple fix for windows
Summary:
TSIA
Closes https://github.com/caffe2/caffe2/pull/1620

Reviewed By: pietern

Differential Revision: D6566180

Pulled By: Yangqing

fbshipit-source-id: 904e8f43831fc2a4c1f7c475d1f839ab4b7d250c
2017-12-14 12:32:24 -08:00
54f6b18168 Caffe2: Make SimpleNet simple again
Summary:
There is a lot of business logic around various events in
the base net class. SimpleNet doesn't have to handle those (checked
with ilia-cher). Normally there should be no events registered for
simple nets, but we can have some issues where they will be added, so
it's less error-prone to just keep SimpleNet::Run pure. Then we
also avoid extra virtual calls / empty vector iterations.

Reviewed By: ilia-cher

Differential Revision: D6551440

fbshipit-source-id: c97a732a00bb36eed49d35e727156ce94225a08b
2017-12-14 11:20:20 -08:00
ca44c16e72 LayerConfigMILSTMCell
Summary: A version of MILSTMCell which uses layer normalization (see https://arxiv.org/pdf/1607.06450.pdf). There's a lot of copypasta because we don't want to make the existing RNNCell classes harder to approach / understand by adding new options.

Differential Revision: D6564208

fbshipit-source-id: 0bc43e12b6c08ebdf5ea6af2c631f785c302bdb4
2017-12-14 10:17:53 -08:00
f19ae690c3 Update observer when attached to RNN ops
Summary: The observer passed to the RNN step net was cloned with the RecurrentOperator as subject instead of the internal Operator. This diff adds the internal operator as the subject.

Reviewed By: enosair

Differential Revision: D6560996

fbshipit-source-id: 7af4fb0ff8c19795b5c994c5fc6876f3d2ba7bf4
2017-12-14 10:04:20 -08:00
d450895a74 fix typo (#4175) 2017-12-14 12:31:58 -05:00
787b9c5202 Propagate CuDNN enabled to ATen library. (#4104)
This is not currently used by anything, but eventually ATen
will need to make decisions about whether or not to use
CuDNN functions or not, which means we need to propagate
this variable to ATen.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-14 11:29:25 -05:00
dac5e6568d Better error messages for blas ops with cuda.LongTensor (#4160)
* Better error messages for blas ops with cuda.LongTensor

Fixes #4157

Test plan

Try matrix multiplying with cuda.LongTensors

>>> import torch
>>> x = torch.randn(4, 4).long().cuda()
>>> y = torch.randn(4, 4).long().cuda()
>>> x.mm(y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: addmm for CUDA tensors only supports floating-point types. Try converting the tensors with .flo
at() at /private/home/rzou/pytorch/pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:381
2017-12-14 11:28:59 -05:00
16b7f3a35d Clean up InputBuffer 2017-12-14 15:14:35 +01:00
78ea42e37c Accept sparse tensors of corresponding type in VariableType casts 2017-12-14 15:14:35 +01:00
4f4e0df68f Allow for broadcasting of distribution parameters (#4140) 2017-12-14 09:37:03 +01:00
1eae0ac8b1 Update instancenorm.py (#4171) 2017-12-14 03:20:37 -05:00
991f03fbfb Fix memory leak in JIT
THPVariable_Wrap creates a new PyObject with refcount 1.
py::reinterpret_borrow<py::object>() would then bump it to 2,
causing it to leak.
2017-12-14 09:07:09 +01:00
3de8661184 Disable SDT calls for all nets by default
Summary:
We see a non-trivial overhead because of this debugging
code. I talked with Romain, and it looks like we can comment this out for
now. We will think about a better way to integrate this kind of
functionality into Caffe2 going forward

Reviewed By: romain-intel, pietern

Differential Revision: D6551108

fbshipit-source-id: efa3e643b953d33dc5f3d11f88cafdf2730bc4e4
2017-12-13 21:33:08 -08:00
ac2e368cb2 Fix aten header inclusion
Summary:
cc houseroad
Closes https://github.com/caffe2/caffe2/pull/1618

Reviewed By: bddppq

Differential Revision: D6563193

Pulled By: Yangqing

fbshipit-source-id: e5bc9d9e798599e96dc739a3d7d4561d5e31d4ba
2017-12-13 18:34:18 -08:00
3842128ce1 Fix gpu test for FCTransposed
Summary: Fix gpu test for FCTransposed.

Reviewed By: pietern

Differential Revision: D6560213

fbshipit-source-id: 3b5a3e2f1f2f1c144599967d3565d71dc4340cec
2017-12-13 15:48:18 -08:00
8199edf5c1 Refactor generation of NN derivatives (#4096)
Derivatives for NN functions now have to be specified in tools/autograd/derivatives.yaml. Leaving a function out will result in that function not being available in autograd.

Note that _backward declarations used in derivatives.yaml are auto-generated by aten/src/ATen/nn_parse.py so the content of tools/autograd/derivatives.yaml has to reflect the generated declarations.
This is an inconvenience, although it's smaller than it looks: future kernels will be implemented directly as ATen native functions.

As a help to the user, we could eventually save declarations generated in nn_parse.py to a file.

* Avoid automatic generation of NN derivatives

* Add inplace functions

* Refactor nn preprocessing function

* Use output instead of self in inplace derivatives

* Include grid_sampler in derivatives

* Finish fixing grid_sampler and affine_grid_generator

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Factor out setting up derivatives, use the same logic for NN and non-NN codepaths
2017-12-13 17:25:09 -05:00
8a254a0271 Port batchnorm_double_backward to ATen.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-13 17:19:47 -05:00
d41b6c7daa Implement remaining random methods through ATen (#4137)
* Implement remaining random methods through ATen

* Change test_bernoulli on Tensor to avoid broadcasting

The new ATen-dispatched bernoulli_ supports broadcasting. The old
Tensor.bernoulli_ bindings instead require the tensors to have the same
number of elements. I haven't changed the old code because it will be
deleted soon.
2017-12-13 15:40:34 -05:00
dbbfdee4c0 Implement FCTransposed gradient
Summary: Add FCTranposed gradient implementation

Reviewed By: salexspb

Differential Revision: D6551998

fbshipit-source-id: 0ee8ac7df8c33e55d715bfe65d58bb9bbe1afa50
2017-12-13 11:33:07 -08:00
28890b2046 Add rnn args check (#3925)
* Add rnn args check

* Check both hidden sizes for LSTM

* RNN args check test
2017-12-13 12:48:00 -05:00
0ab68b8db4 Implement .enumerate_support() for Bernoulli, Categorical distributions (#4129) 2017-12-13 13:01:05 +01:00
6db9f6dc78 Enable half communication for distributed (#4091) 2017-12-13 13:00:12 +01:00
c16a21b67d removed the device_type assumption in adagrad_test
Summary: the "assume" statement in adagrad_test leads to health check failures. Here we remove it by checking dc == hu.gpu_do

Reviewed By: pietern

Differential Revision: D6513314

fbshipit-source-id: 4caf2d938e5f5935a95cca8abd99185182223d63
2017-12-13 03:35:51 -08:00
98143776f5 SortedSegmentMean /LogExp Reduction CUDA implementation.
Summary: As titled.

Differential Revision: D6506412

fbshipit-source-id: 69f5a4f89f56a5b90905112a59fa3e99e51b46bb
2017-12-12 23:42:33 -08:00
54342287fe Look for NCCL in CUDA_TOOLKIT_ROOT_DIR
Summary: Closes https://github.com/caffe2/caffe2/pull/1611

Reviewed By: dzhulgakov

Differential Revision: D6550168

Pulled By: pietern

fbshipit-source-id: e034ce4057d37bfc8b53949c56cbcb701ea5d958
2017-12-12 21:50:49 -08:00
0000766566 Gan for ranking alternate learning rate
Summary:
This enables two learning rates for the Generator and Discriminator in a GAN. For each iteration i, it decides
whether to enable training on G (or D) based on the desired active_period and inactive_period for G (or D).

Reviewed By: dragonxlwang

Differential Revision: D6379325

fbshipit-source-id: 926f1041e25f48791b2ac1fc1a8eaa08db9639b8
2017-12-12 16:06:28 -08:00
34566f004d Adding Ubuntu Anaconda environments
Summary: Closes https://github.com/caffe2/caffe2/pull/1603

Reviewed By: pietern

Differential Revision: D6546192

Pulled By: pjh5

fbshipit-source-id: 8a61139068edd591489fc5b4b3aef2a89a2a35f8
2017-12-12 12:18:35 -08:00
ae60ef12fa Module syntax sugar.
Summary:
Adds modules:

a = Module() # create a module
a.b = 3 # set tensors in module
a.c = 4
b = my_func(a) # pass a module to a function as an argument
c = b.what + 1 # and receive a module as a return
global foo
foo.a.b # translates to Caffe2 name foo/a/b

This should help clean up beam search where many external nets are grouped
into modules.

Reviewed By: jamesr66a

Differential Revision: D6543292

fbshipit-source-id: 349eae0b1609efab4557f94650938e1fa543579d
2017-12-12 12:07:57 -08:00
3b99bb5dd1 Add readme for docker/jenkins directory
Summary:
This also removes the `bin/{build.sh,test.sh}` scripts that are now
located in `.jenkins/{build.sh,test.sh}`. The rationale for this is
that these scripts don't care about Docker specifically and are also
run for, for example, macOS builds.
Closes https://github.com/caffe2/caffe2/pull/1610

Differential Revision: D6546204

Pulled By: pietern

fbshipit-source-id: 643bfb0c342b1719c0fb51e4e0987b2674e6424f
2017-12-12 10:45:41 -08:00
8d358a1db5 allow cudnn for fp16 batch norm (#4021) 2017-12-12 12:49:26 -05:00
ba93c031f2 Moving distribution classes into a separate package 2017-12-12 02:44:44 -08:00
790933b430 Build Redis support on Linux
Summary:
Builds can then execute rendezvous where a shared file system is not available.
Closes https://github.com/caffe2/caffe2/pull/1530

Differential Revision: D6543267

Pulled By: pietern

fbshipit-source-id: a924e2d8c26e0e30e95673ca17c7e1f40f43b3dc
2017-12-11 22:33:05 -08:00
77352fdbdd Remove scoping assertion because it is not useful and is causing errors
Summary: Remove scoping assertion because it is not useful and is causing errors

Reviewed By: salexspb

Differential Revision: D6538219

fbshipit-source-id: e587e294d4beec1370e6895af9354f0818a4cdd8
2017-12-11 18:03:45 -08:00
234591a809 Support regression with output transform in MTML for feed
Summary: changes on metrics and mtml.

Differential Revision: D6457175

fbshipit-source-id: 1a162c519191f290e8e919cc7fe978f502ec2840
2017-12-11 17:20:20 -08:00
90f06860b1 Update scripts in .jenkins
Summary:
Part of a 2-step process to move the Jenkins entry point scripts from
`docker/jenkins/bin` to `.jenkins`.
Closes https://github.com/caffe2/caffe2/pull/1605

Differential Revision: D6537959

Pulled By: pietern

fbshipit-source-id: 716b2e6bd50bbfe56b0bb844dd6b0c666a52527c
2017-12-11 14:35:26 -08:00
d84033dc6b Add placeholders for issues/pull requests
Summary: Closes https://github.com/caffe2/caffe2/pull/1604

Differential Revision: D6537325

Pulled By: pietern

fbshipit-source-id: 90dae8389e318ff36c8455a0e002c8e42167aa9a
2017-12-11 14:35:25 -08:00
2d07360938 Fix compilation on GCC 7
Summary:
Thanks to BrettRyland for the initial fix in #805.
Closes https://github.com/caffe2/caffe2/pull/1602

Reviewed By: Yangqing, asaadaldien

Differential Revision: D6534431

Pulled By: pietern

fbshipit-source-id: 1a3ecb77743e7cee76b61c516332137c07331067
2017-12-11 13:32:30 -08:00
53f9a0f03d Ipython notebook directory name is changed, Change from ipython to jupyter, Also pass arguments instead of fixing --ip
Summary:
Change the directory name for the ipython notebook.
Change the executable name from ipython to jupyter.
Pass arguments given to the script on to the notebook, instead of hard-coding --ip='*'. In some setups, --ip='*' causes the jupyter notebook not to be displayed.
Closes https://github.com/caffe2/caffe2/pull/1546

Reviewed By: pietern

Differential Revision: D6460324

Pulled By: sf-wind

fbshipit-source-id: f73d7be96525e2ab97f3d0e7fcb4b1557934f873
2017-12-11 13:05:40 -08:00
e78d0e5a23 Update SingleThreadAsyncNet
Summary: Updated SingleThreadAsyncNet to use new interface

Reviewed By: ajtulloch

Differential Revision: D6526515

fbshipit-source-id: 6aa24678ba7350a5e448e9c2ab29ccd07a1fcb0b
2017-12-11 13:05:39 -08:00
aeb7a3668d Implement Variable.new (#4080) 2017-12-11 15:45:43 -05:00
eb3292bbf2 Add builder for CentOS Docker images
Summary:
cc pjh5 yangqing
Closes https://github.com/caffe2/caffe2/pull/1598

Reviewed By: pjh5

Differential Revision: D6535868

Pulled By: pietern

fbshipit-source-id: 2b43b7b334422a485b45b1c51051f7e8cd2bd5b2
2017-12-11 12:04:30 -08:00
8612b0bbd8 Fix 'Undefined symbols _THDoubleTensor_digamma_one'
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-11 14:38:49 -05:00
aed38c96bb Don't set -fno-openmp for Clang.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-11 14:38:49 -05:00
05ebd21a36 Implement reparameterized gradient for Gamma sampler (#3978) 2017-12-11 03:32:15 -08:00
77dfdbf96c Ensure RNNCell variants don't broadcast (#4074)
* Ensure RNNCell variants don't broadcast

* Fix lint

* Add test for hidden_size=1 in RNNCell no broadcasting test

* Prevent broadcasting for hidden_size and input_size

* Isolate input checking from hidden size checking
2017-12-11 03:00:54 -08:00
fca617c62f Suppress hypothesis health check in adagrad_test.py
Summary:
PR #1536 suppressed test_sparse_adagrad but test_row_wise_sparse_adagrad also filters too many examples. Suppress health checks for this test as well.
Closes https://github.com/caffe2/caffe2/pull/1599

Differential Revision: D6530850

Pulled By: pietern

fbshipit-source-id: c73f30d2e104565421e3e381b1cf66185edc833e
2017-12-10 11:47:15 -08:00
93009b15e8 Compute flops in conv based on output image size
Summary:
Flops in conv were underestimated when the pad is not zero.
The difference is especially big when the image is small.
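
A hedged sketch of the corrected accounting, using the standard conv FLOP formula written against the output size; the function name and parameters are illustrative, not the Caffe2 code:

```
def conv_flops(n, c_in, c_out, k_h, k_w, out_h, out_w, groups=1):
    # Counting one multiply-add as 2 FLOPs; keying off the *output* spatial
    # size makes padding contribute correctly to the estimate.
    return 2 * n * c_out * out_h * out_w * (c_in // groups) * k_h * k_w
```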

Reviewed By: salexspb

Differential Revision: D6394190

fbshipit-source-id: b9f057fceae77f745c5daa668cb2100f993d21a7
2017-12-09 21:32:08 -08:00
b886498f62 Don't use CMake generator expression for in-tree protoc build
Summary:
This fixes the in-tree protoc build on CentOS 7 (that ships with super old protobuf version).
Closes https://github.com/caffe2/caffe2/pull/1595

Differential Revision: D6529307

Pulled By: pietern

fbshipit-source-id: ac81c7cd884846854b4ffd4909377e87d93bddc3
2017-12-09 13:33:30 -08:00
a8b8614efa Fix typo
Summary: Closes https://github.com/caffe2/caffe2/pull/1596

Reviewed By: zdevito

Differential Revision: D6528030

Pulled By: jamesr66a

fbshipit-source-id: feedf272cd6583360e5e20e90de3f02b728566e6
2017-12-08 20:49:36 -08:00
c902f1cf98 Allow specification of bool defaults in native functions. (#4089) 2017-12-08 15:26:08 -08:00
53b12693ff Run NCCL tests in CUDA environments
Summary:
cc slayton58
Closes https://github.com/caffe2/caffe2/pull/1594

Differential Revision: D6525109

Pulled By: pietern

fbshipit-source-id: c95d0615849e5a2014d228ed14bc81b5c827084f
2017-12-08 15:17:18 -08:00
a24c11329a Fix out-of-place allocations
Summary:
Also add int as a datatype and correctly check error codes on group
start, end
Closes https://github.com/caffe2/caffe2/pull/1590

Differential Revision: D6524086

Pulled By: pietern

fbshipit-source-id: 385aab6fe1bbf6b5c06fa905066bc576a733c856
2017-12-08 15:03:49 -08:00
83162b2af1 Expose resize_ and resize_as_ to Python (#4088)
We'll need these functions when we merge Variable and Tensor. They throw
an exception if called on a Variable that requires grad. As of now,
every Variable that has a grad_fn also requires grad.
2017-12-08 16:40:41 -05:00
f8ae4b6670 avoid auto's in the lambdas in OSS build
Summary: no

Reviewed By: pietern

Differential Revision: D6521515

fbshipit-source-id: 74049ad63fcf2e854ebeeac150c2ba2017904b7a
2017-12-08 12:05:02 -08:00
1b25dfd204 Parameterize jenkins user uid/gid in docker build
Summary: Closes https://github.com/caffe2/caffe2/pull/1587

Differential Revision: D6521818

Pulled By: pietern

fbshipit-source-id: 6336e6af917b9f71abb63dd4b78009dad26890ca
2017-12-08 12:05:00 -08:00
75c11d62b7 Implement Variable.__invert__ (#4082) 2017-12-08 13:05:51 -05:00
fd2cab9ded Rudimentary schema checking of operators
Summary:
Uses caffe2 operator schema to check # of inputs/outputs.
Falls back to the actual schema->Verify so that schema errors get
reported with an associated SourceRange.

Reviewed By: jamesr66a

Differential Revision: D6517136

fbshipit-source-id: 9be89165ea5e717c4cec1d25bbd967df86200d6c
2017-12-07 23:44:28 -08:00
1c6595c8e8 Add function calls and externs
Summary:
Adds the ability for a script function to call another and adds the extern function to register an external Caffe2 Net that can be called by the script.
Closes https://github.com/caffe2/caffe2/pull/1591

Reviewed By: jamesr66a

Differential Revision: D6515877

Pulled By: zdevito

fbshipit-source-id: b893d9e4bacd7389b550ac8a37ad7974b95de749
2017-12-07 23:44:28 -08:00
0f8bdf61e6 add option to config engine for benchmark binaries
Summary: no

Reviewed By: ajtulloch

Differential Revision: D6304644

fbshipit-source-id: 5a699c93bef72db12fc5ad60b1d7d4d35c042b7d
2017-12-07 18:03:39 -08:00
8d8079a7c3 Builtins for {zeros,ones}{,_like}
Summary: Closes https://github.com/caffe2/caffe2/pull/1589

Reviewed By: zdevito

Differential Revision: D6511699

Pulled By: jamesr66a

fbshipit-source-id: d12421a13fec0c2d4f4fe0dc27b0f8a7b93b7c16
2017-12-07 15:48:52 -08:00
1fd1eaa119 More complete beam search example w/ init code
Summary: Closes https://github.com/caffe2/caffe2/pull/1588

Reviewed By: zdevito

Differential Revision: D6511524

Pulled By: jamesr66a

fbshipit-source-id: eb19e74918a3f3a4f5e8a1ed68762e5e5c346160
2017-12-07 15:48:51 -08:00
71ab6f41ed Post EOS penalty example
Summary: Closes https://github.com/caffe2/caffe2/pull/1586

Reviewed By: zdevito

Differential Revision: D6505863

Pulled By: jamesr66a

fbshipit-source-id: 2778081de32fcf134df7083ab8fa739ec41fd182
2017-12-07 14:47:38 -08:00
5c809de4b4 Add missing derivatives.yaml input 2017-12-07 14:46:43 -08:00
1c96809cf8 Bind cauchy_, exponential_, normal_, uniform_ functions to THPVariable. (#3945)
* Bind cauchy_, exponential_, normal_, uniform_ functions to THPVariable.

Also changes the error messages around Generator parser; previously, you'd get an error
like: torch._C.Generator is not a torch.Generator; now the check is proper but returns
that only None is supported.

* Support passing Generators to ATen Variable-bound methods.

This involves changing THPGenerator to have an at::Generator rather than a THGenerator.
TH getRNGState, setRNGState are still called directly because they are not bound from ATen yet;
they should probably be on the Generators and return (opaque) GenerateState objects.

* Fix default values.

* Properly use THRandom_initialSeed.

* update standard gamma to use new default generator.
2017-12-07 14:34:51 -08:00
9ea576d068 Implement neg for all types (#4075)
The C/C++ unary negation operator is well defined for unsigned types. We
should use that behavior. This also implements neg for CharTensor. That
behavior currently depends on whether char is signed or unsigned.

Fixes #4066, #3225
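
For example, negation on a CharTensor is now defined:

```
import torch

t = torch.tensor([1, -2, 100], dtype=torch.int8)   # a CharTensor
print(-t)   # tensor([-1, 2, -100], dtype=torch.int8)
```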
2017-12-07 16:37:17 -05:00
60c03bc09c Implement apply_, map_, and map2_ in Variable (#4057) 2017-12-07 14:48:56 -05:00
f233a3ebd8 Explicitly set default data type in seq2seq/translate.py
Summary: The word_rewards data type is mixed; ConstantFill assigns long, but the blob is later filled with float32. This causes issues when running the net from the outputted protobuf. This change makes the data type float32 for the lifetime of the blob.

Reviewed By: jhcross

Differential Revision: D6486723

fbshipit-source-id: c4ce5185a0a6d71b08b1819f2355e9354823b701
2017-12-07 11:21:01 -08:00
ea11c30df6 throw new -> throw (#4059) 2017-12-07 09:20:10 -08:00
fc4d976a8a Fix non-determinism in code generation scripts (#4063) 2017-12-07 09:18:06 -08:00
dc47319074 Implement AssertOp
Summary:
This can be used for testing and debugging. zdevito and I will primarily use this for our caffe2 script project
Closes https://github.com/caffe2/caffe2/pull/1585

Reviewed By: zdevito

Differential Revision: D6501209

Pulled By: jamesr66a

fbshipit-source-id: fdd65e422c44b74bb6926320af506dcae13327f3
2017-12-06 17:18:52 -08:00
a6ff78457f Misc. fixes and improvements
Summary:
* condition if
* True/False literals
* and, or, not
* 0-output expressions, like print
* _ is given a fresh name
* x.foo(...) is desugared to foo(x,...)
* +=, *=
Closes https://github.com/caffe2/caffe2/pull/1581

Reviewed By: jamesr66a

Differential Revision: D6495256

Pulled By: zdevito

fbshipit-source-id: b601d3f9e08fa544881a0c946b4feac24cb7e116
2017-12-06 17:03:33 -08:00
098ab27013 Update beam search example to use new features
Summary:
Code looks much nicer after improvements introduced in https://github.com/caffe2/caffe2/pull/1581
Closes https://github.com/caffe2/caffe2/pull/1582

Reviewed By: zdevito

Differential Revision: D6497976

Pulled By: jamesr66a

fbshipit-source-id: 529278a104c0be81aa999a414d89c2f2e0264324
2017-12-06 15:02:43 -08:00
0365640d7e Fix ConvTranspose
Summary: It turns out that, similar to RoIWarp, col2im in the custom ConvTranspose implementation is also missing a bounds check for the image.

Reviewed By: ajtulloch

Differential Revision: D6494061

fbshipit-source-id: 1fadbdd05f360b20343df49b70d2be65eab128ac
2017-12-06 12:20:57 -08:00
d0cabbde74 Implement Variable.from_numpy (#4043)
Implements from_numpy using ATen tensors. Variable.from_numpy is a
convenient placeholder for the variant that returns Variables until we
merge Tensor and Variable.

The behavior is slightly changed:

 - from_numpy() on an empty array now returns an empty tensor instead of
   throwing an exception. The shape may not be preserved.
 - CharTensor(ndarray) used to throw an exception. It now copies the
   ndarray. Copying is implemented via ATen toType.
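
Behavior sketch:

```
import numpy as np
import torch

arr = np.array([1.0, 2.0])
t = torch.from_numpy(arr)   # shares memory with arr
arr[0] = 5.0
print(t[0])                 # reflects the write: tensor(5., dtype=torch.float64)
print(torch.from_numpy(np.array([])))   # empty tensor, no exception
```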
2017-12-06 14:08:56 -05:00
3c1932c35f Fix RoIWarp
Summary: Fix MPSCNNRoIWarp and made it more general to channels

Reviewed By: ajtulloch

Differential Revision: D6493869

fbshipit-source-id: 77cfa2e2f3bd80efc6e69a0774793e0162d9942a
2017-12-06 11:02:07 -08:00
38f13447bc Implement Variable.tolist() (#4038)
Tensor.tolist() now dispatches through Variable.tolist() so that we only
have one code path to test until we merge Variable and Tensor.
2017-12-06 12:35:05 -05:00
090a23251e Add Variable._cdata (#4045)
This is to help with merging Variable and Tensor. It's equivalent to
Tensor._cdata and ATen's unsafeGetTH().
2017-12-06 12:26:01 -05:00
f92c5aa7ce slightly simplified indexing (#4040) 2017-12-06 00:23:57 -08:00
6154670e0d Fix test_while case with in-place add op + broadcast
Summary: r = Add(r, r, broadcast=1i) is apparently illegal in caffe2

Reviewed By: zdevito

Differential Revision: D6495190

fbshipit-source-id: 8caddef6d9dbcb0f6f6ff18b39aec5251ab1d1e5
2017-12-05 22:49:00 -08:00
188e709885 Beam search example
Summary: Closes https://github.com/caffe2/caffe2/pull/1578

Reviewed By: zdevito

Differential Revision: D6492503

Pulled By: jamesr66a

fbshipit-source-id: a8cd5901a1c799656882706213f3a1b2a6cfe652
2017-12-05 19:53:19 -08:00
fef095af9c Set broadcast flag for binary operators
Summary:
lines such as

    output_scores = best_scores_per_hypo + scores_t_squeezed
    hypo_t_int64 = best_indices / 6LL

will emit the respective binary operator (e.g. `Add`, `Div`) with the `broadcast` flag set to 1
Closes https://github.com/caffe2/caffe2/pull/1577

Reviewed By: zdevito

Differential Revision: D6489991

Pulled By: jamesr66a

fbshipit-source-id: 3bef2bd43dfa18659a299cc62affd74f9a763491
2017-12-05 19:53:19 -08:00
a53522e560 Implement typed numeric literals
Summary:
1 is an int32
1LL is an int64
1f is a float

Still need:
Parsing out numbers such as 1.0 as integer. 1.0f should work, though
Closes https://github.com/caffe2/caffe2/pull/1576

Reviewed By: zdevito

Differential Revision: D6489944

Pulled By: jamesr66a

fbshipit-source-id: 46aab9483a18a31d883c8c7e3086d3074fa5efac
2017-12-05 19:53:18 -08:00
8610ea5e2a ElementwiseLinear fallback
Summary: TSIA

Reviewed By: Yangqing

Differential Revision: D6494589

fbshipit-source-id: 20dafbb4039b187edbf500ccb71e5abfdf9fa173
2017-12-05 19:32:18 -08:00
7ebd589801 Added support to prof_dag_net and the GetProfDagStats operator to collect not only per-op-type cost but also per-op cost.
Summary:
Previously, the GetProfDagStats operator collected the per-op-type cost of a given prof_dag net.
With this diff, the GetProfDagStats operator has a new option "per_op"; when it is false (the default), the operator still calculates per-op-type cost.
Otherwise, it returns per-op cost; the cost of multiple instances of the same op type will be calculated separately.

Reviewed By: heslami

Differential Revision: D6478547

fbshipit-source-id: 82f00f5fb262cd60b81d2accdd8e3598ddf2eefe
2017-12-05 18:32:43 -08:00
0a2c5d1ad7 CUDA implementation of UnpackSegmentsOp
Summary: Replace the fallback implementation by native CUDA code. Minor edits of PackSegmentsOp: let all computation use one buffer tensor.

Reviewed By: asaadaldien

Differential Revision: D6455236

fbshipit-source-id: 71f146c470009d1cecf3f2e2f5c381b1751c061c
2017-12-05 17:47:56 -08:00
79ac146808 Add if and while ops to brew
Summary:
Adding if and while control ops to brew, along with unit tests.
Note: unlike net_builder, where we can figure out which blobs are external and which ones are local to subnets, here in brew we need to use the external_blobs param explicitly to point at external blobs.

Reviewed By: harouwu

Differential Revision: D6440508

fbshipit-source-id: c920f0af84b77ccb2d8462ffc7567bb1908c844a
2017-12-05 17:33:34 -08:00
e70b117583 Set of bugfixes for script compiler
Summary:
* Fix typo in negative constant handling "Negate" -> "Negative"
* Fix unpacking constant in parsing elements for a list attribute
* Parse negative signs in constants
* Switch list syntax to use square brackets in attributes
Closes https://github.com/caffe2/caffe2/pull/1572

Reviewed By: zdevito

Differential Revision: D6483286

Pulled By: jamesr66a

fbshipit-source-id: 949e8fd6a96b12efde756bac9da987da0010e153
2017-12-05 16:49:48 -08:00
d1d6c0b12b Add CUDA implementation for ReplaceNaNOp
Reviewed By: jay-mahadeokar

Differential Revision: D6481993

fbshipit-source-id: cb253621795bb9de73d3e8bc1c8fc21b596d88c3
2017-12-05 13:34:51 -08:00
ea7652e011 Add debug and fix some bugs in CPU fuser
* avoid writing `x + 1.0000*y` which causes a promotion to double from float
* refactor tests to make writing graphs easier (while not strictly necessary,
  I have some benchmarking code that I am using to make the fuser faster
  that is easier to write in this form)
* option to dump the disassembly of the CPU fused code for perf debugging.
2017-12-05 13:20:31 -08:00
0b7f1e5efd Fix segfault during ONNX export 2017-12-05 14:31:39 -05:00
5241cdf546 Implement Variable.numpy() (#4006)
Implement Variable.numpy() and dispatch Tensor.numpy() through Variable.numpy()

Variable.numpy() is disallowed on variables that require grad.
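
Behavior sketch:

```
import torch

x = torch.ones(3, requires_grad=True)
x.detach().numpy()   # fine: the detached view doesn't require grad
try:
    x.numpy()        # disallowed on variables that require grad
except RuntimeError as e:
    print(e)
```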
2017-12-05 14:24:11 -05:00
71b1858de7 Implement Variable.storage_type() (#4036) 2017-12-05 14:11:08 -05:00
5c13c6962c Raise errors when num_workers == 0 in DataLoader (#4019) 2017-12-05 11:07:43 -08:00
046c11cd73 Stod
Summary:
This is in order for the Android build to pass - Android support for string-related functions is quite limited.
Closes https://github.com/caffe2/caffe2/pull/1571

Reviewed By: pietern

Differential Revision: D6486079

Pulled By: Yangqing

fbshipit-source-id: f0961e2dde6202bd6506f4fb8a3aea4af1670cb5
2017-12-05 10:48:09 -08:00
a8250280bb Py3 test fixes
Summary:
\cc pietern
Closes https://github.com/caffe2/caffe2/pull/1555

Differential Revision: D6479902

Pulled By: pietern

fbshipit-source-id: 84647eddec45620b1ed603f4882ded2dd49adc43
2017-12-05 10:34:41 -08:00
ea56e0d424 Implement BatchMatMul with Numpy-style batch broadcast semantics
Summary:
ONNX has decided to implement a single MatMul operator that borrows semantics from np.matmul: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html

This PR introduces a new op that we can target for ONNX that mimics the numpy-style broadcast semantics
Closes https://github.com/caffe2/caffe2/pull/1507

Reviewed By: dzhulgakov

Differential Revision: D6389022

Pulled By: jamesr66a

fbshipit-source-id: a2270ad0042b1ddf6c65ba7cb10d83e0763cf950
2017-12-05 10:34:35 -08:00
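For reference, the np.matmul broadcast semantics the new op mimics (a plain numpy illustration, not the Caffe2 op itself):

```python
import numpy as np

# The trailing two dimensions are matrix-multiplied; the leading "batch"
# dimensions broadcast against each other like elementwise ops do.
a = np.random.randn(5, 2, 3, 4)   # a 5x2 batch of 3x4 matrices
b = np.random.randn(2, 4, 6)      # batch shape (2,) broadcasts to (5, 2)
c = np.matmul(a, b)
print(c.shape)                    # (5, 2, 3, 6)
```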
7842b4b878 Use warp shuffles in cuda varInnermostDim (#3846)
Use warp shuffles in cuda varInnermostDim and remove unnecessary __syncthreads()
2017-12-05 12:35:49 -05:00
f01052ade4 Use enabled in torch.autograd.profiler.emit_nvtx (#4032)
Or else it's always enabled.
2017-12-05 08:45:23 -08:00
94a0c72089 Delete _write_metadata and move _new_with_metadata_file into Python (#4020)
This will make it easier to merge Variable and Tensor
2017-12-05 11:24:54 -05:00
535a13dbc2 Move renorm to C++ and expose cumsum (#4013)
Also allow cumprod forward in C++
2017-12-05 11:24:03 -05:00
84d8e81311 Fix the symbolic for view 2017-12-05 02:31:49 -05:00
fcc142386b Make pool export compliant with onnx spec 2017-12-05 02:27:02 -05:00
0d68ce9383 Use integer division to fix failing test 2017-12-04 21:16:09 -08:00
8da31c240d Revert changes in blob name in optimizer
Summary: A while ago, we had to change some blob names in `optimizer.py` (more specifically, names of `iteration_mutex` and `optimizer_iteration`) to handle corner cases when preparing a net for parallel execution.

Reviewed By: azzolini

Differential Revision: D6480819

fbshipit-source-id: a03a7aa9fad322a50e7785914b0eb0f8654e6d90
2017-12-04 19:32:45 -08:00
7e1fccb8f5 Add is_pinned, is_shared, and share_memory_ to Variable (#4015)
These are copied directly from Tensor. We'll need them before we can
merge Tensor and Variable.
2017-12-04 20:47:10 -05:00
5ec224496b Merge common part in CUDA & CPU implementations of AddPaddingOp
Summary: The RunWithType() function of the CUDA version shares a lot of code with the CPU version of the op. Merge them by pulling out the differing parts of RunWithType() and putting them into separate CPU/CUDA functions.

Reviewed By: asaadaldien

Differential Revision: D6467962

fbshipit-source-id: 83b45e697a094e959f66e898f46f06b0e2c329bc
2017-12-04 16:55:49 -08:00
f83ca6338a Shorten stack trace in build for C++ errors. 2017-12-04 18:27:34 -05:00
7bf6aaacd3 add missing CMAKE_GENERATOR 2017-12-04 18:27:34 -05:00
d76f7a806d Fix potrf gradient and enable gradchecks (#3861) 2017-12-04 16:50:41 -05:00
32500fe800 Reducing array sizes used in pack_ops_test to prevent timeouts during Travis CI builds
Summary:
Reduced the array sizes used in pack_ops_test to prevent timeouts
during Travis CI builds.

Reviewed By: enosair

Differential Revision: D6476703

fbshipit-source-id: 20ab871ae40349ca27186447a84135bbc5c351b1
2017-12-04 12:48:53 -08:00
c8cc04bd85 Builder scripts for Docker containers
Summary:
This includes a build script for Docker containers to run builds and tests in as well as a build and test script that is run to build and test Caffe2 itself. These scripts are directly used by Jenkins.
Closes https://github.com/caffe2/caffe2/pull/1552

Reviewed By: pjh5

Differential Revision: D6476377

Pulled By: pietern

fbshipit-source-id: c9268873c03d0878bea0e8516a72c27813284427
2017-12-04 12:04:22 -08:00
9e46fca424 Use ninja as the cmake backend as well. 2017-12-04 14:16:26 -05:00
739fa34ccd Change ATen's gen.py script so that it can list all of its outputs
before reading input files and doing string formatting.
2017-12-04 14:16:26 -05:00
4b6c8779eb Fixes the NativeFunctionsCuda.cu intermittent build issues.
CMake does not correctly add generated header file dependencies
for CUDA compilation units (cpp works fine). This introduces an
explicit dependency to force the aten generator to run first.
2017-12-04 14:16:26 -05:00
61a582da44 Fuser now locates g++. 2017-12-04 14:13:44 -05:00
0cdc1f2f1f Make TempFiles lifetimes shorter.
Relax the amount of file syncing for cpp file.
2017-12-04 14:13:44 -05:00
f72fe0624d Add a CPU Fuser (single core)
This adds a simple fusion backend for the CPU.
* Refactors CompiledFusionFunction to have two subclasses that handle
  the compilation details of each backend.
* emit-compile-link-run cycle for the CPU
* simple single core loop to run the operation
* lift CUDA-only restrictions in the fuser; check that fusion groups
  are only on a single backend.
2017-12-04 14:13:44 -05:00
bcfe259f83 Add streams and comms as optional arguments (#3968)
Adds streams and comms as optional arguments to the NCCL calls in
torch.cuda.nccl. Also exposes ncclUniqueId and ncclCommInitRank for
multi-process mode.

Moves Py_RETURN_NONE statements after the GIL is re-acquired.
2017-12-04 13:51:22 -05:00
540a9c279e Add LayerNormLSTM
Summary:
Adds a new `LSTMCell` subclass to the `rnn_cell` module that performs layer normalization on the fused input matrix. Moves around some code in `rnn_cell.py` to avoid copy-pasta. Adds relevant test cases to `rnn_cell_test.py`.

Had to fix `brew.layer_norm` first. See T24013870.

Reviewed By: jhcross

Differential Revision: D6454883

fbshipit-source-id: 0f4ea7a778cc5be6a7274f7b28c793f5dd7c6095
2017-12-04 10:48:37 -08:00
5571d0187e Accept longs in default_collate for dataloader in python 2 (#4001) 2017-12-04 09:50:57 -08:00
a9606580ef Remove separate nccl installation from Dockerfile9 (#4003)
Base image already contains nccl
2017-12-04 09:50:06 -08:00
4eb8e12765 Introduce scopes during tracing (#3016) 2017-12-04 09:19:06 -08:00
e1e08d631a Always check cuDNN support in test_convolution_gradients
Summary:
Regardless of device checker/gradient checker we cannot run a
backwards pass with cuDNN when NHWC is used.
Closes https://github.com/caffe2/caffe2/pull/1566

Differential Revision: D6474181

Pulled By: pietern

fbshipit-source-id: 727d7b4f2a1431a4d6675ffb76c5b60d3d7fa712
2017-12-04 08:50:39 -08:00
7ddcb91c7f Add more ONNX symbolics 2017-12-04 07:15:35 -05:00
41897e3e78 Suppress hypothesis health check in glu_op_test.py
Summary: Closes https://github.com/caffe2/caffe2/pull/1564

Differential Revision: D6472568

Pulled By: pietern

fbshipit-source-id: 4f1bd3a1ced6d77991531eb864d2cf5d39bc7c4f
2017-12-03 22:51:46 -08:00
cdd48a8575 Fix typo in clang ifdef to fix clang 3.9 build
Summary:
This prevented building with clang 3.9.
Closes https://github.com/caffe2/caffe2/pull/1565

Differential Revision: D6472567

Pulled By: pietern

fbshipit-source-id: 361c3f9e85237ca0328e12eb23309bc4a3e11556
2017-12-03 22:51:45 -08:00
07904eaed9 Moving tensorboard to OSS
Summary: Moving tensorboard out of fb-specific code and untying all dependencies on fb code

Reviewed By: dzhulgakov

Differential Revision: D6313818

fbshipit-source-id: 19302c372540400fa60d34015ef9e944ab203d2e
2017-12-03 19:18:01 -08:00
1351152362 Skip DeviceShiftTest if host has < 4 GPU devices
Summary: Closes https://github.com/caffe2/caffe2/pull/1563

Differential Revision: D6471667

Pulled By: pietern

fbshipit-source-id: 99efd21b98c00eb0a846ca8b395bdfd550fe02f1
2017-12-03 16:02:05 -08:00
76d7bace47 Add opencl logging part I
Reviewed By: Maratyszcza

Differential Revision: D6441192

fbshipit-source-id: 453580e6bf5abceb00667e1045e316ffe30764cb
2017-12-03 13:16:57 -08:00
f2be3a4e5e Allow specifying device to prepare_prediction_net()
Summary:
This is supplementary to commit ce8267d425444f60ae650389fb41838847a44a5e. It allows specifying a device to prepare_prediction_net() so the prediction extractor can work with GPUs.
Closes https://github.com/caffe2/caffe2/pull/1035

Differential Revision: D6467420

Pulled By: salexspb

fbshipit-source-id: b5b9a1536fb516e90b5e4b615403086943cfbe93
2017-12-03 10:32:08 -08:00
710f6d6958 Fix warnings and add alert to enable ninja when developing. 2017-12-03 04:49:41 +01:00
67f6b5b565 Handle broadcast of ints on CPU side
Summary: Closes https://github.com/caffe2/caffe2/pull/1537

Reviewed By: harouwu

Differential Revision: D6456371

Pulled By: pietern

fbshipit-source-id: 8bf05c2d9e1f5adda5efb29ccaedb220932397f3
2017-12-02 19:33:03 -08:00
ca7951b93d remove unused variable
Summary: Oops, I left an unused variable here. Let's get rid of that!

Reviewed By: enosair

Differential Revision: D6468223

fbshipit-source-id: 27cc0900b330f056c5b5585a136fb46f5830cf81
2017-12-02 01:31:42 -08:00
2c190d2f05 update transformer code for layer_norm() API change
Summary: Quick fix for unit test broken by D6454290. This is my fault for approving while the tests covering the single callsite were broken.

Reviewed By: goldsborough

Differential Revision: D6466566

fbshipit-source-id: 2683be3d6bb184286e64fbde3e572946e39030c7
2017-12-01 20:19:31 -08:00
e5906db3e9 trtrs backward (#3972) 2017-12-01 22:17:50 -05:00
232f8c73dd fix flake 2017-12-01 22:16:07 -05:00
96cd3743f1 Make workspace id type consistent with net-rewriting pipeline
Summary: There are two components that deal with workspace ids: 1) the comm framework, 2) injection of GLOBAL_WORKSPACE_ID. The type of workspace id should be consistent across these components. 32-bit integers should be sufficient for such ids.

Reviewed By: akyrola

Differential Revision: D6443675

fbshipit-source-id: 7b0e8a3b005683350706fa5c330abf0a9d4881dd
2017-12-01 18:47:12 -08:00
0512597f86 Switching to MPSCNNConvolutionTranspose for iOS11 and above
Summary: att.

Reviewed By: ajtulloch

Differential Revision: D6420049

fbshipit-source-id: 30262dfefe8c400285bcaaab50de3a5d3ff68858
2017-12-01 17:49:09 -08:00
638b10d39b fix softmax default dim for 1D Tensor 2017-12-01 19:20:04 -05:00
165d0897e4 Implement distributions.Gamma (#3841) 2017-12-02 01:10:08 +01:00
932e484029 fix doc change lint; (#3974) 2017-12-01 17:24:30 -05:00
b43c1b2bed Fix and upgrade brew.layer_norm
Summary:
While working on layer normalization for LSTMs I encountered an issue where the layer norm parameters (which are the scale/gain and bias/shift from the paper) were not registered in the model for `brew.layer_norm`. salexspb explained that this is because it was using the `init_net_param` API instead of `create_param`. This diff fixes this.

While fixing this I noticed that `brew.layer_norm` actually had a bug where it was multiplying with the bias instead of adding it. Another issue was that the function was giving the scale and bias a shape of `[1]`; however, the paper (https://arxiv.org/pdf/1607.06450.pdf) specifies that, like for batch norm, there is one scale and bias parameter per neuron, i.e. the shape should be `[1, axis_dimension]`. The API now takes an explicit `dim_in` parameter (also more consistent with other normalization functions in that module) so that this can be specified. See tests for how this now looks. (A short numpy sketch of the computation follows this entry.)

Reviewed By: jhcross

Differential Revision: D6454290

fbshipit-source-id: fc00ca614de3190c40ab743e8984bec9e85fb58c
2017-12-01 14:18:28 -08:00
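A minimal numpy sketch of what the fixed helper computes (per-row normalization, then a per-neuron scale and an added bias); the names are illustrative, not the Caffe2 API:

```python
import numpy as np

def layer_norm(x, scale, bias, eps=1e-5):
    # x: [batch, dim_in]; scale, bias: [1, dim_in] -- one parameter per
    # neuron as the paper specifies, not a single scalar of shape [1].
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / (std + eps) * scale + bias  # bias added, not multiplied
```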
fe12ac57a4 Improve docs for torch and torch.Tensor (#3969)
* doc overhaul

* update split doc
2017-12-01 14:56:48 -05:00
b2865ef389 Fix comparison warnings in scalar_tensor_test. (#3964) 2017-12-01 14:56:34 -05:00
3af2b8f428 Adding length verification check to pack_segments
Summary:
Adding a check to pack_segments to make sure the lengths passed in add up as expected.

Additionally, this starts to address https://fb.facebook.com/groups/1405155842844877/permalink/1977332432293879/ ; it might not fix that issue, but is still useful even if it does not.

Reviewed By: salexspb

Differential Revision: D6443490

fbshipit-source-id: 680dc763a788a550d321d97a556c5b46e3402dd1
2017-12-01 10:47:25 -08:00
c681b03d37 Add determinant function on variable; Add backward on svd (#3816)
* determinant on variable

* svd bwd
2017-12-01 13:22:46 -05:00
3d1135c842 Skip remove_padding test because it is flaky
Summary:
Must be fixed in #1547
Closes https://github.com/caffe2/caffe2/pull/1548

Reviewed By: jhcross

Differential Revision: D6456373

Pulled By: pietern

fbshipit-source-id: 484a58e31506acfc8b8a0954f76796d14dfdfda3
2017-12-01 09:47:31 -08:00
80c8635a7e fix math notation (#3962) 2017-12-01 10:15:10 -05:00
34beafcd00 Add static_cast to Get*Argument to avoid compiler warning.
Summary: This suppresses warnings on Windows.

Reviewed By: dzhulgakov

Differential Revision: D6454709

fbshipit-source-id: f7ea437ae261eee584cac36264e4ab331d6eb3c8
2017-12-01 00:36:50 -08:00
f80902c6fa update Tensor.new doc 2017-11-30 23:14:19 -05:00
96c6652131 improve Tensor.scatter doc 2017-11-30 23:14:03 -05:00
754ae49f65 Documentation updates for ONNX.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-30 23:09:45 -05:00
de00aab720 PyTorch now uses operator versioning.
Also move some of the exporter info out of the ModelProto constructor.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-30 23:09:45 -05:00
1c0fbd27a1 CuDNN bindings rewrite (into ATen) (#3666)
* Comprehensive rewrite of Torch CuDNN bindings / a bit of ATen infra

The executive summary is that this moves the torch/csrc/cudnn
library into ATen, adding a number of new cudnn_ methods to ATen
for batchnorm, convolution, affine grid generator and grid sampler.

ATen infra changes:

- TensorGeometry was moved to ATen
- TensorGeometry was modified to make its interface resemble that of
  Tensor; in particular, sizes is no longer a field, it's a method.
- AT_CUDA_ENABLED macro is set via ATen/Config.h header which is
  generated at cmake configure time.
  Fixes https://github.com/zdevito/ATen/issues/168
- Change AT_CUDA_ENABLED macro to be a function macro, so that we
  error if it is not defined
- Introduce a new TensorArg class, which is a Tensor plus a little
  metadata.  This helps us give good error messages when checking
  dimensions/shapes of tensors.
  Fixes https://github.com/zdevito/ATen/issues/169
- Also introduce a TensorGeometryArg class, for when you don't
  need the actual tensor data (which is most of the time.)
- Add ATen/Check.h, which contains a number of utility functions
  for testing shapes, types and devices of input tensors.  This
  will be particularly useful for native methods, which don't get
  code generated input testing code.  These functions take a
  'CheckedFrom' argument, at the moment just a string, which
  specifies some extra information about what function was
  doing the actual checking; this greatly improves error messages.
    - Many check functions take initializer lists, which let you
      test that all tensors have some property.  This API is
      peculiar, in that we IGNORE undefined tensors in this case.
      This is handled by filterDefined.
- Add AT_CUDNN_ENABLED macro
- CuDNN linking from ATen was improved; for example, we now actually
  add the CuDNN headers to our include path.
- Add some missing override specifiers to some methods
- We now actually build tests with CUDA functionality accessible
  (previously, AT_CUDA_ENABLED was not defined, meaning that
  the headers were missing all CUDA-only functionality.)
- Native functions now support giving explicit names to return
  outputs in yaml.  This makes it possible to hook into the NN
  autogenerated derivatives codepath using native functions.

CuDNN rewrite changes:

- torch/csrc/cudnn now uses ATen (rather than passing around
  THVoidTensor) and lives in ATen.  This lets us remove tensorPointer
  shenanigans.  The functions are exposed to ATen as native functions
  described in aten/src/ATen/cudnn/cuDNN.yaml
- ATen now builds and links against CuDNN when enabled.  The cmake
  package script was taken from Caffe2.
- Some header reorganization was done to help reduce dependencies
  on headers (this reorg is no longer used but I've kept it)
- Rename CHECK to CUDNN_CHECK
- Rip out old shape/type testing code in favor of modern ATen/Check.h
  interface using TensorArg.  In many cases, increase the robustness of
  the checking code.
- Change the inputs of the public facing functions, so that they can
  be bound by ATen
  - Delete THCState*; this is retrieved from the global ATen context
  - Delete cudnnHandle_t, this is retrieved from the global Handles.h
  - Delete cudnnDataType_t, this is retrieved from the Tensor type
  - Delete Convolution class, instead its constituent arguments are
    passed individually
- Change functions to return tensors, rather than take an appropriately
  sized output tensor as an input.
- Redo how transposed convolution / backward convolution is implemented
  (knock on effect of returning tensors).  Previously it was assumed
  that you would always pass an appropriately sized output tensor, but
  we don't want to do this anymore.  For backwards, we instead give
  the desired output tensor (input, really) size, because that is
  readily available.  For *transposed* convolution, however, we take
  output_padding, and otherwise do the shape calculation (a sketch of the standard shape rule follows this entry).
- Redo how legacy group convolution is implemented (knock on effect from
  porting cudnn to ATen.)  Previously, group convolution was implemented
  by manually constructing sizes and strides and then outputting
  appropriately, with macros switching between individual groups and
  all-at-once based on CuDNN version.  Now, the code looks exactly like what
  you'd expect: there's a top-level wrapping function that supports
  group convolution no matter the version of CuDNN, and a low-level
  wrapper which supports only what CuDNN supports.  The top-level
  function conditions on CuDNN version, and invokes the low-level
  interface 1 or n times.
- There is now a debugging printer for tensor descriptors.
- Convolution struct is replaced with ConvolutionArgs, which is not
  part of the public API but is used internally to conveniently
  pass around all of the arguments needed for Convolution.
- Add some constexprs for well-known dimensions, reduce amount of
  magic numbers in code.
- Put 'deterministic' in to ConvParams.  Fixes #3659
- Lots more comments.
- Some pessimizations, in the name of code clarity:
  - The descriptors are initialized on every invocation of convolution
    forward/backward.  Previously, the descriptors were cached, so that
    you didn't have to initialize them again on backwards.  This is
    difficult to support in the ATen interface so I didn't support it.
  - Legacy group convolution initializes its workspace for *every* group
    it performs.  I did not feel motivated to fix this because the
    legacy codepath is already quite slow.
- Affine grid generator and grid sampler automatically call contiguous
  on their arguments as necessary.
- Batchnorm input checking is greatly beefed up, it now checks for
  the following input characteristics:
    - Definedness
    - GPU location
    - Type
    - Contiguity
    - Size

PyTorch binding code changes

- batchnorm now uses consistent var/data naming
- batchnorm and convolution make use of new ATen bindings
- Affine grid generator and grid sampler make use of ATen CuDNN
  bindings via derivatives.yaml.  This means I had to restructure
  the code a little, since the THNN bindings still go through
  a legacy Python class.
- I fixed some warnings:
  - s/friend class/friend struct/ on InterpreterStateImpl
  - Removed pessimizing move 'detached' in torch/csrc/autograd/variable.cpp
  - Removed unused pack_list on Scalar

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

GCC 4.8 buildfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Add TensorGeometry to ATen.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CUDNN_CHECK

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Update TODO comment

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Delete return in cudnn_grid_sampler

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

s/cudnnSetStreamToCurrent/setCuDNNStreamToCurrent/g

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Don't allocate a new vector when filtering defined.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Remove Check overloads, convert to pass references.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Some more microbenchmarking.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-30 23:06:58 -05:00
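Regarding the transposed-convolution shape calculation mentioned above, a minimal sketch of the standard shape rule (assuming dilation 1; illustrative, not the exact ATen code):

```python
def conv_transpose_out_size(in_size, kernel, stride, pad, output_padding):
    # Inverse of the forward rule out = (in + 2*pad - kernel) // stride + 1;
    # output_padding disambiguates sizes that integer division collapses.
    return (in_size - 1) * stride - 2 * pad + kernel + output_padding

# A stride-2, kernel-3, pad-1 transposed conv maps 4 -> 7, or 4 -> 8 with
# output_padding=1, matching forward convs that map both 7 and 8 down to 4.
assert conv_transpose_out_size(4, 3, 2, 1, 0) == 7
assert conv_transpose_out_size(4, 3, 2, 1, 1) == 8
```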
21731d7b53 register gradient for lambda rank
Summary: as titled

Reviewed By: xianjiec

Differential Revision: D6438164

fbshipit-source-id: 9df4da5ba4983d2952a586a8c70bbcf3094a17b4
2017-11-30 17:03:12 -08:00
a00d7a1bec ushort2(gid.x, gid.y) -> gid.xy
Summary: att

Reviewed By: ajtulloch

Differential Revision: D6442939

fbshipit-source-id: 57da10b7249769e8e03d5f505ed3b6ddd3314c98
2017-11-30 16:48:20 -08:00
cf07820849 Enable SparseLengthsMean
Differential Revision: D6445834

fbshipit-source-id: 5cbc95e6975b2447dc82dbe293d0ddd9adf6b5a3
2017-11-30 16:04:38 -08:00
0c588a500b Replace sigmoid + xent loss with SigmoidCrossEntropyWithLogits for better numerical stability
Summary: Replaced sigmoid + xent loss with SigmoidCrossEntropyWithLogits. The new op computes the multinomial logistic loss of the sigmoid of its inputs. It's conceptually identical to a sigmoid layer followed by a multinomial logistic loss layer, but provides a more numerically stable gradient. (A sketch of the stable formulation follows this entry.)

Reviewed By: xianjiec

Differential Revision: D6305455

fbshipit-source-id: 444c9f651fbdf13c3c52be5142769f8f98ed8770
2017-11-30 14:04:36 -08:00
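A minimal numpy sketch of the standard numerically stable formulation (the exact Caffe2 kernel may differ). The naive -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x)) overflows for large |x|, while the fused form never exponentiates a large positive number:

```python
import numpy as np

def sigmoid_xent_with_logits(logits, targets):
    # Algebraically equal to the naive form, but stable for any |x|:
    # max(x, 0) - x*z + log(1 + exp(-|x|))
    return (np.maximum(logits, 0) - logits * targets
            + np.log1p(np.exp(-np.abs(logits))))
```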
ab0a7eb7bf Add ONNX symbolics for several ops (#3956) 2017-11-30 16:40:13 -05:00
fcb0b0de26 fix (#3953) 2017-11-30 14:42:06 -05:00
79f32b46c1 Fix contrib/script build
Summary:
Fixes a missing line in ff2c973547

zdevito jamesr66a
Closes https://github.com/caffe2/caffe2/pull/1540

Reviewed By: bddppq

Differential Revision: D6448498

Pulled By: dzhulgakov

fbshipit-source-id: 997453fc6182910140967506d6ad2c8366d06e32
2017-11-30 11:05:55 -08:00
8f2bc151d3 Find ninja path using the python module 2017-11-30 13:47:27 -05:00
70ca83793d Add support to emit compile_commands.json from CMake/ninja
files.
2017-11-30 13:47:27 -05:00
0e54c3a989 Significantly speed up the incremental build.
This commit adds code to setup.py to use ninja to manage
C++ and code generator dependencies rather than use raw setuptools.
This is based on similar code added to ONNX.

Enabled optionally when ninja is installed.

On my computer speed for a do-nothing build drops from 10s to 1.5 seconds.
Speed of other compilation steps is significantly improved as well.
Dependencies are tracked correctly so the need for ccache is reduced.
2017-11-30 13:47:27 -05:00
442ffac686 Update CONTRIBUTING.md (#3952) 2017-11-30 13:26:56 -05:00
8cb32ba630 rnn.py: Note zero defaults for hidden state/cell
* Add a note on zero defaults for hidden states/cells of
  RNNs/LSTMs/GRUs.

* Should fix the note in #434

Signed-off-by: mr.Shu <mr@shu.io>
2017-11-30 19:06:26 +01:00
fc3f88d8a4 higher order interaction of embeddings
Summary:
Get higher-order interaction of embeddings, similar to cross net but applied at the embedding level.
Formula:
  e_(l+1,i) = element_wise_mul[e_(0,i), \sum_i(e_(l,i) * w_(l,i))] + e_(l,i) + b
where l is the layer index of this higher-order net and i indexes the embeddings in the list.

Finally, concat all the embeddings in the last layer, or concat the sum of each embedding, and attach to the output blob of the dot processor. (A hedged sketch of the recurrence follows this entry.)

Differential Revision: D6244001

fbshipit-source-id: 96292914158347b79fc1299694d65605999b55e8
2017-11-30 08:51:09 -08:00
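A hedged numpy sketch of the recurrence above. It assumes the inner sum runs over the embeddings of the list (index j; the summary writes i for both indices) and that there is one weight vector per embedding; the diff does not spell out its index conventions, and all names here are illustrative:

```python
import numpy as np

def higher_order_step(e0, el, w, b):
    # e0, el: lists of [dim] embeddings (layer 0 and layer l);
    # w: per-embedding [dim] weights; b: [dim] bias.
    s = sum(e * wi for e, wi in zip(el, w))          # \sum_j e_(l,j) * w_(l,j)
    return [e0_i * s + el_i + b for e0_i, el_i in zip(e0, el)]
```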
094df38e2f Fix dependency build when pwd contains spaces (#3950) 2017-11-30 10:31:19 -05:00
7e9724142a batched layer parameter loading for model initialization from an existing model
Summary:
Problem:
When we initialize a model from an existing model, we currently load information for each layer parameter independently (in utils.py), including shape information. We have to load the whole model from the db_path every time we initialize one parameter (in layers.py). For example, in f31078253 the model needs to be initialized twice (not sure why), each time loading 152 layer parameters, and loading a model takes 10-50 min depending on resource status.
Restriction:
1. _infer_shape_from_initializer in layers.py is called from multiple other places besides the if branch of ModelInitDefinition.INIT_MODEL_PATH in load_parameters_from_model_init_options in utils.py, which is the root cause of f31078253. We therefore still need to support the load operator in _infer_shape_from_initializer, and so need to batch the shape-blob loading outside of LayerParameter.
2. In the if branch of ModelInitDefinition.PARAMS in load_parameters_from_model_init_options in utils.py, the db_path can differ between parameters, so it is hard to batch them.
Solution:
Batch the shape-blob loading in the if branch of ModelInitDefinition.INIT_MODEL_PATH in load_parameters_from_model_init_options in utils.py. We load the model and generate shape blobs of the layer parameters in the workspace, so that _infer_shape_from_initializer in layers.py can directly return shape blobs cached in the workspace without reloading the model. At the same time, _infer_shape_from_initializer can still support a separate load operator if shape blobs are not pre-loaded into the workspace (this logic can be used for ways of initializing a model other than from an existing model).
Right now we use 500 layer parameters per batch, and it works fine, so for 152 layer parameters one model load is enough.

Reviewed By: xianjiec

Differential Revision: D6397607

fbshipit-source-id: 54f6f61d6d8b70c82b74c2d72ac56cd010a710da
2017-11-29 22:17:51 -08:00
7374c981d8 CUDA support for PackSegments Op
Summary: Replace GPUFallbackOp with a native CUDA implementation

Reviewed By: akyrola

Differential Revision: D6423200

fbshipit-source-id: 47dfecbc486e9a8bf0cc6b897ab8b6a2488caa34
2017-11-29 22:01:42 -08:00
b766335753 Revert D6403523: [Part 2] Support regression with output transform in MTML for feed.
Summary:
This reverts commit faa0aab1227a27286b617e8e25adfbab3a349d2c

bypass-lint

Differential Revision: D6403523

fbshipit-source-id: eb43f348b09f2abcc52e101f43b0b9cc42a48ffb
2017-11-29 21:47:01 -08:00
2caca70a37 Allow shifting of activations / ops to other GPUs in data parallel model
Summary:
(Work in progress.) This diff will allow shifting of activations to other GPUs in case the model does not fit into memory. To see the API, check the code in data_parallel_model_test, which tests shifting two activations from gpus 0 and 1 to gpu 4, and from gpus 2 and 3 to gpu 5.

I will need to further test on ResNets, and probably add copy operations to handle device change points.

Reviewed By: asaadaldien

Differential Revision: D5591674

fbshipit-source-id: eb12d23651a56d64fa4db91090c6474218705270
2017-11-29 21:17:00 -08:00
4c7219b3b0 Implement matmul as a native function; use it for Variable impl (#3943)
* Implement matmul as a native function; use it for Variable impl.

This also includes an (inefficient) version of allclose, which was necessary for testing.
A more efficient version would use some apply logic to fuse the ops and exit early (coming in future PR).

On small tensors [(2, 5, 5) @ (5,5)], this yields ~2.5x speedup over the python implementation.

* Make maybeSqueeze static.
2017-11-29 23:13:04 -05:00
0e21cd2eae CUDA implementation of RemovePadding operator
Summary:
This is a CUDA implementation of the RemovePadding operator, modeled on akyrola's implementation for AddPadding.

There's also an incidental spelling correction: GetAddPadingGradient -> GetAddPaddingGradient.

Reviewed By: akyrola

Differential Revision: D6439594

fbshipit-source-id: b29cd0c252021c58e150b901bbaad28a3bd3cc4a
2017-11-29 18:48:01 -08:00
6e9bb93a71 Handle MPSCNNConcat edge case
Summary:
Handle cases when channels in an output image is filled by multiple input images.
e.g.
Input1: 1 channel, I2: 1 channel, Output: 2 channels

Reviewed By: ajtulloch

Differential Revision: D6432909

fbshipit-source-id: b7a8e9be51010e6aef0c50d93f9a7ec5558c74a4
2017-11-29 17:03:15 -08:00
6811acbef9 Syntax for control flow in C2
Summary: Experimental code that allows you to write C2 NetDefs directly using python-like syntax. This includes the ability to write native control-flow (if, while) and have it turn into IfOp and WhileOp

Reviewed By: jamesr66a, dzhulgakov

Differential Revision: D6123298

fbshipit-source-id: 25fc078b5769be61ac7fb3aa9a7c95bd88dccc30
2017-11-29 16:47:45 -08:00
c9e181f50f Support regression with output transform in MTML for feed.
Summary: Support regression with output transform in MTML for feed.

Differential Revision: D6403523

fbshipit-source-id: faa0aab1227a27286b617e8e25adfbab3a349d2c
2017-11-29 15:47:19 -08:00
1661370ac5 Signal handling in DataLoader workers; Timeout option (#3474) 2017-11-29 23:52:14 +01:00
3c709f5b26 Add HANDLE_TH_ERRORS for THPFunction_saved_variables and THPFunction_saved_tensors
SavedVariable.unpack() may throw std::runtime_error, which may lead to
program termination with SIGABRT without the exception being handled
in Python

Fixes #3860
2017-11-29 22:54:27 +01:00
913a9a736c Backed out changeset 4e1241fe65cd (revert a revert :) )
Summary:
This fixes the issue, but I haven't figured out yet why it is
happening.

Reviewed By: bwasti

Differential Revision: D6437378

fbshipit-source-id: bf983c9b6f57647423423ec6b22e0f9d2b170e74
2017-11-29 13:33:15 -08:00
926ed2b280 Implemented NCCL Distributed Backend for PyTorch with new dist APIs (#3435)
* Implemented NCCL Distributed Backend for PyTorch with new dist APIs

* Let FindNCCL determine the NCCL version

* Let the NCCL2 backend use ATen instead of the deprecated THPP

* Let distributed parallel model use a single reduction thread for NCCL backend

* Caching the sockets, bug fix, refactoring, and addressed Adam's comments

* Make BcastNcclID take a single param and bug fix for all_gather

* Removed barrier function, added warning for users, and not exposing experimental func to users

* Use the simplest single-bucket working solution for the distributed data parallel model, with rebase

* Cleanup, fixes and further addressed Adam's comments

* Used PySequence_Fast in distributed csrc

* Removed the limitation that each group is only bound to a given device sequence

* Used THPObjectPtr for PySequence_Fast
2017-11-29 15:57:02 -05:00
60d86bac91 Fix the UpsamplingNearest2d's symbolic (#3450) 2017-11-29 15:22:01 -05:00
47fadc3138 improvements to extend in ModuleList and ParameterList (#3505) 2017-11-29 20:46:39 +01:00
6f218cef25 Suppress hypothesis health check in adagrad_test.py
Summary:
With some test seeds this warning starts firing.

Should be addressed in a better way, not generating as many invalid examples.
Closes https://github.com/caffe2/caffe2/pull/1536

Reviewed By: bddppq

Differential Revision: D6437138

Pulled By: pietern

fbshipit-source-id: c619d928a585e3d887f686db5d98f841af10c56b
2017-11-29 11:35:04 -08:00
eba0af4d5d Enable sampling ratio = 0 in RoIWarp
Summary: The case when sampling_ratio = 0 was skipped before; this diff enables that setting.

Reviewed By: ajtulloch

Differential Revision: D6366669

fbshipit-source-id: 4f3b9eaf47eb9dc20823935428d3d886ea32a5fc
2017-11-29 11:04:41 -08:00
929a11f920 Add interpreter support for Handles/PythonOp/CppOp (#3866)
* Add interpreter support for Handles/PythonOp/CppOp

This treats Handles as a first-class type in the interpreter
since this turned out to be conceptually simpler than treating
them as a separate concept, which requires a second channel for
register allocating and moving data from one op to the next.

Notes:
* The refcounting nature of tensors is factored into its own base type
so that it can be shared with other refcounted types such as handle.
* Some methods redundant with TensorBase have been deleted from Tensor
* The interpreter uses raw refcounted handles. In addition to being
able to treat Tensors and Handles as the same base object, it removes
a lot of redundant refcounting as objects moved from tensors to input/
output lists.
* aten_dispatch has been updated to work directly on the raw refcounted
lists to avoid refcounting and duplicate lists.
* Removed jit_closure.cpp; the interpreter can now handle all pathways.

* Functions like `unsafeToTensorShare` describe how
ownership transfers in the interpreter. The `Steal` variants
take rvalue references as arguments, and invalidate those
arguments to prevent potential problems.
* TensorTemporary deliberately does not have a subtype relationship with
Tensor, because subtyping makes it too easy to do something horribly unsafe:

```
  void foo(at::Tensor bar) {
    // bar's destructor calls release on a temporary!
  }

  foo(TensorTemporary(retainable)); // object slicing!
```
2017-11-29 11:38:57 -05:00
03829e55b3 Remove const modifier where it has no effect
Summary:
Remove `const` modifier on value-type return types, since it has no effect.
This fixes a clang 5 warning.

Reviewed By: Maratyszcza

Differential Revision: D6399474

fbshipit-source-id: b40af161be5ae67a944518f9b4043c194511267d
2017-11-29 08:32:06 -08:00
e105fee57d Fix ThreadPool class/struct forward declaration mixup
Summary: `ThreadPool` is a class, but it is forward-declared as a struct, which produces an error when compiled with clang 5.

Reviewed By: Maratyszcza

Differential Revision: D6399594

fbshipit-source-id: e8e81006f484b38e60389c659e9500ec9cfab731
2017-11-29 07:21:03 -08:00
c1babfa8e9 Fix ambiguous brace initialization of std::array
Summary: Double braces are required in C++11 when constructing an `std::array<,>` using aggregate initialization.

Reviewed By: Maratyszcza

Differential Revision: D6399752

fbshipit-source-id: 7b12c7a8193ba4904bb71b764a344bfd06ad7a7a
2017-11-29 07:21:02 -08:00
6ae0d477ea Fix cuBLAS arguments for fp16 dot (#3660)
* Fix cuBLAS arguments for fp16 dot

* Enable FloatTensor <-> CUDA HalfTensor checks in test_cuda.py
2017-11-29 07:16:34 -08:00
0ba9e5a636 Remove unused lambda capture
Summary: Remove unused lambda capture parameter which produces a warning in clang 5.

Reviewed By: Maratyszcza

Differential Revision: D6399643

fbshipit-source-id: cb49dc89749bd1d0143148ed559aa397f4d8f592
2017-11-29 07:03:03 -08:00
4e9fe7f168 Fix wrong arg in operator function for MSVC (#3934) 2017-11-29 15:11:22 +01:00
ea28deee75 use torch.cat in _flatten 2017-11-29 10:54:57 +01:00
4beb3ac3ab Properly guard cudnn backward path - NHWC is still not supported.
Summary:
TSIA. This is found in

https://github.com/caffe2/caffe2/pull/1530

Reviewed By: dzhulgakov

Differential Revision: D6434417

fbshipit-source-id: 2285c2f6252eb7f24e83357eb4887851b3adf690
2017-11-28 23:03:02 -08:00
f639625807 Epoch duration may be based on batches_per_epoch or duration in minutes
Summary:
Updating the reader Limiter to identify an epoch end either based on
batches_per_epoch or epoch_duration_len.
I am basically addressing the review comment on D6299602, where I was asked to
break that diff into two smaller diffs.
This is Part 1 of diff D6299602, i.e. making the multi-reader capable of identifying
the epoch end based either on batches_per_epoch or on epoch_duration_minutes.

Reviewed By: azzolini

Differential Revision: D6379955

fbshipit-source-id: b8f8e396f515c898ad2f9ee900ec8fad055306b0
2017-11-28 18:50:33 -08:00
38f166c13a Async executor with less polling
Summary:
Async executor based on async_polling (D5985110):
- Tasks scheduling other tasks, using polling only when necessary (e.g.
  CUDA->CPU case)
- Fully async, i.e. RunAsync immediately returns

Reviewed By: azzolini

Differential Revision: D6281681

fbshipit-source-id: 06e3723e1424ffab652c38ca7b279cf76e43fa44
2017-11-28 18:50:32 -08:00
0a434ff685 Remove Function::is_executable (#3907)
* Remove Function::is_executable

Ensure that grad_fn is null if requires_grad is false.

* Assert that grad_fn implies requires_grad=True
2017-11-28 18:29:27 -08:00
67c3cbd5e2 Optimizer: optimize transposes in variety of circumstances (#3509)
* Optimizer: Optimize transposes in variety of circumstances

- No-op transposes
- Consecutive transposes (fuse them)
- Transposes into Gemm (fuse them into transA/transB parameter)

* touch up out of date comment
2017-11-28 14:41:41 -05:00
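A minimal sketch of the consecutive-transpose fusion, assuming ONNX Transpose semantics where output axis i takes input axis perm[i]:

```python
def fuse_transposes(perm1, perm2):
    # Transpose(perm1) followed by Transpose(perm2) equals one Transpose
    # with the composed permutation below; if the result is the identity,
    # both nodes are no-ops and can be dropped entirely.
    return [perm1[p] for p in perm2]

assert fuse_transposes([1, 0], [1, 0]) == [0, 1]   # two swaps cancel out
```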
d4e5d9061d Fix indexing with all zero ByteTensors (#3926)
Fixes #3914
2017-11-28 14:32:35 -05:00
14cc15e8f4 fixed NCCL bug in data_parallel_model.py
Summary:
Changed the dict of viewvalues into a python list

See issue: https://github.com/caffe2/caffe2/issues/1516
Closes https://github.com/caffe2/caffe2/pull/1532

Differential Revision: D6425901

Pulled By: akyrola

fbshipit-source-id: 37988abe29726aea86637e18eedb948b7c281008
2017-11-28 10:50:02 -08:00
157f949cef Implement python scalar conversions via ATen; allow localScalar if numel == 1 (#3908)
* Have localScalar work with all 1-element tensors, not just scalars.

Also have toCFloat, etc. call localScalar so 1 element tensors work as well.

* Implement python number conversions.

* Implement __bool__, __nonzero__ as ATen functions.

* Remove merge artifacts.

* Simplify by dispatching to toCDouble.
2017-11-28 12:56:51 -05:00
af9fd35d82 Cast tensors when loading optimizer state dicts (#3658) 2017-11-28 09:56:39 -05:00
51ca3a1a48 Make sparse test also check that coalesce status of tensors makes sense. (#3171)
This adds heavier sanity checking when we run to_dense(); in particular,
we make sure that if a tensor claims to be coalesced it truly is, and if
it is not, that the coalesced version produces the same to_dense() result.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-28 09:55:56 -05:00
c25a1493cd CUDA mode profiler fixes (#3754)
* CUDA mode profiler fixes

* Enable multi-gpu CUDA tracing

We need to record per-device start events because event timing
comparison only works for events on the same device.

* Coarse-grained CPU-CUDA syncing of timelines
  Record a __cuda_start event used to synchronize cuda/gpu timings.
  This requires running some warm-up event records to ensure the
  call to event record for the __cuda_start event doesn't take
  longer than normal.

fix syncing

* fix cuda build and lint
2017-11-28 09:32:34 -05:00
47918cab01 clean up some dead build logic (#3633) 2017-11-28 09:30:59 -05:00
31af836412 Improve fuser algorithm 2017-11-28 09:52:49 +01:00
e66c592d10 Handle ops with multiple inputs in aten_dispatch.cpp 2017-11-28 09:52:49 +01:00
00fe1f7cc8 Record trace before saving outputs 2017-11-28 09:52:49 +01:00
329891b44f Add a warning when JIT falls back to AutogradClosure 2017-11-28 09:52:49 +01:00
4726d0cb7a Fix exporting HalfTensor 2017-11-28 08:57:17 +01:00
709fcfda8a Now actually fix padding (the tests are added in onnx-pytorch) (#3893)
* Now actually fix padding (the tests are added in onnx-pytorch)

* fix test
2017-11-27 23:39:48 -05:00
453eb258de add code comments to RNNExecutor + lint + rename some
Summary: RecurrentNetworkExecutor is quite complex and was lacking documentation and had some stray comments. Cleaned up and added documentation. Also did some renaming and reformatting.

Reviewed By: ilia-cher

Differential Revision: D6421087

fbshipit-source-id: c3a57f60042ae4425a59123af5f54acb19e860e7
2017-11-27 19:18:08 -08:00
dd1558dc8d Improve stream selection
Summary:
Check that the next picked-up stream is non-busy. Details:
https://fb.facebook.com/groups/101100140348621/permalink/377531329372166/

Reviewed By: azzolini

Differential Revision: D6381701

fbshipit-source-id: 58f81b8d7ed8179e524f4ee50578dddbb3e69e45
2017-11-27 17:29:08 -08:00
e91b75615e Use ATen version of Variable type_as. (#3840)
* Use ATen version of Variable type_as.

* type_as can't handle Tensors (non-Variables) in the parsing code, handle this in python.
2017-11-27 19:10:33 -05:00
a08909160e fix bug in CUDA AddPadding when lengths output is not provided
Summary:
enosair caught a bug where the operator returned too early if the lengths output was not provided. Fixed and added testing.

+ noticed the op does not support the case when no lengths input is provided. Added a temporary CAFFE_THROW for this case; will fix later

Reviewed By: enosair

Differential Revision: D6405585

fbshipit-source-id: a81717e1b39afde6e900ddd9049b820943aea9f1
2017-11-27 15:14:07 -08:00
96e15179db Fix function signature in ATenOp for at::Half Set function (GPU version)
Summary:
The CPU version is fixed here. https://github.com/caffe2/caffe2/pull/1464/files

Now fix the GPU version.
Closes https://github.com/caffe2/caffe2/pull/1506

Reviewed By: jamesr66a

Differential Revision: D6392830

Pulled By: houseroad

fbshipit-source-id: 922205630cbd8e24da80e269c6cd32bc4f040100
2017-11-27 14:18:26 -08:00
1ef54e3dab Fix OpenGL 3.0
Summary: Make OpenGL build

Reviewed By: bwasti

Differential Revision: D6415848

fbshipit-source-id: 0b78c90d8b0faf30c342ddbe5ccf91a9ac63ef8b
2017-11-27 11:48:56 -08:00
93e489d168 Fix _analytical_jacobian to not require in-place grad accumulation 2017-11-27 20:03:44 +01:00
73f6715f47 Do not link against libpython when building python bindings
Summary:
our cmake build used to link against libpython.so by its absolute path (instead of -LSOME_LIB_PATH -lpython), so at runtime the loader thinks it needs the libpython.so at that specific path and loads in an additional libpython.so. This makes a Python binding built with one Python installation unusable by another (maybe on the same machine, or sometimes not even on the same machine). The solution is quite simple: we don't link against libpython at all and leave all Python-related symbols unresolved at build time; they are resolved at runtime when the module is imported into Python.
Closes https://github.com/caffe2/caffe2/pull/1514

Reviewed By: dzhulgakov

Differential Revision: D6412405

Pulled By: bddppq

fbshipit-source-id: 9ff5b752ae3806bfac94085942f82d89c304c887
2017-11-27 08:47:08 -08:00
6e2e623fa9 Use list for *requires in setup.py
Summary:
f012485e47
Closes https://github.com/caffe2/caffe2/pull/1523

Reviewed By: dzhulgakov

Differential Revision: D6414162

Pulled By: bddppq

fbshipit-source-id: 6a2941a4e996f04acddf6ad871b6b58cb763cc91
2017-11-27 08:35:28 -08:00
3da9d7971d Suppress pytest filter_too_much health check
Summary:
Fix the travis CI
Closes https://github.com/caffe2/caffe2/pull/1524

Reviewed By: dzhulgakov

Differential Revision: D6412499

Pulled By: bddppq

fbshipit-source-id: eaa5942c88d4edd65600d035e31d2300fd8ab3a8
2017-11-27 08:35:27 -08:00
9b31280ccf Have __sizeof__ account for size of stored elements (#3821)
* Have __sizeof__ account for size of stored elements

* Conform to sizeof specification
2017-11-27 11:22:34 -05:00
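For illustration (a sketch of the intent, not a spec of the exact byte count): sys.getsizeof() calls __sizeof__(), so after this change it reflects the element storage too:

```python
import sys
import torch

t = torch.zeros(1024)
# Now includes roughly 1024 * 4 bytes of float storage on top of the
# Python wrapper object, instead of just the wrapper itself.
print(sys.getsizeof(t))
```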
558516fcdb More docs for Conv1d Conv2d (#3870)
* Add a bit of notation explanation

For a first-time user of Conv1d, it is not clear from the documentation what N, C and L exactly mean. This should clarify it. The same applies to Conv2d.
2017-11-27 11:07:48 -05:00
11c9bd6c98 Allow target.requires_grad in l1_loss and mse_loss (#3876) 2017-11-27 10:59:16 -05:00
669a99b595 Remove as much of Python from JIT hot path as possible 2017-11-27 11:42:47 +01:00
06408168e6 Fix padding according to https://github.com/onnx/onnx/issues/261 2017-11-27 01:09:40 +01:00
3a5bbc2140 improve PackedSequence docs to explain batch_sizes (#3878) 2017-11-26 21:11:54 +01:00
989e8ff781 Implement is_sparse, is_distributed as native function, (#3838)
work towards cpu() working on sparse tensors.
2017-11-26 13:19:51 -05:00
fca77c9e25 Correct gradient of rosenbrock (#3881) 2017-11-26 11:49:04 -05:00
74fd79d889 Set seed at top-level of common.py (#3862)
Some tests, such as test_autograd.py, include random generation at the
top-level. It's going to be tough to police these files to ensure that
all randomness only happens within a test, so just set the seed as soon
as args are parsed (as well as before each test).

torch.manual_seed_all is no longer needed since torch.manual_seed also
seeds the CUDA random number generator.
2017-11-26 11:46:53 -05:00
ed640010ce Delete unused autograd functions (#3856) 2017-11-24 14:31:11 -05:00
9bbf4ee55e Fix lint (#3859) 2017-11-24 10:38:50 -05:00
65e0d5bad8 Fix void* wrapping in autograd codegen
Also, add assertions here and there to make sure bad things
never happen again.
2017-11-24 13:33:13 +01:00
ef70db09dd fix some more mathjax (#3352) 2017-11-24 11:14:37 +01:00
ffd39f4c9f added missing arg and improved example clarity (#3444) 2017-11-24 11:11:35 +01:00
754f3d3fe8 fixed a typo in ConcatDataset.cumulative_sizes attribute name 2017-11-24 11:07:51 +01:00
92883f3444 change doc for Adaptive Pooling 2017-11-24 11:06:13 +01:00
09b008f155 Fix BUCK for caffe2_test
Differential Revision: D6402763

fbshipit-source-id: c8fe2f84c1cac92eab9bb8f612278957cbfe042f
2017-11-22 20:56:38 -08:00
eb4344d6e6 Depthwise F(2x2, 3x3) convolution
Reviewed By: Maratyszcza

Differential Revision: D5117325

fbshipit-source-id: 21de84f8836bad142465eb02405a2f867fa09f85
2017-11-22 20:56:34 -08:00
8495503c6f Set a default input type
Summary:
Set a default input type so that users do not need to always specify one.

Test Plans: run caffe2_benchmark without the input_type argument; the default one is used.
Closes https://github.com/caffe2/caffe2/pull/1513

Reviewed By: hlu1

Differential Revision: D6401820

Pulled By: sf-wind

fbshipit-source-id: bc8406ca000b3f65fb9aeb1c9c80eb766d625758
2017-11-22 18:36:57 -08:00
0954775d28 AddPadding CUDA version
Summary: CUDA version of the AddPadding op. It first executes a prefix sum using Cub to compute the cumulative lengths array. Then it launches a kernel that uses this information to fill the output tensor with the start padding, the end padding, and the actual contents.

Reviewed By: asaadaldien

Differential Revision: D6391413

fbshipit-source-id: 45b431e5976674729e53cb4752c7753c1d8a69e8
2017-11-22 18:17:21 -08:00
5250d7fd11 simplify logic for weighted pooling using id score list
Summary:
so that users can use the 'WeightedSum' pooling method when there is a mix of id list and id score list features.

- it's still intuitive to have "WeightedSum" for id lists, and we do not need to introduce a new "UnWeightedSum" etc.

Reviewed By: chocjy

Differential Revision: D6369270

fbshipit-source-id: 722fa08d1a7986bc6ecf4c7cb02bbae0825bcab4
2017-11-22 17:32:04 -08:00
6dc1fc7e69 fix padding_idx for sparse=True (#3842) 2017-11-22 19:04:29 -05:00
af58bfbb1b Make integer parameters and buffers immune to float(), double() and half() (#3820)
* Avoid casting integer params and buffers to float(), double() and half()

* Add test for immune integer buffers

* Fix documentation for float(), double() and half()

* Fix test
2017-11-22 18:34:53 -05:00
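A minimal sketch of the fixed behavior, assuming a module with a registered integer buffer:

```python
import torch
import torch.nn as nn

m = nn.Linear(4, 4)
m.register_buffer('step', torch.LongTensor([0]))

m.float()
print(m.weight.data.type())  # torch.FloatTensor -- floating-point params cast
print(m.step.type())         # torch.LongTensor  -- integer buffer left alone
```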
48415d83c8 Fix instance_norm_test.test_instance_norm_model_helper
Reviewed By: jerryzh168

Differential Revision: D6391749

fbshipit-source-id: ba861d401e358290782db8f360c430e3f3daae96
2017-11-22 15:05:29 -08:00
b5ad8c8d16 Fix CharType min and max values (#3843)
* Fix CharType min and max

CharType is int8_t and this is not equal to char. CHAR_MIN and
CHAR_MAX cannot be used reliably to specify min and max values.

* Use SCHAR_* instead of hardcoded min/max values for CharType
2017-11-22 17:00:13 -05:00
59b2654544 reapply header change after xplat move
Summary: This is a reapplication of the earlier PR due to xplat move. Original author is Christoph Conrads <christoph.conrads@fluent.ai> christoph-conrads .

Reviewed By: houseroad

Differential Revision: D6379736

fbshipit-source-id: b7482ecf3b9487a528c15e92976e915791210002
2017-11-22 13:04:37 -08:00
3ac2a20c5f Fix DataParallel scattering for empty lists / dicts / tuples (#3769)
* Fix DataParallel scattering for empty lists and dicts

* Fix DataParallel scattering for empty tuples
2017-11-22 14:24:36 -05:00
9468a1e24f Add input type to caffe2_benchmark
Summary:
Allow inputs to be of either float or uint8_t type.
Closes https://github.com/caffe2/caffe2/pull/1508

Reviewed By: hlu1

Differential Revision: D6392742

Pulled By: sf-wind

fbshipit-source-id: d83f1602b366907405108ce37fa35c1a0f68551a
2017-11-22 11:09:08 -08:00
b404cfe29a Fix MultiLabelMarginLoss docs (#3836) 2017-11-22 13:08:11 -05:00
9c498aa523 Implement Variable cpu() as an ATen method. (#3802) 2017-11-22 11:25:52 -05:00
c7b1d58b16 Fix THP_export for python_variable_indexing.cpp 2017-11-22 17:12:20 +01:00
fd7bfaf4e4 Fix errors in previous DataChannelMPI refactor (#3831) 2017-11-22 10:05:57 -05:00
fc0c8c2316 minor refactoring in dper
Summary: small changes as I was reading through the dper code base. All of them are nits, but they somewhat helped me understand things.

Reviewed By: xianjiec

Differential Revision: D6389380

fbshipit-source-id: 3412052e4fcba199c6ffc84c6f7ae11bf8ff6ee9
2017-11-21 18:12:49 -08:00
26c5e5d5d9 Use EIGEN3_INCLUDE_DIR for Eigen includes
Summary:
The plural version is not defined in the CentOS CMake module.
Verified EIGEN3_INCLUDE_DIR is defined in the Ubuntu CMake module.

This fixes the build on CentOS when using system Eigen3.
Closes https://github.com/caffe2/caffe2/pull/1505

Differential Revision: D6390712

Pulled By: pietern

fbshipit-source-id: b8abb14a62e0ff9fa9c920866504da0e75786c0d
2017-11-21 15:53:38 -08:00
b3e5166d4c Run build_android.sh in Jenkins
Summary: Closes https://github.com/caffe2/caffe2/pull/1479

Differential Revision: D6386248

Pulled By: pietern

fbshipit-source-id: ac4ce163c164a49aa83e2c7015003763bc2fd0e7
2017-11-21 15:53:38 -08:00
12ce6c8b7c Re-enable net_test
Summary:
Disabled when configuring Jenkins to get a run where tests pass.
Closes https://github.com/caffe2/caffe2/pull/1449

Differential Revision: D6390647

Pulled By: pietern

fbshipit-source-id: c16edc0c4d21ad60f101cf860e5dec183a1ea71a
2017-11-21 15:53:37 -08:00
5215640a41 Fix cosine_similarity's output shape (#3811) 2017-11-21 18:33:41 -05:00
cf3ca13321 Improve DataChannelMPI (#3817)
Remove unnecessary messages and make certain functions in-place.
This commit weakens error checking, but I think it's fine to make
it UB for now, and implement a better asynchronous mechanism later.
This is much needed for achieving high performance.

This also adds support for CUDA-aware MPI implementations.
2017-11-21 18:33:05 -05:00
c5e8048f58 add error checking for FusionCompiler around old CUDA versions (#3753)
* add error checking for FusionCompiler around old CUDA versions

* improve error message
2017-11-21 18:27:12 -05:00
304e64e70d Fix NetTest.ChainingForDifferentDevices
Summary: Closes https://github.com/caffe2/caffe2/pull/1495

Reviewed By: bwasti

Differential Revision: D6381246

Pulled By: pietern

fbshipit-source-id: bdc104dff0c667bde08fa0512b5956a39e84ad7e
2017-11-21 11:04:36 -08:00
daa450d656 add sanity check to model_helper.TensorProtosDBInput
Summary:
A Caffe2 user was confused when model.TensorProtosDBInput([reader]) did not work. This is because of this outdated model helper function, which ignored the input blobs.
Added an assertion to enforce correct usage. I did not want to make this work with reader input as well, since this probably should not be used anyway.

Reviewed By: amanrajdce

Differential Revision: D6380326

fbshipit-source-id: 6a50c2861f7f58c06cbfe3e86bde0f17a2b443cb
2017-11-21 10:28:25 -08:00
4518793aa2 Implement indexing in ATen (#3725)
Implements basic and advanced indexing using ATen tensors/variables.
Basic indexing is translated at the Python-binding level
(python_variable_indexing.cpp) to slice/squeeze/unsqueeze/select calls.
Advanced indexing is implemented in ATen in terms of take() and put()
calls.
2017-11-21 13:19:00 -05:00
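A short illustration of the two code paths described above; basic indexing lowers to view-producing slice/select calls, while advanced indexing gathers elements (via take()) into a copy:

```python
import torch

t = torch.arange(0, 12).view(3, 4)

basic = t[1:3, 0]        # basic indexing: slice + select, returns a view
mask = t > 5
adv = t[mask]            # advanced (mask) indexing: returns a copy
idx = t[[0, 2], [1, 3]]  # integer-array indexing is advanced indexing too
```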
dcaaf51100 Support /sqrt(n) pooling
Differential Revision: D6378584

fbshipit-source-id: 3c6606c4e71afbd31dbb97ceeac38dfbe7b40090
2017-11-21 09:04:02 -08:00
8ebf18b5b1 Added MAGMA_HOME env var to specify alternative MAGMA root directory (#3809)
FindMAGMA.cmake looks for the MAGMA library under the hardcoded
/usr/local/magma by default. This commit adds a MAGMA_HOME env variable
as an alternative way to provide the MAGMA home directory. This is
very useful (and the only way) when the user has restricted rights
and cannot install the MAGMA libraries under /usr/local/magma. It is
also helpful when multiple versions of the library are installed and
one of them must be selected.
2017-11-21 09:01:34 -05:00
303ed8af44 Allow specifying cmake build directory in the build scripts
Summary: Closes https://github.com/caffe2/caffe2/pull/1496

Reviewed By: pietern

Differential Revision: D6379743

Pulled By: bddppq

fbshipit-source-id: 1cb2238e5708547767729de3ac1d3e1a76ed5ba1
2017-11-20 20:32:30 -08:00
77b78935f2 More extensions
Reviewed By: kevinbchen

Differential Revision: D6300944

fbshipit-source-id: e915c3f3d6b475752d8b7df82ec467d86f88a7c7
2017-11-20 17:18:51 -08:00
a81c63df83 Fix pad handling in ConvPoolOpBase::SetOutputSize(...) in the legacy_pad case.
Reviewed By: Yangqing

Differential Revision: D6300926

fbshipit-source-id: 8126a02667f9313a8d148e3905384adf7470debf
2017-11-20 17:18:50 -08:00
e0c8c539e7 Backed out changeset 119623addbbd
Summary: Unlanding D6327460 because it seems to be causing instability.

Differential Revision: D6377117

fbshipit-source-id: 4e1241fe65cd4c7a127fa6fa724f60b75965a096
2017-11-20 16:17:52 -08:00
335c7dc681 Fix perfkernel compile error on clang 3.8
Summary:
Closes #1483.
Closes https://github.com/caffe2/caffe2/pull/1489

Reviewed By: bddppq

Differential Revision: D6376107

Pulled By: pietern

fbshipit-source-id: 892f74d67629609ed82c991cfd94508cf8e23c29
2017-11-20 16:17:51 -08:00
c52ca23447 Always define outputs of ConvBackwardBackward (#3799) 2017-11-20 19:05:25 -05:00
a9ef76b9c6 Reflect renaming of OS X to macOS (#3795) 2017-11-20 16:52:10 -05:00
ad3e619198 Bring back CUDA_ARCH_NAME=Manual
Summary:
This should also be ported to Gloo since its Cuda.cmake was
synchronized to Caffe2 in #1256.

Verified that running CMake with `-DCUDA_ARCH_NAME=Manual` and
`-DCUDA_ARCH_BIN=70` ends up running nvcc with `-gencode
arch=compute_70,code=sm_70`.

Closes #1460.
Closes https://github.com/caffe2/caffe2/pull/1487

Reviewed By: bwasti

Differential Revision: D6376222

Pulled By: pietern

fbshipit-source-id: 563a2947567a2af8a0e64475b346a19d76545ed3
2017-11-20 13:51:21 -08:00
4bce69be22 Implement Variable.storage() (#3765)
This still uses THPStorage, but avoids touching THPTensor
2017-11-20 14:18:07 -05:00
10d24d8f84 Add Tensor.slice() (#3750)
The slice function is very similar to narrow, except that it takes an
optional "step" argument. Unlike narrow, the arguments use the same
conventions as Python indexing: negative values wrap around and start
and stop are clamped to the size of the Tensor.
2017-11-20 13:58:12 -05:00
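An illustration of the Python-indexing conventions that slice() mirrors, shown via Python slicing itself (slice() is the C++/ATen entry point behind it):

```python
import torch

t = torch.arange(0, 10)
print(t[2:8:2])   # start/stop/step, like Python sequences
print(t[-3:])     # a negative start wraps around
print(t[:100])    # stop is clamped to the tensor's size, not an error
```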
ee08120b46 Move Variable conversion methods to ATen. (#3762)
* Move Variable conversion methods to ATen.

* Add a test to ensure type conversions work through backwards.

* Fix VariableType copy for type conversions.

* Add comment about needing to handle device movement.

* Move back to opposite order for copy function params -- inplace views depend on it.

* Use is_available() rather than is_available.
2017-11-20 13:28:08 -05:00
0348e98cb4 docs update
Reviewed By: akyrola

Differential Revision: D6374075

fbshipit-source-id: bf312695f6f429e0ad1d1117bcbb85b0d1e06195
2017-11-20 10:07:22 -08:00
cf407213f9 Clean up stochastic function related dead code (#3782) 2017-11-20 12:44:45 -05:00
1ba3e14608 Throw Python exception from PythonOp instead of logging
Summary: Today when PythonOp throws an exception, we log the error and fail the op. Later we assert that the op/net/plan succeeds and throw with a generic message. The user must tail the logs to find the real error. Instead, align with exception handling from other ops - throw directly. This will include the full context of the exception in the error message.

Reviewed By: Yangqing, akyrola

Differential Revision: D6359684

fbshipit-source-id: 85133ba6562759607a3971449120647cbacce946
2017-11-20 09:03:17 -08:00
38cd6b3bd0 Fix run_test.sh mpiexec failures under virtual python envs (#3792)
If a virtual Python environment is in use (e.g. conda) and
mpiexec was compiled with the --enable-mpirun-prefix-by-default option,
it will fail by default because the path is updated to the prefix and
a different python (in most cases /usr/bin/python) is used.
2017-11-20 08:54:03 -05:00
4471e15b76 BMUF cpu support
Summary: change the interface so BMUF can run on cpus

Reviewed By: asaadaldien

Differential Revision: D6356026

fbshipit-source-id: f58a4da9f800d969145a1a376e118b0f3581f8c1
2017-11-19 23:41:25 -08:00
4d405a4430 Fix hash.h compile errors in newer compilers (#3783)
Disable the default std::hash<T> overload if T is an enum type.
2017-11-19 19:28:43 -05:00
3e4a777e44 Correct JIT interpreter autograd function (#3760) 2017-11-19 21:48:22 +01:00
fa5324d2a3 Update README.md (#3781)
* Update README.md

* Update README.md
2017-11-19 11:59:01 -05:00
2c39f3de99 flake8 fix 2017-11-19 00:32:02 +01:00
1f64c2ef91 Rename pyro.distributions.Multinomial -> .Categorical (#3766)
* Rename distributions.Multinomial -> distributions.Categorical

* Rename Multinomial -> Categorical

* Update docs

* Update variable.py

* Update distributions.py

* Update variable.py
2017-11-18 16:10:07 -05:00
40179cd61c fix cuDNN RNN weight tying test (#3774) 2017-11-18 14:42:40 -05:00
ca3fc59a9a fix elapsed_us spelling 2017-11-18 18:28:27 +01:00
0fd9682305 Fix torch::hash for MSVC again (#3767) 2017-11-17 22:46:14 -05:00
a9ec4ee742 Detect aliasing in cuDNN RNN flatten_parameters (#3752)
* Detect aliasing in cuDNN RNN flatten_parameters

* add test
2017-11-17 22:32:38 -05:00
0e99334efb move print to logger
Summary: further clean up data_worker's messy output

Reviewed By: asaadaldien

Differential Revision: D6217857

fbshipit-source-id: 51cee29a687501d0f965422586fd6cb66a2d516a
2017-11-17 18:03:44 -08:00
0a09fba3f6 Capture CMAKE_ARGS in setup.py and pass them as args to build_local.sh
Summary:
build_local.sh was changed in a8bb05d to no longer take the CMAKE_ARGS environment variable as args to the cmake command
Closes https://github.com/caffe2/caffe2/pull/1488

Differential Revision: D6364057

Pulled By: bddppq

fbshipit-source-id: a96787f3d3f1367ada4819420906e549f0945c8f
2017-11-17 15:51:30 -08:00
067f799e9f Implement remaining Variable fallthrough methods via ATen (#3744)
* Use aten version of is_signed.

* Define is_cuda native function and use it for variable.

* Use ATen dim for Variable dim/ndimension.

* Get rid of dim, ndimension fallthroughs in variable.py.

* Move size/stride Variable methods to use ATen.

* Implement shape property on Variable via ATen.

* Remove the _getattr__ function from Variable.

* Get rid of dispatch functions and avoid cast.

* Add THPUtils_packInt64Array.

* Throw python errors.

* Use fallthrough and fix fallthrough generation for native functions.

* is_cuda is a property, not a method.
2017-11-17 15:57:56 -05:00
5de880f3e1 Resume from epoch instead of re-starting a worklow from scratch when we retry
Reviewed By: anshulverma

Differential Revision: D6354076

fbshipit-source-id: d2bee93a1136fb07c46942649e90110d2e3ccb0e
2017-11-17 12:51:07 -08:00
931bd87e98 Add setup.py
Summary:
Take environment variable CMAKE_ARGS as extra cmake flags

tested `pip install` `pip wheel` `python setup.py build` `python setup.py install`
Closes https://github.com/caffe2/caffe2/pull/1480

Reviewed By: Yangqing

Differential Revision: D6347062

Pulled By: bddppq

fbshipit-source-id: 5806c3a50826c6936e82a64884db1fd7db142097
2017-11-17 12:22:52 -08:00
c4b0db5079 Remove hard file offset reset in load() (#3695)
* improved file offset logic

* load offset test

* whitespace

* needless exception handling

* test integer in binary
2017-11-17 15:21:37 -05:00
2453bc2876 Implement clamp using ATen (#3739) 2017-11-17 13:12:36 -05:00
23ca19ae3d Fix GCC 4.8 build 2017-11-17 08:54:28 -08:00
8d321b6cd3 Improve observer framework overhead
Summary:
There were several regressions over time. Looks like the main
one is a recent change that introduced a map we iterate over for each
operator call. I made some other little optimizations to our Facebook
observer. Overall this seems to cut about 1000ns from an operator. At a
rate of 36B operators per second, this should be about 750 Type VI
hosts.

Reviewed By: bwasti

Differential Revision: D6327460

fbshipit-source-id: 119623addbbd575486906959d65603eea8d4f5e6
2017-11-17 08:36:35 -08:00
309b2a0093 Move jit test order from beginning to right before multiprocessing. 2017-11-17 11:21:32 -05:00
689cf7d480 Reduce nondeterminism in test_jit (#3561)
Occasionally Travis builds would fail on these two tests.
It's not entirely clear where this nondeterminism is coming
from.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-17 19:48:59 +08:00
b97dfc8a92 Pretty names: support names set via export or Variable constructor (#3371)
Add (fully opt-in) functionality to support setting pretty names for
nodes in the graph. In particular

- Variable now has a `name` parameter in the constructor
- export now has `input_names` and `export_names` parameters

Nodes that are not named via this mechanism continue to be named
internally with unique integers.

Names have a few rules.

- They must all be unique in the graph.
- They may not be integers (because of potential conflicts with
  internally generated names).
2017-11-16 21:11:34 -05:00
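A hypothetical sketch of the opt-in naming, using the parameter names given in the commit message above (exact signatures may differ; `model` is a stand-in module):

    import torch
    from torch.autograd import Variable

    model = torch.nn.Conv2d(3, 8, kernel_size=3)

    # Hypothetical: name the graph input via the new Variable constructor arg.
    x = Variable(torch.randn(1, 3, 32, 32), name="image")

    # Hypothetical: name inputs/outputs at export time; nodes not named this
    # way keep their internally generated unique integer names.
    torch.onnx.export(model, x, "model.onnx",
                      input_names=["image"], export_names=["conv_out"])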
5b4a438563 Implement bmm symbolic (#3681) 2017-11-16 19:57:02 -05:00
1a02e72254 fix missing DPM .values() and .keys() to viewvalues() and viewkeys()
Summary: Reported by Simon Layton from NVIDIA: we had a couple of py3-incompatible expressions in data_parallel_model

Reviewed By: azzolini

Differential Revision: D6349447

fbshipit-source-id: a09feb69396be43296400591a3bfed5b8c370b0d
2017-11-16 16:08:18 -08:00
9bfaec86b0 Use multiplication instead of division in conv group size checks 2017-11-17 00:13:27 +01:00
35295e42c0 Add static asserts to ensure all cuDNN algos are checked 2017-11-17 00:13:27 +01:00
d1fb8fdf03 Improve IODescriptors in JIT arg checking 2017-11-17 00:13:02 +01:00
5b2026e75b Add torch::hash 2017-11-17 00:13:02 +01:00
b909fce358 Make macOS build use ccache via CMAKE_C*_COMPILER
Summary: Closes https://github.com/caffe2/caffe2/pull/1484

Reviewed By: Yangqing

Differential Revision: D6352137

Pulled By: pietern

fbshipit-source-id: f17c7c8cf38e7a4b8e2af60010bdde920f39e7c5
2017-11-16 14:24:54 -08:00
67354da8cd conv2d in autograd C++ (#3702) 2017-11-16 17:17:43 -05:00
24e83acbb9 Enable sampling in evaluation
Reviewed By: chocjy

Differential Revision: D6119768

fbshipit-source-id: c8447326008392df70ab10b04f84223cf6d882b1
2017-11-16 14:03:51 -08:00
cc7f09a372 Add cudaEvent support to the profiler (#3734)
* Add cudaEvent support to the profiler

This adds the ability to record cuda timings using cudaEventRecord
in the profiler. Since it doesn't require nvprof it is easier
to run than the nvprof path.

This also records a thread id for each event, which will make
tracing results easier to understand

* Add flow arrows from cpu to cuda event

* Fix no cuda build

* Review comments

* Move CUDA checks to one place
2017-11-16 13:58:09 -08:00
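A short usage sketch of the new CUDA timing path (assuming the profiler's `use_cuda` flag and a CUDA-capable machine):

    import torch
    from torch.autograd import Variable
    import torch.autograd.profiler as profiler

    x = Variable(torch.randn(512, 512).cuda())

    # use_cuda records cudaEventRecord timings alongside CPU times, so no
    # nvprof run is needed; each event also carries a thread id for tracing.
    with profiler.profile(use_cuda=True) as prof:
        y = x.mm(x)
    print(prof)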
ce3413549f check cloned observer in RNN Executor
Summary: Ensure that the clone() function didn't return a nullptr before attaching the observer to an RNN operator

Reviewed By: salexspb

Differential Revision: D6341735

fbshipit-source-id: acf89c32f8dae2fd9bc8cb1029bc00df5dbe9dbd
2017-11-16 13:30:53 -08:00
127a55ae49 cast op for empty batch
Summary: The Cast op on CUDA can now deal with empty batches.

Reviewed By: azzolini

Differential Revision: D6350138

fbshipit-source-id: 2f3d19f4d42ff34806aa9597690e66f6b4de1a6b
2017-11-16 12:20:20 -08:00
8f5c0f9678 Record stack traces for CppOps (#3727) 2017-11-16 14:49:01 -05:00
fd7e227826 Re-enable mkl_sbn_op_test.py
Summary: Closes https://github.com/caffe2/caffe2/pull/1478

Reviewed By: Yangqing

Differential Revision: D6349938

Pulled By: pietern

fbshipit-source-id: d7f42c928f75a7d93318f5e622f8fcb0efab015e
2017-11-16 11:45:32 -08:00
066f26c7fa [ATen] Introduce templatized native functions and implement is_signed. 2017-11-16 11:34:34 -05:00
d8dfaeeef7 Add batch-based/row-based sparse from/to dense operator
Summary:
Two ops: BatchSparseToDenseOp and DenseToBatchSparseOp
Inverse operations of each other.

Details are described in op Doc

These ops are used along with flexible topK, where the output is
lengths, indices, and values.
We want to do softmax on the values, but the dimension of each batch is different, so these ops convert the sparse representation to dense and vice versa. The two ops are also the gradient ops for each other.

Reviewed By: chocjy

Differential Revision: D6288338

fbshipit-source-id: 0ba9e611058b39e46e7414dcc5f39cab29915fa3
2017-11-16 00:59:21 -08:00
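A minimal pure-Python sketch of the batch sparse-to-dense conversion described above (function and argument names are hypothetical; the real op consumes caffe2 blobs):

    def batch_sparse_to_dense(lengths, indices, values, dense_dim, default=0.0):
        # Expand per-row (length, index, value) triples into dense rows.
        dense, pos = [], 0
        for n in lengths:                  # one entry per batch row
            row = [default] * dense_dim
            for j in range(pos, pos + n):  # scatter this row's sparse values
                row[indices[j]] = values[j]
            dense.append(row)
            pos += n
        return dense

    # Two rows with 2 and 1 nonzeros respectively:
    print(batch_sparse_to_dense([2, 1], [0, 3, 1], [5.0, 7.0, 9.0], dense_dim=4))
    # [[5.0, 0.0, 0.0, 7.0], [0.0, 9.0, 0.0, 0.0]]

The dense-to-batch-sparse direction simply inverts this scatter, which is why the two ops can serve as each other's gradients.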
6eca9e052d Fix symbolic for Embedding and Upsampling and improve error messages 2017-11-16 00:38:10 -08:00
3bde37fbf0 Listwise Ranking -- LambdaNDCG
Summary:
This is part one: It adds lambdaNDCG loss which can be used to heuristically
optimize the NDCG metric.

Differential Revision: D5830650

fbshipit-source-id: 1eb696337c9a77727ad40219c68f6468e2e097a5
2017-11-16 00:05:48 -08:00
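For reference, the NDCG metric this loss targets is commonly defined as (a standard formulation, not quoted from the diff):

    DCG@k  = \sum_{i=1}^{k} (2^{rel_i} - 1) / \log_2(i + 1)
    NDCG@k = DCG@k / IDCG@k

where rel_i is the relevance of the item at rank i and IDCG@k is the DCG of the ideal (relevance-sorted) ordering, so NDCG lies in [0, 1].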
2502ac082b [ATen] Rename isDistributed -> is_distributed. 2017-11-15 18:33:07 -08:00
6dee02923c [ATen] Rename isSparse -> is_sparse. 2017-11-15 18:33:07 -08:00
9a2b54e08b [ATen] Rename isCuda -> is_cuda. 2017-11-15 18:33:07 -08:00
50d6d258a3 Fix build breakage in ATen NativeFunction (#3729) 2017-11-15 21:22:06 -05:00
a3afca6fc9 Minor documentation fix in NetBuilder
Summary: Came across this bug in the doc when I was figuring out NetBuilder from the code.

Reviewed By: volkhin

Differential Revision: D6341821

fbshipit-source-id: 8818f3d92681366bfe7b90d9d4da9f68ef6e4672
2017-11-15 16:22:22 -08:00
9f9d7e6ee7 update gloo submodule (#3728) 2017-11-15 18:47:02 -05:00
983872899e Linear and constant warmup learning rate policies
Summary: Implement LinearWarmup and ConstantWarmup learning rate policies. LinearWarmup warms up the learning rate from (starting_multiplier * learning_rate) to the specified learning rate over the first 'num_iter' steps. ConstantWarmup scales the learning rate by 'multiplier' for the first 'num_iter' steps.

Differential Revision: D6316038

fbshipit-source-id: 1649c3ecd78bcdfec93b6cf195d86328393a7cb4
2017-11-15 15:23:08 -08:00
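A small pure-Python sketch of the two schedules as described (names hypothetical; the real implementations live in caffe2's learning-rate handling):

    def linear_warmup(base_lr, start_mult, num_iter, it):
        # Ramp from start_mult * base_lr up to base_lr over the first num_iter steps.
        if it >= num_iter:
            return base_lr
        frac = float(it) / num_iter
        return base_lr * (start_mult + (1.0 - start_mult) * frac)

    def constant_warmup(base_lr, multiplier, num_iter, it):
        # Scale by a fixed multiplier for the first num_iter steps.
        return base_lr * multiplier if it < num_iter else base_lr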
b96976fceb Use ATen equivalents for variable element_size and nelement. (#3724)
* Use aten numel for variable nelement.

* Use ATen elementSizeInBytes for element_size.
2017-11-15 17:54:02 -05:00
39f0859749 Use ccache for macOS builds if present
Summary: Closes https://github.com/caffe2/caffe2/pull/1475

Reviewed By: Yangqing

Differential Revision: D6340034

Pulled By: pietern

fbshipit-source-id: a932b8b2fd6f94215162b1f15f8f3ea640f542be
2017-11-15 14:38:36 -08:00
e4bb22ebbf bump ios-cmake
Summary:
For #1475 .
Closes https://github.com/caffe2/caffe2/pull/1477

Differential Revision: D6337441

Pulled By: Yangqing

fbshipit-source-id: 80b786e5d1989b53751751cf873d835ad16a1dd7
2017-11-15 13:49:44 -08:00
99037d627d fix OSX cuda build (#3722) 2017-11-15 16:38:18 -05:00
a8d99c145b Move quant_decomp_zstd.* to share/contrib
Summary: Move quant_decomp_zstd.* to share/contrib so that they're automatically synced to fbcode

Reviewed By: Yangqing

Differential Revision: D6336968

fbshipit-source-id: 1bf48ce97a017ddea8cc82865428a498653d5872
2017-11-15 13:18:09 -08:00
067bc141c3 Cached reader
Summary: A wrapper around a reader with a persistent file cache.

Reviewed By: kennyhorror

Differential Revision: D6257639

fbshipit-source-id: 113296173ca18d25b86e188e0c09e3dbd830969d
2017-11-15 12:38:49 -08:00
2b5a38b1a8 Add missing trtrs, orgqr, ormqr docs (#3720)
* trtrs docs

* orgqr and ormqr docs
2017-11-15 15:37:34 -05:00
b09d66e60d Fix a reference cycle when in-place ops on views save the output (#3679)
Previously, an in-place operation that saves its output (such as
relu/threshold) would create a reference cycle when applied to a
view. There were two cycles created:

1) The cycle base.grad_fn.fn.input_.base
   base.grad_fn is a CopySlices
   base.grad_fn.fn is ThresholdBackward
   base.grad_fn.fn.input_ is a SavedVariable with base pointing to base

2) The cycle base.grad_fn.fn.input_.grad_fn.next_functions[0]
   base.grad_fn.fn.input_.grad_fn is AsStridedBackward
   and next_functions[0] points to base.grad_fn

Generally, we avoid cycles because the AD graph is mostly immutable. Two
notable exceptions are:

a) Variable.grad_fn can change to point to a new grad_fn
b) SavedVariables in a function can be set after the function is created

The first case is not a problem if grad_fns do not hold strong references
to Variables. Removing "base" from SavedVariable removes the strong ref.

For the second case, we need to avoid saving the grad_fn of outputs. We
were incorrectly saving the grad_fns of outputs when they were the
result of in-place ops on views.
2017-11-15 15:19:41 -05:00
2300234c9c Lint checks, small fixes 2017-11-15 11:47:18 -08:00
ef4b19f767 Refactor ir.h to distinguish Nodes and Values
This commit adds a Value type similar to the one @ezyang suggested a while
ago for handling multi-return nodes.

Previously if we had a graph like:

  a = op1(b)
  c, d = op2(a)

Then its in-memory format would look like:

  %0 = op1(b)
  %1 = op2(%0)
  %2 = select(%1, 0)
  %3 = select(%1, 1)

Select nodes were used only to handle the multi-output case. In the
single-output case ops referred directly to their uses.

This required special handling for the single- and multi- output cases,
and was confusing when used with ONNX which distinguishes values (the
inputs/outputs of a node) from the nodes themselves (e.g. a Conv).

This commit adds the Node/Value distinction to the IR. In the example
above, `a`, `b`, `c`, and `d` are now Value objects, while `op1` and
`op2` are now Node objects. Inputs/Outputs to the graph are values.

* Nodes now always have multiple outputs, accessible through their `output()`
  method.
* Methods exist for adding/removing outputs from a node.
* Nodes own their output Values; destroying a node destroys its outputs, and it
is only valid to destroy a node when no uses of its outputs remain.
* Unlike select, Values do not appear in the nodes list.
* The method `node()` on `Value` retrieves its defining node. Calling it
is always valid. For inputs, its kind is "Param". Like "Return" there is a single Param
node representing all inputs.
* For single-output Nodes, the method `output()` retrieves the single
output Value, asserting that the node is in-fact single output.
* Functions are the same, but some functions like `type()` have moved to
Value.
* `replaceAllUsesWith` is now sanely defined for both Values and Nodes.
In the case of Nodes, it replaces all outputs of the node with the outputs
of the replacement node.
* stage is defined both on Node/Value. This is because Inputs require a stage.
* Apart from changing data types from Node->Value most passes remain the same.
  Things that previously assumed single-output nodes now have to call output()
  to get the node.
* This removes the uses = [...] field in the outputs because it was
getting confusing even before this commit when uses would refer to nodes,
but we print the names of Values. The lint pass validates the use list,
so printing it out seems less necessary.
2017-11-15 11:47:18 -08:00
feb0a145c3 Move Variable.var and Variable.std to ATen (#3704) 2017-11-15 14:36:15 -05:00
445cc1f5b9 NativeFunctions: support backend-specific dispatch and SpatialRoIPooling (#3672)
* Support [output] in native_parse.

* allow specifying [output] in NativeFunctions.

Limitation: doesn't work for method, functions; can only do one or the other.

* Sample native function with output.

* spatial roi pooling forward skeleton (note, build is broken after this commit)

* Support multiple variants in native functions with outputs.

* add roi pooling forward cpu

* Add support for tuple return in NativeFunctions.

* native functions cuda

* fix bug in roi pool cpu forward

* finish forward kernel minus invocation

* add option for getting current stream

* Support backend-specific native function dispatch.

* Move cuda stuff to native.

* Move native related files to /native.

* Get rid of NativeFucntionsCuda.h.

* launch forward kernel

* roipool backward kernel

* Rebase expand error message changes.

* Fix up header files.

* add backward kernel launch, write as native function

* Default to base dispatch.

* Re-arrnage native_parse.py.

* Get rid of tabs.

* Get rid of at:: in C++ code in native function decl.

* Parse name.

* Parse name and return.

* Parse arguments.

* Don't specify variants.

* Get rid of /NativeFunction.

* Infer dispatch level.

* Infer dispatch.

* Improve argument parser.

* Comment, simplify parsing.

* Allow single line comments.

* Parse 'const Tensor &foo' correctly.

* Add comment to native_get_return_types.

* Fix python2 build by removing kwarg to rsplit.

* tabs --> spaces in roi foward cpu

* rename to RoiPooling2d

* add _cpu to roi pooling functions on cpu

* fix name handling in native functions

* Fix lint.

* Simplify default handling.

* Get rid of dispatch_level; infer it from dispatch.

* Simplify multiple return type native parsing.

* Move naming of outputs to gen.py from gen_variable_type.

* Get rid of m_ for type methods; keep only method_prefix_derived for s_ functions.

* add derivatives.yaml entry for roi pool

* Native functions parsed from yaml.

* Add comment explaining native_functions.yaml.

* Fix runtime_error string format.
2017-11-15 10:24:51 -05:00
737aba3fc5 Fix cmake scripts for CUDA and MSVC (#3713)
* Fix wrong CUDA generators and allow for new ones

* Fix CUDA detection for other generators

* Simplify the changed code

* Remove useless flags for MSVC
2017-11-15 09:38:36 -05:00
2bc71d4135 Forward args to .jenkins/build.sh to cmake
Summary:
So we can do things like pass -DCMAKE_BUILD_TYPE=DEBUG
Closes https://github.com/caffe2/caffe2/pull/1474

Differential Revision: D6334701

Pulled By: pietern

fbshipit-source-id: 08e6e48ba453ffca50ad0949ee7b0bf7251a542f
2017-11-15 01:05:48 -08:00
2bf4dec9ff Add missing CMakeFile in caffe2/observers
Summary:
Broke by 43075b779b
Closes https://github.com/caffe2/caffe2/pull/1473

Differential Revision: D6333460

Pulled By: dzhulgakov

fbshipit-source-id: 94a06b53650b02ff5938367896b52f47cdbf811a
2017-11-14 21:46:57 -08:00
65a1dbc93d penalty for EOS successor
Summary: Current beam search generates successor states to EOS which are considered for inclusion in the beam even though they do not represent valid sequence prefixes. This diff introduces a penalty to ensure that such states are not included in the beam.

Reviewed By: xliilx

Differential Revision: D6325511

fbshipit-source-id: b17f10b0d00f3bc5fcc5a826a8a57a0f2cb360a6
2017-11-14 21:46:56 -08:00
2792de0d22 Revert D6331513: [caffe2][test] Fix NetTest
Summary:
This reverts commit b9e8ec9afc110b0284550c4818bde15ae108fa2f

bypass-lint

Differential Revision: D6331513

fbshipit-source-id: f24cb46fbcbcdbea2523297c567b08ceaaa93ea6
2017-11-14 21:33:12 -08:00
3bb2308a89 Minor JIT improvements (#3703)
* Record autograd profiler events in JIT

* Fix the graph fuser

It was supposed to only work for float inputs, but worked
for all types _except_ float.
2017-11-14 21:23:31 -08:00
e73228b73c Opensource styler_ops, norm_planar_yuv_op, and quant_ops
Reviewed By: Yangqing

Differential Revision: D6086149

fbshipit-source-id: ac1fb711c4f51091fdadf8e348abb127cf6bc245
2017-11-14 20:47:41 -08:00
80c3f8fa88 Fix NetTest
Summary: Split into cpu and gpu parts, update chaining test

Reviewed By: Yangqing

Differential Revision: D6331513

fbshipit-source-id: b9e8ec9afc110b0284550c4818bde15ae108fa2f
2017-11-14 19:18:28 -08:00
1c1519d7cf Fix export for recent changes in ONNX (#3708) 2017-11-14 21:46:53 -05:00
f0306c12ff add Mean Pooling distributed support
Reviewed By: dragonxlwang

Differential Revision: D6114111

fbshipit-source-id: bc0a79a4455e490bdfaa1d5d6d77badfacd2375c
2017-11-14 17:30:31 -08:00
74367755f2 Integrated GRU implementation into C2
Summary:
Fixed unit test failures for GRU cell first implemented in D5778202

- GRUCell implementation added to rnn_cell.py
- GRU with recurrent attention test added to seq2seq_model_caffe2.py
- seq2seq_rnn.py
    - Added specific behavior for 'gru' cell type
        - in LSTMWithAttentionDecoder, output_indices fix for GRU cells
        - in build_initial_rnn_decoder_states, don't process cell state for GRU cells

Reviewed By: salexspb

Differential Revision: D6316441

fbshipit-source-id: 18668f3db62245c5cdaf3bfa473a40e0feba0473
2017-11-14 16:18:50 -08:00
5e9b445d38 Implement VariableType::alias (#3707)
This isn't exposed to Python directly. Instead Python code can use
variable[:], which will be implemented in terms of alias.
2017-11-14 19:14:12 -05:00
b888d3ac2b Implement toBackend and toScalarType on VariableType (#3706) 2017-11-14 18:30:16 -05:00
c77f0cb5e6 Attach observers to operators inside step net
Summary: Pass the list of observers to rnnExecutor_ and attach them to operators

Reviewed By: akyrola

Differential Revision: D6279655

fbshipit-source-id: 086dde1bf6edbfb36082d6b4de33ec41f0bbefab
2017-11-14 15:06:38 -08:00
1d198c4f8c Use ATen for Variable.contiguous() (#3701) 2017-11-14 17:13:15 -05:00
f756d9d45b Turn off omp by default
Summary:
It used to be that we did quite a bit of #pragma omp parallel, but now it is pretty rare:

https://github.com/caffe2/caffe2/search?utf8=%E2%9C%93&q=pragma+omp&type=

As a result we should probably turn it off by default.
Closes https://github.com/caffe2/caffe2/pull/1472

Reviewed By: bwasti

Differential Revision: D6327459

Pulled By: Yangqing

fbshipit-source-id: e304a85312bc2eb1e7cfe661373f873bffb2fb90
2017-11-14 13:17:49 -08:00
b431526dbe Disable protobuf libprotoc and protoc build for cross compilation.
Summary:
Also bumped third_party/protobuf to v3.4.1 similar to #1462 . cc pietern
Closes https://github.com/caffe2/caffe2/pull/1466

Reviewed By: pietern

Differential Revision: D6322210

Pulled By: Yangqing

fbshipit-source-id: 00f72472b71d1903a2705daf56652e4fb3fc021e
2017-11-14 12:06:02 -08:00
d478ece11e Propagate is_volatile to the base when performing in-place ops on views (#3680)
Previously, an in-place operation on a view that caused the view to be
volatile would not propagate up to the base. This often happens in
backward passes involving CopySlices which would increase memory usage
by making grad non-volatile.
2017-11-14 14:42:06 -05:00
0e522853bf fix half uniform for cuda 7.5 2017-11-14 11:35:15 -08:00
47ac468504 Remove dilations for pooling in onnx export and other small fixes (#3698)
* fix optimization pass issues

* remove pool dilations
2017-11-14 14:28:05 -05:00
f779f44c89 Add ONNX exporter for glcgan
Summary: Export PyTorch glcgan model to Caffe2 using ONNX

Reviewed By: dzhulgakov

Differential Revision: D6298765

fbshipit-source-id: 324e52249bb88c6e7bb3b682a4ec0662b6a0c1ea
2017-11-14 10:09:44 -08:00
9cb8b43778 Split off in-place NN functions (#3683)
For example, this splits threshold into threshold(), which is now
never in-place, and threshold_() which is always in-place.

This simplifies the in-place vs. non-in-place logic in
gen_variable_type.py, which was bug-prone.
2017-11-14 12:59:06 -05:00
1ab3fd1a29 Fix Batched Matmul test accuracy
Summary:
Datatypes were being handled badly in the reference check, causing sporadic failures in CI. All batched mat-mul with fp16 data is performed as pseudo-fp16, with all math in fp32. Adjusted the reference implementation to reflect this.

Adjusted the gradient check threshold to the best I could get to consistently pass.
Closes https://github.com/caffe2/caffe2/pull/1406

Differential Revision: D6324431

Pulled By: pietern

fbshipit-source-id: 83ff2584438a11f7a6db4599a4fb0e75e9e15a3d
2017-11-14 09:31:18 -08:00
7605d196fe Hotfix for ONNX BatchNorm export (#3691) 2017-11-14 09:58:02 -05:00
589ce4dfab set CC and CXX only when it's empty
That way, we can use something like clcache to speed up builds.
2017-11-14 15:16:21 +01:00
446f869a0d Support negative dimensions in softmax and log_softmax
Fixes #3677
2017-11-14 14:53:31 +01:00
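A one-line illustration (assuming the `dim` argument of the functional softmax):

    import torch
    import torch.nn.functional as F
    from torch.autograd import Variable

    x = Variable(torch.randn(2, 3, 4))
    # dim=-1 now counts from the end, like Python indexing: same as dim=2 here.
    y = F.softmax(x, dim=-1)
    assert y.size() == x.size()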
ba3b79b06b Fix the missing import 2017-11-14 09:36:43 +01:00
b8f670eae8 Fix windows build error
Summary:
TSIA. Verified on local machine with VS 2017.
Closes https://github.com/caffe2/caffe2/pull/1455

Differential Revision: D6310658

Pulled By: Yangqing

fbshipit-source-id: 88f4519e8e9a4178719a5627365267f627dcb939
2017-11-14 00:05:33 -08:00
a3bf06c0c7 Use ATen implementations for is_contiguous, is_set_to, numel, get_device. 2017-11-14 08:29:55 +01:00
c5b2c13433 fix error in NegateGradientOp
Summary: Remove the unnamed namespace in the .cc file, to avoid the error CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_NegateGradientEv

Reviewed By: dragonxlwang

Differential Revision: D6322332

fbshipit-source-id: af4e859761b6235dfb5c8b91a902262c2c775ad7
2017-11-13 22:47:09 -08:00
c5bcd5560c Adding zstd to build
Summary:
This is in order for us to share compression ops to oss.
Closes https://github.com/caffe2/caffe2/pull/1463

Reviewed By: hlu1

Differential Revision: D6319101

Pulled By: Yangqing

fbshipit-source-id: 16c94e71fc3efe256054a648170aaf7702e5bcfe
2017-11-13 22:18:44 -08:00
e43ff32192 Add a JIT interpreter (#3634)
* Add a JIT interpreter

The separate interpreter is used to run graphs with a lower overhead than
converting them to autograd graphs. Some notes:

* does not support Handles/PythonOp/CppOp, these will be in a future commit
* jit_closure.cpp still exists and we fall back to it for now when we
  cannot handle something because of PythonOp/CppOp
* In order to support retain_graph=True, the interpreter can be cloned,
  creating a copy that can be run with different arguments. This is
  assumed to be the non-standard case so cloning is not particularly optimized.
  No tensor _data_ is copied, but the at::Tensor list in the interpreter is.
  If we hit problems, there is a lot we could do (such as register allocation)
  to minimize the stuff that needs to be copied.
* Uses a pImpl pattern to keep implementation details out of its header file.
* Modifies the way getTensorOp works so that it reads/writes to already-existing
  vectors, this prevents needing to realloc these buffers each time.
* Timings are here: https://gist.github.com/zdevito/5a20ac29fb1b9e449e693b67dc478127
  This reduces overhead to about the same as running it in python.
  It is about 10us faster to run the same thing using ATen directly.

* Code Mod

Interpreter -> InterpreterState
Function -> Code

Add other requested comments.

* RegList -> ListHandle<T>

Change the RegList functions to be safer by identifying the type of
each argument list, and checking that list insert does not try
to add to two different lists at once.

* Use exactly equal for interp tests
2017-11-13 22:09:53 -08:00
b67acd2d39 Move detach to variable (#3676)
* Move detach to variable

* Move to autograd.cpp
2017-11-13 22:44:23 -05:00
283ca417cc Fix ssize_t for MSVC (#3686) 2017-11-13 22:43:23 -05:00
83b36175fc Fix CUDA 9 builds for Windows (#3684)
* Fix CUDA 9 builds for Windows

* Add msvc conditional flag

* minor bug fix

* minor bugs #1
2017-11-13 22:43:00 -05:00
70b8f0ed47 Fix elu double-backwards when applied in-place (#3687)
* Fix elu double-backwards when applied in-place

Removed unused "input" argument to elu_backwards. Also removed 'inplace'
argument from backwards functions, since we don't ever want to use it.

* Fix up additional calls to ELU_updateGradInput
2017-11-13 22:41:16 -05:00
8701a2dfa3 Allow negative indices in Concat/Split ops
Summary: Closes https://github.com/caffe2/caffe2/pull/1440

Reviewed By: dzhulgakov

Differential Revision: D6290009

Pulled By: jamesr66a

fbshipit-source-id: 93eaff6103211ff89ed63ecaf4aa96d38e6bed63
2017-11-13 18:32:24 -08:00
5d61b1f559 Update ATen operator in C2
Summary: Update the ATen operator to the new version of the aten library. This adds support for many neural network functions that previously were not exposed. It also supports operators that take a list of tensor inputs or produce a list of outputs, by appending them to the end of the input/output lists.

Reviewed By: jamesr66a

Differential Revision: D6267327

fbshipit-source-id: 0df6af18369241afa8600fd51923811749900c2e
2017-11-13 18:18:41 -08:00
7b047c161d NegateGradientOp and test
Summary: add NegateGradientOp: in forward pass, this op simply copies the input to output. In backward pass, it flips the sign of gradients.

Reviewed By: dragonxlwang

Differential Revision: D6314456

fbshipit-source-id: 56afd8b131eff9f7e120ab7e4e87461df49649d4
2017-11-13 18:05:14 -08:00
4847f8c191 Remove unused field in tensor proto
Summary: This new field is not needed anymore, so this diff removes it

Reviewed By: kennyhorror

Differential Revision: D6316744

fbshipit-source-id: f8afc1c42a0592fd03c7939f8e6f78afc8510ec9
2017-11-13 17:25:15 -08:00
30068b5b64 Fix function signature in ATenOp for at::Half Set function
Summary:
c777be07d9 changed the type signature for the Set function; this fixes it for the ATenOp
Closes https://github.com/caffe2/caffe2/pull/1464

Reviewed By: zdevito

Differential Revision: D6317561

Pulled By: jamesr66a

fbshipit-source-id: e54d553f44ccf0d5fc695e14dc671dde77004b54
2017-11-13 16:03:46 -08:00
564efd3521 Allow 1->N broadcasts at the beginning and end to be fused (#3616)
* Allow 1->N broadcasts at the beginning and end to be fused

* Update comments and size logic
2017-11-13 15:37:48 -08:00
c2ea3f66b3 Make a concrete function for device_option equality
Summary: Currently, the device_option equality check is done in a specialized private function. Ideally, we should be able to test equality from other places in the code and have a more detailed equality check.

Reviewed By: akyrola

Differential Revision: D6316608

fbshipit-source-id: c3fd085583e535d7936d05e4c8b15d2eff91c744
2017-11-13 15:17:06 -08:00
31e9ceeb4b Refactor the observer code to use one function to report both net and operator
Summary: There is no need to use two functions to report net and operators. One function is sufficient.

Reviewed By: Maratyszcza

Differential Revision: D6228730

fbshipit-source-id: c599527254f4a15a3e440d37055cc95fbb3436bb
2017-11-13 15:03:47 -08:00
f600056f48 Allow build with CUDA 9.0
Summary:
This correctly adds handling of CUDA 8.0 and 9.0 by cmake.

**Discussion:**
CUDA 9.0 is currently not handled by cmake. When trying to build
with it and gcc6, the following cmake error is shown:

-- CUDA detected: 9.0
...
CMake Error at cmake/Dependencies.cmake:332 (message):
  CUDA 8.0 is not compatible with GCC version >= 6.  Use the following option
  to use another version (for example):
    -DCUDA_HOST_COMPILER=/usr/bin/gcc-5
Closes https://github.com/caffe2/caffe2/pull/1392

Differential Revision: D6317033

Pulled By: pietern

fbshipit-source-id: 08b89f21b994af52533d5afaaa62f26e2e94aee8
2017-11-13 14:33:31 -08:00
e8abfd359a Limit this fix to apple clang only
Summary:
Use "__apple_build_version__" macro to distinguish Apple's Clang while brew installed LLVM will compile caffe2 without trouble.
Closes https://github.com/caffe2/caffe2/pull/1461

Differential Revision: D6316861

Pulled By: Yangqing

fbshipit-source-id: f7a08cdd8822b197a93aa11dc8f28ef5cd738eee
2017-11-13 14:33:30 -08:00
e9cc41885e fix dynamic memory management for distributed execution
Summary: Dynamic memory management in Data Parallel Model was broken for distributed computation because the parameter gradients were also freed after being used. That is a problem with GLOO, because it expects the tensors to have the same address over multiple calls. It is not a huge loss to remove parameter gradients from recycling, as they are relatively small for typical convnets.

Reviewed By: asaadaldien

Differential Revision: D6314095

fbshipit-source-id: 949161d8c592927ae2fa82b3262b5f9ee47bed6f
2017-11-13 12:09:11 -08:00
97e4743aaf Caffe2_benchmark can benchmark multiple backend engines
Summary:
Support the default, nnpack, and opengl backend engines. There is no need to change the model; the tool converts the model to the appropriate backend.
Closes https://github.com/caffe2/caffe2/pull/1436

Reviewed By: hlu1

Differential Revision: D6275975

Pulled By: sf-wind

fbshipit-source-id: fbd864e18f00372b4c03de294c22383c405a9210
2017-11-13 12:09:09 -08:00
4d152ab931 disable sbn running mean and var comp
Summary:
cc pietern
Closes https://github.com/caffe2/caffe2/pull/1454

Differential Revision: D6310656

Pulled By: Yangqing

fbshipit-source-id: fa9a1e44b6289eb59e0388325c39b11d3b3e3ad4
2017-11-13 12:09:08 -08:00
667c7d980b Avoid misleading message about NODE_ID blob
Summary: Currently, in single-machine execution, a misleading message is printed to the log saying that the 'NODE_ID' blob is not found. This diff ensures that this message is no longer spit out, while maintaining the semantics.

Reviewed By: Maratyszcza

Differential Revision: D6302728

fbshipit-source-id: 0f45245aedf6d4f664368595f7894e0f695e5323
2017-11-13 12:09:06 -08:00
8ce205069a Fix stats reporter in calculating STDDEV
Summary:
The STDDEV calculation code assumes that `compare_exchange` returns the value of the atomic, whereas per the C++ spec it actually returns a `bool`.

Also, the diff adds enough guards to avoid math errors on the Python side -- although this should not happen, the guards just avoid problems with floating point rounding.

Differential Revision: D6307930

fbshipit-source-id: d1754afb631f937aca7a88a82b5be2dd0c704aec
2017-11-13 12:09:06 -08:00
2be44ab242 remove redundant "template" keyword
Summary: remove redundant "template" keyword

Reviewed By: pietern

Differential Revision: D6304205

fbshipit-source-id: cb15b784cc8954a7679ea1e12dda866e9fa86231
2017-11-13 12:09:03 -08:00
814cd7ade3 Fix event test
Summary: Closes https://github.com/caffe2/caffe2/pull/1452

Reviewed By: pietern

Differential Revision: D6304147

Pulled By: ilia-cher

fbshipit-source-id: ed838d675689a47c8d1831926ab18dbca063ca08
2017-11-13 12:09:02 -08:00
fec5631513 Updated NNPACK code. Original author is @Maratyszcza 2017-11-13 11:28:15 -08:00
c7cb6a795e Record stack traces during JIT tracing (#3607)
* Update comments and size logic

* Record stack traces during JIT tracing

* Use string helper functions and AutoGIL

* Use SourceLocation object instead of storing in debugName

* Address zdevito comments

* Address comments
2017-11-13 10:18:55 -08:00
25b166ed1f add depthwise convolution terminology as a note 2017-11-12 23:26:42 -05:00
e33df2b88a Add border-padding for grid_sampler (#3599)
* adds border padding to spatial grid sampler

* fixes flake8 * adds docs
2017-11-12 18:46:49 -05:00
30d06218cb Solved boolean ambiguity for variables and tensors which contain one value. (#3656)
* Solved boolean ambiguity for variables and tensors which contain one value.

* Update variable.py

* Update tensor.py
2017-11-12 11:07:50 -05:00
ea4432b3c2 Fix CUDA builds for Windows (#3650)
* Fix CUDA builds for Windows

1. CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS has a limitation: the maximum number of exported functions cannot exceed 65535, so it can't be used.
2. Specify static on an inline function to prevent linking errors.

* cancel CMAKE version limitation
2017-11-12 00:14:05 -05:00
73431f087b Allow torch.load and torch.save to take pathlib.Path (#3589)
* Allow torch.load to take pathlib.Path

pathlib has been in the Python standard library for filesystem paths since Python 3.4,
but `torch.load` currently cannot take a `pathlib.Path` as the filename of a state dictionary.
I changed `torch.load` and `_with_file_like` so that they accept `pathlib.Path`-typed filepaths.

* Fix flake8: too long line & indentation
2017-11-11 18:50:13 -05:00
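Usage after the change, as a minimal sketch (hypothetical paths):

    from pathlib import Path
    import torch

    path = Path("checkpoints") / "model.pt"
    path.parent.mkdir(parents=True, exist_ok=True)

    torch.save({"step": 1}, path)  # Path objects now work directly
    state = torch.load(path)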
4fa94793dd Bump version in master (#3605) 2017-11-11 18:49:19 -05:00
2bf70c137e fix selecting deterministic conv algo (#3631) 2017-11-11 11:36:42 -05:00
0443c11f7e Fix for cuDNN half precision RNN for pre-volta archs (#3613)
* Fix for cuDNN half RNN on pre-volta archs

* Fix cuDNN versioning in rnn.

* lint fix
2017-11-11 11:34:58 -05:00
84c618010d Remove redundant dimension check that produced maybe-uninitializd warnings 2017-11-11 13:40:55 +01:00
7160fb0801 Fix setup scripts for Windows CUDA builds 2017-11-11 13:05:35 +01:00
95821ca4e5 fix USE_BLAS detection in THGeneral.h.in (#3632) 2017-11-10 14:13:30 -08:00
1a58775e19 Fix AppVeyor Windows build due to template chaining
Summary:
The Windows compiler has a bug with chained templates. This diff avoids using such a pattern in `plan_executor.cc`.
Closes https://github.com/caffe2/caffe2/pull/1442

Reviewed By: Yangqing

Differential Revision: D6300046

Pulled By: heslami

fbshipit-source-id: 1dc74441d6e2f0586c636e799eb5e88ced289063
2017-11-10 14:08:30 -08:00
0483304fab Enable EXPORT_ALL_SYMBOLS for CMAKE (#3617)
* Enable EXPORT_ALL_SYMBOLS for CMAKE

If we turn on CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS flag, we don't need to add most decorators by hand.

* Add quotation marks to pass the string args

* added endif

* Update CMakeLists.txt
2017-11-10 17:00:47 -05:00
ae5673741b add option to do simple modulo
Summary: as desc.

Differential Revision: D6240061

fbshipit-source-id: 814a541a3e7f09ebbe2df63fd9202312e9f4c8d4
2017-11-10 13:49:07 -08:00
fc8532c89d Allow serialization of custom types inside Tensor
Summary:
The use case is that sometimes we need a Tensor of custom type instead of POD
or string. This diff allows one to delegate to BlobSerializerBase to further
serialize the contents inside the Tensor.

Design choices:
(1) Each element is serialized as a BlobProto string, and stored in the
repeated string field.
(2) UNDEFINED is used as the enum value for the tensor data type, and the exact
type string is stored in the additional field.
(3) BlobSerializer is called on each item to obtain the serialized string.
(4) This requires the custom type to have copy constructor - otherwise it
will simply not be possible to copy over the deserialized content without
explicit type.

See blob_test.cc for an example.

Reviewed By: sunnieshang

Differential Revision: D6300196

fbshipit-source-id: 18bf94a22a07337e0fa83d3f1004b3651e38cf27
2017-11-10 13:14:21 -08:00
efe4386d24 Fix module load_state_dict error information. 2017-11-10 22:11:30 +01:00
cc8fd5bde1 added #define __STDC_FORMAT_MACROS to tensor and storage code templates to avoid problems with gcc 4.8.5 (#3629) 2017-11-10 15:21:33 -05:00
c04ec84e1a disable uniform fill large blob
Reviewed By: pietern

Differential Revision: D6299413

fbshipit-source-id: 2ea4a5f1434060c3ab6fd42abd4052bdb10a37cc
2017-11-10 12:10:14 -08:00
3a6b38eb2c Avoid unsupported version pinning for HomeBrew on CI
Summary: Closes https://github.com/caffe2/caffe2/pull/1451

Differential Revision: D6298975

Pulled By: Maratyszcza

fbshipit-source-id: 5b8a592748b400ca8ba4df089a8cdf886b6c0cf6
2017-11-10 11:20:43 -08:00
4971aec81e Add /usr/local/opt/python/libexec/bin to $PATH on Mac travis
Summary:
This should Travis the build failures on Mac
Closes https://github.com/caffe2/caffe2/pull/1443

Reviewed By: bddppq

Differential Revision: D6295041

Pulled By: Maratyszcza

fbshipit-source-id: c143220e1ec17e49fe8e84f586f9fb82daba321a
2017-11-10 10:23:31 -08:00
0440f3bf93 Reduce caffe2 GPU topk test sizes
Summary: The topk GPU test was taking too much time, but there are still a variety of codepaths to test (k <= 1024, k > 1024, k == 1, k == n). Reduce the batch sizes and n to reduce the time taken by the equivalent in-Python CPU code.

Reviewed By: pietern

Differential Revision: D6272628

fbshipit-source-id: b8b8f3601f28bf64f144c73d7c9e915f40c84d70
2017-11-10 07:47:00 -08:00
5478d0154f Fix pthread detection for MKL 2017-11-10 07:40:57 -08:00
1f1612ee37 Move _CompiledMixin to C++ 2017-11-10 16:31:44 +01:00
02450fff38 Expand autograd profiler docs (#3621) 2017-11-10 08:58:45 -05:00
7e1d795354 fix for unknown ssize_t in aten/src/TH/THMemoryFile.c (#3612)
* added sys/types.h include to fix unknown ssize_t in aten/src/TH/THMemoryFile.c

* now including <sys/types.h> only if _WIN32 is not #defined

* now including sys/types.h in aten/src/TH/THDiskFile.c (if _WIN32 is not defined) to fix undefined off_t
2017-11-10 08:54:40 -05:00
e8e29690ef Add has_debug_def() check to net's debug_def()
Summary: same as title

Reviewed By: salexspb

Differential Revision: D6264232

fbshipit-source-id: e9f499e0c8758bcb52f079521fa95973fcba441f
2017-11-10 03:24:49 -08:00
d1c73eb407 use size_t for rand fill functions in math
Summary: The number of elements in the caffe2 blob can be larger than int32. Use size_t to prevent overflow.

Reviewed By: ajtulloch

Differential Revision: D6278363

fbshipit-source-id: 356e294c667a53360d8a65b56a63a39d5ce3384e
2017-11-09 18:44:46 -08:00
9b0990539b Hack to detect when only one output is differentiable. 2017-11-10 09:58:40 +08:00
df433d427c Only set_flags on differentiable outputs.
I messed this up and TestNN.test_MaxPool2d_indices caught me out
on it.  This patch assumes that IndexTensor outputs are not
differentiable.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-10 09:58:40 +08:00
19515520bb Make prelu an ATen op.
This operator is a warmup I was doing before tackling convolution, as it
has many properties that make it a "first" for implementing things.  In
particular, it is the first operator whose backwards have multiple
returns; this means its double backwards is the first backwards for a
function with multiple differentiable outputs.  This exercises new code
for output_mask and set_flags.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-10 09:58:40 +08:00
68b5d94371 Extra sanity checking for derivatives.yaml versus Declaraitons.yaml
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-10 09:58:40 +08:00
efb611a134 Fix misnamed generator argument.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-10 09:58:40 +08:00
016af4ebf7 Fix parsing problem with std::array<int, 2> (note space)
We are splitting on ', ', but that causes problems when you
have a nested comma.  Quick and dirty fix is to NOT have the
space.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-10 09:58:40 +08:00
95ddfbc947 Delete default parameters from derivatives.yaml.
They don't actually do anything and they're not accurate (many functions
have defaults which we didn't specify here).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-10 09:58:40 +08:00
dfcd2a73f5 s/thpp/at/
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-10 09:58:40 +08:00
690bfc0781 Delete unused defined_if fields.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-10 09:58:40 +08:00
9cdc24f550 Make 'name' occur first in Declarations.yaml
Whenever I used to read Declarations.yaml, it would drive me batty that
'name' was always embedded somewhere in the middle of the record.
Now it at the top, as it should be!

What it looks like now:

  - name: storage_offset
    method_prefix: m_
    arguments:
    - dynamic_type: Tensor
      name: self
      type: const Tensor &

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-10 09:58:40 +08:00
43e4e3cca2 Some developer notes for ATen.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-10 09:58:40 +08:00
2da308f4b9 Add expand_as/type_as to ATen.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-10 09:58:40 +08:00
664fb135af More elaborate error message when expand fails.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-10 09:58:40 +08:00
8f3bef2292 Add operator<< for at::Type
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-10 09:58:40 +08:00
bfdd864631 Automatically pretranspose FCs in BlackBoxPredictor
Summary:
Pretransposing FCs seems to offset the losses we get from low
batch sizes in AdIndexer. First I confirmed this on local benchmarks (see
previous diff). Then in https://fburl.com/yuo49onj I showed how this
change saves 19% of FC time on AdIndexer, which is already $0.4M in
capital expenditure and over 3 years gives 5x more ROI.

We can also reuse this code for later, more efficient gemm
implementations. E.g., msmelyan is working on a new fp16 gemm which
would cut bandwidth usage 2x; we can reuse the code in this diff for
the repacking required by the new gemm.

In this diff I had to take care of memory usage. Here are several
possible approaches to the transformation:

1. Perform the transposition on the fly, copying the memory. This is what is done in
skinny gemm (FC with engine SKINNY)

Cons: slow first execution, memory is replicated for each thread

2. Copy the weights in the operator constructor. In debug mode, verify on
the fly that the hash of the original weights is unchanged

Cons: memory is still replicated for each thread

3. Copy the weights in the Predictor constructor

Cons: if we have 2 predictors sharing the same weight blob (via
PredictorContainer), we still get 3x the memory, i.e. the original
weights plus one copy for each of the two predictors in the container

4. Replace the weights in the Predictor constructor, taking care of the mapping to
support weight sharing within a Predictor container

This is the approach taken in this diff; it solves the issues above and
doesn't create any memory overhead.

Cons: Logic became complex, requires a mutex at initialization time

Reviewed By: akyrola

Differential Revision: D6214593

fbshipit-source-id: 25da6ba7bfd39fc8f4b578094d3f334c7957490d
2017-11-09 17:35:32 -08:00
fe22e3deb9 make summarize op support larger blob and more robust
Summary:
- so that it can also summarize blobs of size larger than int
- the calculation of the mean and std may overflow/underflow; changed to use double for intermediate calculations

Differential Revision: D6278275

fbshipit-source-id: f0bb72a5279212d429fa6d09b5487cad1baacdbe
2017-11-09 17:02:48 -08:00
7cedf80923 add flexible topK op
Summary:
Will probably rename to adaptive topK to be aligned with the layer name.

The main difference from the top_k op is that K is not fixed as a layer parameter;
instead, this op takes in a blob that contains the K for each row of the input data (batch mode).

Reviewed By: chocjy

Differential Revision: D6221209

fbshipit-source-id: f7fd575ff8f515d886d93278ad94fd17e8bd6fa5
2017-11-09 16:48:14 -08:00
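A pure-Python sketch of the per-row top-K described above (hypothetical names; the real op reads the K values from a blob and emits lengths/indices/values):

    def flexible_topk(data, ks):
        lengths, indices, values = [], [], []
        for row, k in zip(data, ks):      # K can differ per input row
            order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
            lengths.append(len(order))
            indices.extend(order)
            values.extend(row[i] for i in order)
        return lengths, indices, values

    print(flexible_topk([[0.1, 0.9, 0.5], [3.0, 1.0, 2.0]], ks=[1, 2]))
    # ([1, 2], [1, 0, 2], [0.9, 3.0, 2.0])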
43d1405d0d Fix ld* conditions for gemv ger gemm (#3604) 2017-11-09 19:43:29 -05:00
d496f9b20c Ensure that Variables are at least one-dim in VariableType (#3609)
Previously, we checked that Variables were at least one dimensional in
the Python binding (wrap_outputs.h) and in the backwards functions. This
was necessary because some Tensor functions returned Scalar types, which
must be zero dimensional. This moves the wrapping logic into
VariableType.
2017-11-09 17:34:24 -05:00
0b476e6456 CMake: remove unneeded dependency with OpenBLAS
Summary:
Do not try to link against `libcblas.so` when using the OpenBLAS
back-end. This fixes #763.

I briefly checked the OpenBLAS repository and, as far as I can tell, the OpenBLAS build script never created a library called _cblas_.
Closes https://github.com/caffe2/caffe2/pull/1420

Differential Revision: D6283019

Pulled By: pietern

fbshipit-source-id: 53cd4455bdc63ee9f31d5bca9822844548350ae3
2017-11-09 14:04:39 -08:00
febe45ebb4 Disable NNPACK build on unsupported CPU architectures
Summary:
A few people complained in the NNPACK repo about broken builds on PPC64, as it specifically whitelists supported architectures in its CMakeLists.txt and refuses to build on unsupported platforms. This commit explicitly disables the NNPACK build (as part of the Caffe2 build) on unsupported architectures.
Closes https://github.com/caffe2/caffe2/pull/1439

Differential Revision: D6288999

Pulled By: Maratyszcza

fbshipit-source-id: 76c40e9ce882356944b63968df8fd853f21ecd35
2017-11-09 13:48:05 -08:00
4b8669b087 Write checkpoint info to XDB at the end of an epoch
Summary: In this diff I am making sure that the checkpoint metadata is written out to the db for every epoch. This will allow us to automatically resume from an epoch if a workflow fails.

Reviewed By: aartibasant

Differential Revision: D6234832

fbshipit-source-id: f09a4de118f2eac25f663556476ac6313925fdf3
2017-11-09 11:13:24 -08:00
1bf717e17d Raise exception when Variable.reinforce is called (#3555)
Fixes #3554
2017-11-09 12:30:12 -05:00
50009144c0 add warnings if device capability is less than ideal (#3601) 2017-11-09 11:48:59 -05:00
12e4af94e8 add better gradient creation error message
Summary: Print the full operator definition when gradient creation fails. This helps with debugging cases where the same op type is used in many places.

Differential Revision: D6282832

fbshipit-source-id: 4b9dab2602c7c53f795da93a3085cf5c8ca741c1
2017-11-09 08:06:05 -08:00
cc757acd36 docs: clarify the difference between net() and net.forward() (#3596) 2017-11-09 08:16:01 -05:00
dd6d04ddf2 doc: Normalize all true/false in docstrings to `True|False` (#3593)
* doc: Normalize all true/false in docstrings to ``True|False``

This makes them more apparent in the documentation.

* doc: fix flake8
2017-11-09 08:12:29 -05:00
9d4c2d743b Enable the build for MSVC 2017 and Ninja (#3595)
* Add ninja support for Windows MSVC

* Enable prebuild commands for builds

* Fix wrong typing
2017-11-09 08:10:40 -05:00
555c51c846 Fix build failures in MSVC (#3594) 2017-11-09 08:10:00 -05:00
b06c59e543 fix warnings about _XOPEN_SOURCE redefinition. Every compilation unit whose headers recursively include Python.h needs to include Python.h first. This is a known limitation of the Python headers. 2017-11-09 09:21:30 +01:00
0217ad29d2 Fix OS X build, fixes #3573
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-09 16:16:16 +08:00
25d3c25f50 add more fusable nodes to the graph compiler (#3559) 2017-11-08 22:58:08 -05:00
285ce10dbe fix linking order of nvrtc to force no-as-needed (#3583) 2017-11-08 22:05:09 -05:00
bf5932fb15 Add missing documentation for replacement in WeightedRandomSampler (#3579)
* Update sampler.py

* fix lint
2017-11-08 20:23:42 -05:00
d2784b6e5b Link ATen against CuDNN when available. (#3582)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-08 20:20:53 -05:00
ec389f5128 Fix cuda symeig (#3566)
* Fix cuda symeig

* Add symeig test

* Better check for magma
2017-11-08 20:20:14 -05:00
aabfae0503 CPU all/any should work with empty tensors. (#3581) 2017-11-08 20:18:26 -05:00
bcc8c8f696 Support RMSProp in Caffe2.
Summary:
Add `RmsPropOptimizer` to `optimizer.py` so RMSProp can be used as an optimizer.

`RmsPropOptimizer` uses `RmsPropOp` to update the gradient and `MomentumSGDUpdateOp` to update the model parameters.

Differential Revision: D6118279

fbshipit-source-id: e38b8380ff74c1d1bb1e87fc300b6b55e32cd2e0
2017-11-08 16:43:18 -08:00
adf883b7b1 fix uninitialized warnings in THCUNN. (#3575) 2017-11-08 16:18:45 -08:00
4e3aa25139 Unit test that compares net snippets after parallelization
Summary:
- This is meant as a set of examples on how parallelize_net works.
- Currently, only one example is provided. More to be added.

Reviewed By: mraway, xianjiec

Differential Revision: D6240160

fbshipit-source-id: 6f6f2d77445825883e050498cb6e06fb74508bbf
2017-11-08 15:55:27 -08:00
2bc7fc8698 Add Jenkins build scripts
Summary:
Let's see if we can make this work...
Closes https://github.com/caffe2/caffe2/pull/1417

Differential Revision: D6276601

Pulled By: pietern

fbshipit-source-id: 4d51a66b693a1c5cff1e0c03373cd42bb273c885
2017-11-08 14:47:27 -08:00
547ac8c0b9 Ensure aten build depends on NativeFunctions.h.
Otherwise you can change the metadata and the code won't be re-generated.
2017-11-08 17:39:33 -05:00
0509f401d1 Update ATen to fix issues with old g++ (#3574)
* Update ATen to fix issues with old g++

* Add comments
2017-11-08 17:25:49 -05:00
e6fadfa76e Relaxing checks for fp16 in BatchMatMul tests
Reviewed By: pietern

Differential Revision: D6275557

fbshipit-source-id: e336ba9c897b88801f1be1b32029c5af58ec3fc5
2017-11-08 13:42:28 -08:00
15c523f836 [ATen] Make size/stride native functions.
Previously, sizes/strides() would give you the ATen view of the shape, while size(dim), stride(dim) would give you the TH view.
This was unnecessarily confusing and there was no automatic way to get dim wrapping on the ATen view.
2017-11-08 16:33:42 -05:00
b2bbc7c091 Enable building mobile directory files in OSS
Summary:
The source files are not exposed to the parent directory in mobile. Expose them now so that the files are built in OSS.
Closes https://github.com/caffe2/caffe2/pull/1435

Reviewed By: akyrola

Differential Revision: D6274056

Pulled By: sf-wind

fbshipit-source-id: 6b54645bc9a42b4329d8aa20051abeb5fc6b1c37
2017-11-08 12:34:14 -08:00
aa911939a3 Improve Windows Compatibility (for csrc/scripts) (#2941) 2017-11-08 19:51:35 +01:00
348e29c49b Don't run CUDA tests for ops without CUDA implementation
Summary: Closes https://github.com/caffe2/caffe2/pull/1434

Reviewed By: houseroad, ilia-cher

Differential Revision: D6272614

Pulled By: pietern

fbshipit-source-id: 7b998b08ec02b03f88a6fd24a949b0d199b2aa37
2017-11-08 10:28:02 -08:00
1d57a2d54c [ATen][Scalars] Remove Scalar from return types of functions. (#3557)
* Add direct C-type scalar conversions from Tensor, e.g. toCFloat() as an alias for Scalar(x).toFloat()

* Provide tensor overloads for fill_, masked_fill_, index_fill_.

* Everything up to scalar overload.

* Fix pytorch build for aten scalar return type changes.

* Use valid expression instead of dangling else.

* Simplify code generation.

* Fix test_jit (why didn't this compile locally?)
2017-11-08 11:29:56 -05:00
22d1e37540 Have ATen build respect DEBUG variable. 2017-11-08 11:28:30 -05:00
4761b32f96 make use of the average length of sparse features for init
Summary:
Ability to use the average length of sparse features to initialize weights. Based on experiments, it turns out that this allows a model to converge faster.

More results of the experiment -- https://fb.quip.com/VfraAXNFWhSg

Reviewed By: xianjiec

Differential Revision: D6092437

fbshipit-source-id: d979be7d755719ff297b999f73cba0671e267853
2017-11-08 07:31:47 -08:00
e579ae75b5 Fix error when default_collate is passed a collection of numpy.str_ (#3404)
* Fix error when default_collate is passed a collection of numpy.str_

* Error if default_collate input is nested nparray containing non-numbers
2017-11-08 10:02:08 -05:00
be071d767d Fix uniform on CUDA tensor to return in range [0, 1) (#3547)
The curand_uniform function returns the range (0, 1]. Most RNG APIs have
the opposite bounds. Fixup the values in uniform_() so that they fall in
the more common bounds.
2017-11-08 10:00:37 -05:00
9c3cb6e652 Fix stride checks in gemm dispatch (#3548)
From https://software.intel.com/en-us/mkl-developer-reference-fortran-gemm:

 lda: "When transa = 'N' or 'n', then lda must be at least max(1, m),
       otherwise lda must be at least max(1, k)."

 ldb: "When transb = 'N' or 'n', then ldb must be at least max(1, k),
       otherwise ldb must be at least max(1, n)."

Partly addresses #3525
2017-11-08 09:55:25 -05:00
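The quoted MKL conditions, restated as small predicates (a sketch for clarity, not the actual TH code):

    def lda_ok(transa, lda, m, k):
        # 'N'/'n': lda >= max(1, m); otherwise lda >= max(1, k)
        return lda >= max(1, m) if transa in ("N", "n") else lda >= max(1, k)

    def ldb_ok(transb, ldb, k, n):
        # 'N'/'n': ldb >= max(1, k); otherwise ldb >= max(1, n)
        return ldb >= max(1, k) if transb in ("N", "n") else ldb >= max(1, n)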
5e382894be add numpy() and from_numpy() to HalfTensor (#2953) 2017-11-08 15:01:29 +01:00
8d2b9a08f4 Some documentation for derivatives.yaml
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-08 14:57:43 +08:00
fb186c0079 Make atan2 backwards reuse intermediate computation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-08 14:57:43 +08:00
7747078a89 Support defining gradient for multiple inputs simultaneously in derivatives.yaml
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-08 14:57:43 +08:00
07d30e9c3f Delete obsolete only_registry entries in cwrap.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-08 14:57:43 +08:00
d719936b13 Top level comment for gen.py
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-08 14:57:43 +08:00
84b76a0712 fix shape info in concat layer
Summary:
The output shape info is incorrect, e.g. if we have 4 embeddings with dim size 32, the actual shape is (4, 32),
but the previous implementation in the concat layer gave us (128, 1). This bug doesn't affect the dot products
calculation because the actual shape of the blob is still (4, 32) in concat_split_op

Differential Revision: D6264793

fbshipit-source-id: 82995e83a8c859cbd15617ff7850a35b30b453b6
2017-11-07 21:08:39 -08:00
daf2743bbb Prevent segfaults from undefined aten tensors (#3482)
* Prevent segfaults from undefined aten tensors.

This introduces a singleton UndefinedTensor TensorImpl with UndefinedType that is the starting state of a Tensor with no constructor arguments.  In this way we avoid null pImpls and avoid segfaults
without having to if-check each pImpl dereference.

* If either Backend or Scalar type is Undefined in registry, return
the UndefinedType to avoid errors like CPUUndefinedType is not enabled.

* Address review comments.

* Avoid refcounting UndefinedTensors.

* Use reference_wrapper to avoid copy in check_defined.

* Declare UndefinedTensor singleton as class-static.

* Seperate checked_cast into storage and tensor versions.

* Include <functional>

* Handle nullptr TensorImpls coming from NN.

* Fix nullptr check in batch_normalization backward with defined check.
2017-11-07 21:28:17 -05:00
c75ab8167d Fix double event record in RNN executor
Summary:
RNN executor uses its own set of events (https://fburl.com/37mows6l) and may
call RunAsync multiple times on the same op. Disable the internal op event for this use case.

Reviewed By: akyrola

Differential Revision: D6258471

fbshipit-source-id: 228f9ca9882cfbac5bc8fba55ddf80bd2b542072
2017-11-07 14:16:45 -08:00
dc10083fc0 Previous PyTorch version info (#3549) 2017-11-07 17:15:33 -05:00
6c1bff4cbc Generate native functions with const ref Tensor arguments. (#3465)
* Generate native functions with const ref Tensor arguments.

This matches the non-native functions and avoids unnecessary ref counts.

* Properly handle inplace functions.

* Return Tensor & for inplace native functions.
2017-11-07 17:07:22 -05:00
bb1b826cdc Exposing emptyCache from allocator (#3518)
* Add empty_cache binding

* cuda.empty_cache document

* update docs
2017-11-07 17:00:38 -05:00
f3c7bb9bc1 avoid unnecessary multiplies in derivatives (#3545) 2017-11-07 16:29:55 -05:00
ecbc4b0dc3 Fix float uniform generation in TH (#3541)
Generate random uniform floats in the range [0, 1) by generating random
uniform uint32 in the range [0, 2^24-1] and dividing by 2^24. This
ensures that the largest value is representable as a float32 less than
one.

This also changes the uniform double generation to use more bits of
randomness.
2017-11-07 16:26:11 -05:00
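A minimal sketch of the scheme (plain C++, not TH's actual generator): draw 24 random bits and divide by 2^24, so every result is exactly representable as a float and strictly below 1.

  #include <cassert>
  #include <cstdint>
  #include <random>

  // A float has a 24-bit significand, so every integer in [0, 2^24 - 1]
  // divided by 2^24 is exactly representable and strictly less than 1.
  float uniform_float(std::mt19937& gen) {
      uint32_t r = gen() & 0xFFFFFFu;              // 24 random bits
      return static_cast<float>(r) / 16777216.0f;  // divide by 2^24
  }

  int main() {
      std::mt19937 gen(0);
      for (int i = 0; i < 100000; ++i) {
          float x = uniform_float(gen);
          assert(x >= 0.0f && x < 1.0f);
      }
      return 0;
  }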
9b54f8e59c ignore digit in container's __dir__ 2017-11-07 22:08:32 +01:00
5fd93b56fd [master] Don't expose 0-dim tensors to Variable API. 2017-11-07 15:15:42 -05:00
9a020ea2ff Document weights argument format for BCELoss (#3535) 2017-11-07 14:19:46 -05:00
534e8ecc97 fix C_FLAGS typo (#3538) 2017-11-07 13:48:29 -05:00
4587a7686b Make distributions docstring raw (#3539) 2017-11-07 13:47:48 -05:00
00d2befba1 THTensor_varOuterDim numeric stability (#3533) 2017-11-07 13:47:20 -05:00
6fde0cb507 Fix memory leak in THTensor_(addmm) (#3536)
THTensor_(newContiguous) always increments the refcount. It may return
the same pointer if the tensor is already contiguous. Since we added the
check for zero strides, it may be called when the tensor is already
contiguous. We need to make sure that THTensor_(free) is always called
in this case.

Fixes #3498
2017-11-07 12:47:13 -05:00
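A sketch of the invariant with hypothetical stand-in types (the real code uses THTensor_(newContiguous)/THTensor_(free)): the retain must be balanced by a release even on the same-pointer path.

  #include <cassert>

  struct RefCounted { int refcount = 1; };

  // Like THTensor_(newContiguous): may return the *same* object, retained.
  RefCounted* new_contiguous(RefCounted* t) {
      ++t->refcount;  // always bumps the refcount, copy or not
      return t;
  }

  void release(RefCounted* t) { --t->refcount; }  // like THTensor_(free)

  int main() {
      RefCounted a;
      RefCounted* c = new_contiguous(&a);  // refcount is now 2
      // ... use c ...
      release(c);                          // must run even when c == &a
      assert(a.refcount == 1);             // balanced: no leak
      return 0;
  }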
99907f2eb0 [ppc64le] add -fexceptions to aten build function for C and CXX builds (#3515)
* add -fexceptions to aten build function for C and CXX builds

* add -fexceptions to aten build function for C and CXX builds

* add -fexceptions to aten build function for C and CXX builds

* Fix test_torch.py test for Power see issue #3277
2017-11-07 12:12:41 -05:00
77ddd5130b Add reduce keyword for KLDivLoss (#3330) 2017-11-07 08:57:11 -05:00
db3f5f86b2 Update ONNX IR we emit to version 0.0.2 (attribute discriminators) / fix Permute export (#3484)
* Regenerate ONNX nanopb from latest version.

But don't bump the IR version, we don't handle discriminators
yet.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add discriminator to AttributeProto.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add back ONNX definition for permute

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-07 07:40:28 -05:00
29fc920305 Fix MSVC build after major change (#3467)
* Fix MSVC builds after ATen patch

* Skip isnan and isinf tests for integral types

* Remove additional blank line

* fix wrong template arguments

* using spaces instead of tabs

* Revert to default formatting

* Fix build scripts

* Revert wrong changes
2017-11-07 07:15:58 -05:00
6767db28dc adds flag __CUDA_NO_HALF_OPERATORS__ (#3520)
* adds flag __CUDA_NO_HALF_OPERATORS__

* Update CMakeLists.txt
2017-11-07 07:01:09 -05:00
d2ddbaaf8d Fix command highlight in README (#3521) 2017-11-07 06:50:48 -05:00
6dd87dc88a Merge vestigial Local.cwrap into Declarations.cwrap / remove standalone ATen build logic (#3522)
* Merge vestigial Local.cwrap into Declarations.cwrap

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Remove dead standalone ATen build logic.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-06 19:23:51 -08:00
7d488544d3 Fix leak of workspace buffers
Reviewed By: Yangqing

Differential Revision: D6254401

fbshipit-source-id: 57b6a99b79d79e13e3f9ec666df399918327294a
2017-11-06 18:07:26 -08:00
cbb03b8db8 add modulo operator
Summary: as desc.

Reviewed By: chocjy

Differential Revision: D6240026

fbshipit-source-id: fa4dcccebc44b0a713946823b6f56e73d5d6146b
2017-11-06 16:44:16 -08:00
621fbd5c4e Move flattening/unflattening JIT logic to C 2017-11-06 19:42:44 -05:00
22f596572c Add torch.autograd.profiler.range 2017-11-06 19:42:44 -05:00
68116d7f84 Fix test_torch.py test for Power see issue #3277 (#3517) 2017-11-06 18:51:02 -05:00
e2f33eb6a2 add doc for sparse_adam (#3519) 2017-11-06 18:37:15 -05:00
aa93a3d633 -1 indexing fix in THCApply for pre CUDA9 (#3457)
* THCApply fixes

* THCApply add undef
2017-11-06 18:28:01 -05:00
fde355f7d4 Allow in-place operations on views (#3384)
Allow in-place operations on views

Adds VariableViewImpl, a subclass of VariableImpl which has a pointer to
the base Variable on which it is a view. In-place operations on views
change the grad_fn of the base.

Note that in-place operations only work on views that are the first output of the function that created them. All C++/ATen implemented functions have this behavior, but it's possible to write Python-implemented autograd functions that do not. In-place operations on these views will raise an exception.

Fixes #3313
2017-11-06 18:19:56 -05:00
d6a8d28d65 Simplify ATen Build (#3496)
* THS build change

* merge THCS into ATen build

* THCUNN build change over

* update THNN build

* move THC build to ATen, as well as some of the accumulated top level config from other TH* libraries

* TH library build merged into ATen, and warnings fixes.

* fix magma support checking

* check cuda early

* fall back to GCC atomics if C11 atomics have issues.

* fix install name

* disable openmp in files that also include stdatomic.h

* make sure LAPACK is visible to TH build file.
2017-11-06 17:46:15 -05:00
50a63ee6fd Fix and speed-up norm_backwards (#3481)
Fixes #3264
2017-11-06 17:11:44 -05:00
3d06a1e075 Make THCTensor_varInnermostDim numerically stable using Welford's algorithm (#3425)
* Use Welford's algorithm when reducing along inner dimension for THCTensor's variance fn

* Use accreals in THCTensor's varInnermostDim

* Skip cuda tests if no cuda

* Variance testing
2017-11-06 16:00:29 -05:00
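For reference, a compact sketch of Welford's single-pass update (plain C++, not the THC kernel): the running mean and M2 accumulator avoid the catastrophic cancellation of the naive E[x^2] - E[x]^2 formula.

  #include <cassert>
  #include <cmath>
  #include <vector>

  double welford_variance(const std::vector<double>& xs, bool unbiased = true) {
      double mean = 0.0, m2 = 0.0;
      long n = 0;
      for (double x : xs) {
          ++n;
          double delta = x - mean;
          mean += delta / n;
          m2 += delta * (x - mean);  // uses the *updated* mean
      }
      return m2 / (unbiased ? n - 1 : n);
  }

  int main() {
      // A large offset breaks the naive formula in low precision,
      // but Welford's update stays accurate.
      std::vector<double> xs = {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16};
      assert(std::fabs(welford_variance(xs) - 30.0) < 1e-6);
      return 0;
  }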
4e5b25ed47 Use ASSERT(...) rather than assert(...) in ATen tests.
Since ATen build no longer relies on NDEBUG from pytorch, this ensures the asserts will still fire.
2017-11-06 14:31:59 -05:00
8fd171a6fd add test_index to test_cuda 2017-11-06 14:21:31 -05:00
0bb0ee883e relax index dim check 2017-11-06 14:21:31 -05:00
f76d6c029c Sparse Adam optimizer for sparse gradients (#3137)
* sparse adam

* Favor dense addition over sparse_mask
2017-11-06 14:20:51 -05:00
c2626f6031 Fix error message for type mismatches with sparse tensors (#3504)
* Fix error messages

* Better fix for error checking
2017-11-06 13:12:40 -05:00
122d884bbf add CMake flag for disabling contrib builds (#3508) 2017-11-06 12:55:54 -05:00
74d1bb54e6 Add single argument version of torch.arange (#3494) 2017-11-06 12:26:04 -05:00
c2bdda1224 implement __dir__for Variable (#3501)
* implement __dir__ for Variable

* Update test_autograd.py
2017-11-06 08:08:15 -05:00
84067bc17d Make RowWiseSparseAdagrad type/shape inference compatible.
Summary:
The current version of the code does not support type and shape inference, which is
going to make all places that rely on it fail miserably.

I'm still leaving the option of doing init in the old way in case some places
are already failing this inference logic.

Reviewed By: ffjiang

Differential Revision: D6241270

fbshipit-source-id: e9080ffe93d610b5ada58ebe66579acfa57c6b3c
2017-11-06 00:50:44 -08:00
5de7f9e731 Tidy up CUDA notes 2017-11-05 14:42:06 +01:00
5c881f00a0 Add REINFORCE rule to distributions doc 2017-11-04 12:03:13 -04:00
0ce65ede86 Revert D6224054: [xplat] Switch to open-source NNPACK
Summary:
This reverts commit 4dbe02b4da97648a663586414550c2d4e23c7221

bypass-lint

Differential Revision: D6224054

fbshipit-source-id: 6be2e5a129928650ddfe8baa1b309068d90bea69
2017-11-04 00:31:33 -07:00
0b661035f3 pointwise cost function
Summary: start adding some more annotations

Reviewed By: salexspb

Differential Revision: D6180221

fbshipit-source-id: b02157da6b2dfa2064ecab3fad5aaddcc7551253
2017-11-03 22:46:35 -07:00
8cb7e5bd5b Don't assume construction succeeded in __del__.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-04 00:57:08 -04:00
ac099ceda0 Set debug_net_def for NetBase
Summary: Same as title

Reviewed By: salexspb

Differential Revision: D6203094

fbshipit-source-id: 8e57d596b95d3bf71b59f265a58bc61a3b727f5b
2017-11-03 20:55:05 -07:00
1021402136 Compile nnpack and pthreadpool with -fPIC
Summary: Closes https://github.com/caffe2/caffe2/pull/1428

Reviewed By: Maratyszcza

Differential Revision: D6240390

Pulled By: pietern

fbshipit-source-id: 6d441bbfda81ce79e3c824ec28eec0b2cdd8c7cd
2017-11-03 20:19:26 -07:00
fe4e14ed29 Fix fill derivative (#3483) 2017-11-03 23:00:48 -04:00
5616d41421 Switch to open-source NNPACK
Summary:
replaces FB-internal NNPACK fork with open-source version.
Important FB features are already upstreamed to the GitHub repo.

Reviewed By: ajtulloch

Differential Revision: D6224054

fbshipit-source-id: 4dbe02b4da97648a663586414550c2d4e23c7221
2017-11-03 19:01:53 -07:00
2bdea8b451 Add ONNX symbolic for Elu 2017-11-03 20:57:51 -04:00
54972458e1 Build NNPACK and pthreadpool as static libraries
Summary: Closes https://github.com/caffe2/caffe2/pull/1427

Differential Revision: D6239272

Pulled By: Maratyszcza

fbshipit-source-id: 644b588838235b55086413e08bb7f144d924507f
2017-11-03 17:32:35 -07:00
ce62c65c18 momentum sgd
Summary: Add support for SparseMomentumSGDUpdate and tests for momentum SGD in both dense and sparse cases

Reviewed By: akyrola

Differential Revision: D6234834

fbshipit-source-id: 9848c29ea06794ef35f1ebaff0f5e81eac4f4db9
2017-11-03 16:17:17 -07:00
7ac341c862 Fix EventBasics test
Summary: Add missing event reset.

Reviewed By: Yangqing

Differential Revision: D6236352

fbshipit-source-id: 4bee6dd22fa69532a9f376f03bb2f69c9c24c01e
2017-11-03 16:02:19 -07:00
ea9fcd5c47 fix copy-paste error in #3263 (#3476)
I have no idea how it worked on cuda 8, but apparently this fixes failures on cuda 9. cc @colesbury
2017-11-03 18:51:26 -04:00
f7a459b28b Fix overflow when using magma (#3470)
* Fix types

* Make types better instead of casting to size_t
2017-11-03 18:13:06 -04:00
20feef45bc NNFC operator: an FC with noTrans noTrans options
Summary:
This seems to be faster in a bunch of cases. Prefer to keep it as a
separate op instead of MatMul + Add so it's easy to compare perf on a
per-op basis between this one and the baseline (normal FC)

Reviewed By: akyrola

Differential Revision: D6169187

fbshipit-source-id: 09b96325d44bd181896f396aec88b27314c435b0
2017-11-03 15:08:39 -07:00
68ed66a2c5 Faster BatchBoxCox Operator using MKL
Summary: Use MKL VML vsPow() and row-major iteration for faster BatchBoxCox operator.

Reviewed By: kennyhorror

Differential Revision: D6042052

fbshipit-source-id: 54fc6b9184cb341672183a77730d79a271d09207
2017-11-03 12:04:03 -07:00
13fde88b83 Install magma in cuda 9 docker (#3469) 2017-11-03 14:17:05 -04:00
b71cebb11f Fix LoadModel() in resnet50_trainer
Summary:
The resnet50 trainer will save the 'optimizer_iteration' blob in checkpoints, but loads it in GPU context. This fails because AtomicIter/Iter expect the blob to be in CPU context. So manually reset the optimizer_iteration in CPU context.

I am thinking of making the iter-operators automatically do this switch, but in the meantime this unbreaks the trainer.

Reviewed By: sf-wind

Differential Revision: D6232626

fbshipit-source-id: da7c183a87803e008f94c86b6574b879c3b76438
2017-11-03 11:15:25 -07:00
42de0df411 Add assertion that 'pos' is in-bounds (#3466) 2017-11-03 12:54:55 -04:00
a8efd88cac Fix warning in jit/ir.cpp 2017-11-03 09:11:33 -07:00
1b5c843a9c cleaner logic on sparse feature hashing
Reviewed By: kennyhorror

Differential Revision: D6195525

fbshipit-source-id: f687ac3d4914c3dbb0d35679e3a3d3a64a71ac53
2017-11-03 07:27:45 -07:00
1149b9bbb5 Polling async net executor
Summary:
Implementation of polling async net executor.
Notes:
- New net executor async_polling - schedules CPU and GPU ops asynchronously, uses single polling thread
- Events: update to Caffe2 events to support async CPU events, adding new methods:
 Query() - non-blocking checking of event states: INITIALIZED -> RECORDED -> SUCCESS/FAILED
 ErrorMessage() - when an operation runs asynchronously and fails, calling this on the event gives the error message
- Tasks: using existing DAGNet's algorithm to compute CPU and GPU chains, a separate task for each chain
- Polling: using a single thread to query the state of events - for CPU tasks it atomically queries task state, for GPU tasks it uses cudaEventQuery
- Scheduling of CPU ops: using global thread pools
- Scheduling of GPU ops: using GPU thread pool per GPU device

Reviewed By: dzhulgakov

Differential Revision: D5985110

fbshipit-source-id: a9de7fcbb71d046a3aa1b573072b89a65dfeee8c
2017-11-03 07:27:44 -07:00
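A toy sketch of the event state machine described above (hypothetical types; the real implementation lives in Caffe2's Event):

  #include <atomic>
  #include <cassert>

  // INITIALIZED -> RECORDED -> SUCCESS or FAILED, queried without blocking.
  enum class EventState { INITIALIZED, RECORDED, SUCCESS, FAILED };

  struct Event {
      std::atomic<EventState> state{EventState::INITIALIZED};
      EventState Query() const { return state.load(); }  // non-blocking poll
      void Record() { state = EventState::RECORDED; }
      void Finish(bool ok) { state = ok ? EventState::SUCCESS : EventState::FAILED; }
  };

  int main() {
      Event e;
      assert(e.Query() == EventState::INITIALIZED);
      e.Record();
      e.Finish(true);
      assert(e.Query() == EventState::SUCCESS);
      return 0;
  }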
8548dd2486 Fix intrinsic in perf kernels for int8
Summary: 8 bytes is 64 bits. Fixes an out-of-range access caught by ASAN

Reviewed By: Yangqing

Differential Revision: D6219576

fbshipit-source-id: f7c418b12fa211890abcb5aef800bd456390b73a
2017-11-03 05:19:58 -07:00
583bc63c98 Fix boundary checking in 8-bit sparselengthssum ops
Summary: Before this change, the boundary checking happened after the first access for 8-bit ops.

Reviewed By: Yangqing

Differential Revision: D6206753

fbshipit-source-id: 07ab240cae8c67b3048f03aa79af0b6399b9940b
2017-11-03 05:19:57 -07:00
e11d2b9c9c Better error messages for ATen tensor types (#3449)
* Better error messages for ATen tensor types

* Address comments, add unit test
2017-11-03 07:59:05 -04:00
596a335851 Add gradient checks for take and put_ (#3460)
* Add gradient checks for take and put_

Fix the gradient formula for put_

* Make grad_output optional in gradgradcheck
2017-11-03 07:55:59 -04:00
9136dcdb60 Make grad_output optional in gradgradcheck (#3459) 2017-11-03 07:55:14 -04:00
cbedba373c use valgrind to make aten test pass 2017-11-02 20:39:11 -04:00
ebae2f6c71 MKL Sigmoid op wrapper
Reviewed By: Yangqing

Differential Revision: D6222910

fbshipit-source-id: 92d0825a6a35a4bf6a12636e3d5dd8affcffeef3
2017-11-02 17:30:29 -07:00
a7644e4f4b Extend rewrite functionality to handle multiple outputs.
Summary: Still assumes a complete subgraph, but slightly more generic.

Reviewed By: Yangqing

Differential Revision: D6103228

fbshipit-source-id: bfa0d46067e05baa0478a4c37a67ccf8f81f34ec
2017-11-02 17:30:27 -07:00
502aaf39cf make sure stdatomic.h is included when checking for ATOMIC_INT_LOCK_FREE 2017-11-02 19:53:36 -04:00
81e56ff8aa NO_CUDA for travis 2017-11-02 19:53:36 -04:00
531a20b312 enable ATen in the travis build tests. 2017-11-02 19:53:36 -04:00
f6dac327df build fixes 2017-11-02 19:53:36 -04:00
88d56cc198 fix setup.py paths 2017-11-02 19:53:36 -04:00
5aa5b572e4 update build so that all of TH* is in libATen 2017-11-02 19:53:36 -04:00
4424b3e352 Update CMakeLists.txt in TH* libraries to support static builds. 2017-11-02 19:53:36 -04:00
320ff3ad64 remove subtree of ATen since ATen is now inside pytorch 2017-11-02 19:53:36 -04:00
d792c21f72 move TH* folders into aten/src 2017-11-02 19:53:36 -04:00
39fc9f9c11 make stack only a function 2017-11-02 19:53:36 -04:00
e3b82a7665 use private to prevent double linking 2017-11-02 19:53:36 -04:00
5dfbc3d6c9 whole archive 2017-11-02 19:53:36 -04:00
f1b7464119 create file so that find_package works in CMake 2017-11-02 19:53:36 -04:00
185cd0af46 modify ATen/TH build to make/install only libATen.so libTH is built statically and folded into libATen 2017-11-02 19:53:36 -04:00
f3e4dc176b add TH_LINK_STYLE, which allows the universal use of STATIC libraries across TH* and ATen 2017-11-02 19:53:36 -04:00
9398e0c0c1 fix CMakeLists for new directories 2017-11-02 19:53:36 -04:00
8e584a5cd5 directory restructure 2017-11-02 19:53:36 -04:00
5c49df8875 from pytorch 2017-11-02 19:53:36 -04:00
1420375ead Change 'sizes' parameter name to 'size' in expand native function. 2017-11-02 19:53:36 -04:00
c38defd201 stack should not be a method. (#156) 2017-11-02 19:53:36 -04:00
360203a2a4 update nn 2017-11-02 19:53:36 -04:00
73117ea5ba Implement stack as a native function. 2017-11-02 19:53:36 -04:00
97bc100b92 Fix handling of inf and nan (#153) 2017-11-02 19:53:36 -04:00
f1c5d8c4ce Add an at() method for indexing. (#152)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-02 19:53:36 -04:00
07a049d900 update dlpack header and convertors 2017-11-02 19:53:36 -04:00
cbebcb347b Make valgrind optional to make our build pass 2017-11-02 19:53:36 -04:00
5d84013249 Correct dimensions for reduction functions, squeeze, unsqueeze.
Reduction functions that take a dimension now properly reduce
down to scalars if passed a 1-dimensional tensor.

Squeeze now properly reduces down to scalars as well (and is implemented
as a native function).

Unsqueeze now handles scalar inputs correctly (so unsqueezing a scalar
returns a dim 1 tensor, rather than a dim 2 tensor).
2017-11-02 19:53:36 -04:00
682aec30b5 Update ExpandUtils.h
Include what you use, otherwise compilation may break.
@prigoyal reported compilation errors with gcc-4.9 I believe.
2017-11-02 19:53:36 -04:00
cf348bcdee tighten hasCUDA check 2017-11-02 19:53:36 -04:00
ac1abc4cb8 Add comment explaining return of dim() when tensor is a scalar. 2017-11-02 19:53:36 -04:00
d5d6dafb04 Address review comments. 2017-11-02 19:53:36 -04:00
a10030eec7 Represent empty tensors as size {0} tensors and fix scalar checks.
This gets rid of kUndefinedDimensions and has nice properties like:
- the dimensionality always matches the length of the sizes and strides.
- the number of elements is always the product of the sizes (starting at the identity)
- the shape you pass to factory functions (e.g. randn) matches the shape that is returned
etc.

In addition to the empty tensor change, this makes some related changes:
1) expand is now a native function, because it needs to operate on the ATen view of the size/strides.
2) adds tests for a number of functions operating on empty, scalar, non-scalar tensors.
This uncovered a number of scalar_check bugs; some of these are fixed in the generated code,
some that need to be manually specified can be specified by a 'scalar_check' argument in the cwrap.
3) fixes the formatting of empty tensors
4) changes the THLongStorageView API; the public API was getting overly complicated, so now you call
'makeFromSize', 'makeFromStride', 'makeFromLength' and it just handles the correct mapping for that type.
2017-11-02 19:53:36 -04:00
c369d4da85 warning fix (#142) 2017-11-02 19:53:36 -04:00
0e9e18303b Adds permute and as_strided to ATen (#137)
Permute transposes multiple dimensions at once. The as_strided function
changes the sizes and strides of a tensor without changing the Storage.
It's a subset of Tensor::set_.
2017-11-02 19:53:36 -04:00
8cdd7650ee Make toScalarType and toBackend virtual
This allows VariableType override them to return instances of
VariableType. Combined with the change to Formatting.cpp, this lets us
print Variables to std::cout.
2017-11-02 19:53:36 -04:00
6b113b1d1c Make size, strides, dim functions const. 2017-11-02 19:53:36 -04:00
fee9195821 Change is_same_size to a native function.
For one thing, we will want a different implementation from TH because
we need to differentiate between scalars and 1-dim tensors.

Also, we don't really want to expose the THS/THCS function; in addition to
checking the shapes are the same, it checks that the dimensions which
are sparse are the same (because various THS/THCS operators only work if this
is true); it should really be called "is_congruent" or similar.
2017-11-02 19:53:36 -04:00
7273906eac Add unsqueeze of scalar to wrapdim_test. 2017-11-02 19:53:36 -04:00
1cde661df3 bind newWithTensor in ATen (#129) 2017-11-02 19:53:36 -04:00
dd0c95d552 fix merge problems 2017-11-02 19:53:36 -04:00
c11349a9b8 missing code from pytorch 2017-11-02 19:53:36 -04:00
a03621462e missing entry 2017-11-02 19:53:36 -04:00
bdc98a0e7a The at::cat should default to dim=0 2017-11-02 19:53:36 -04:00
60e7e96c7a update docs 2017-11-02 19:53:36 -04:00
32ecaa0870 regenerate docs w/ recent changes (#126) 2017-11-02 19:53:36 -04:00
8e13a95357 Support default parameters for native functions. 2017-11-02 19:53:36 -04:00
930b98cacd smarter backend option 2017-11-02 19:53:36 -04:00
b06d8937f5 sparse cuda and get device (#122) 2017-11-02 19:53:36 -04:00
0c1ce9feb2 Conda packaging (#119)
* conda packaging

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Update comment, and the build problem is fixed now
2017-11-02 19:53:36 -04:00
4afd630632 remove CMAKE_CXX_STANDARD stuff in favor of setting --std=c++11 directly because parse of FindCUDA ignore the former approach (#121) 2017-11-02 19:53:36 -04:00
c58913dc95 Remove C exports and rename AT_API 2017-11-02 19:53:36 -04:00
ed46386c85 Fix missing <functional> and export decorations in lib/ATen 2017-11-02 19:53:36 -04:00
d84429b526 Revert the enum changes as discussed 2017-11-02 19:53:36 -04:00
5683144b97 Fix typos in orgqr and orgmqr
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-02 19:53:36 -04:00
db1292a509 Add missing string include, fixes https://github.com/pytorch/pytorch/issues/3192
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-02 19:53:36 -04:00
f074a7a95c Rename value to other, wherever there is both a Scalar and Tensor overload. (#115)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-02 19:53:36 -04:00
67c2f0ead9 Separate out native processing into procecss_native; remove (TH)Type specific logic. 2017-11-02 19:53:36 -04:00
e73f6c211b Support 'native' ATen functions with Tensor, (base) Type, NS impls.
This adds the ability to specify 'native' functions in NativeFunctions.h and specifies
'split' and 'chunk' in this manner.  The function arguments, returns, variants, etc. are
specified as if they were processed via other parsing mechanisms (e.g. cwrap_parse) with
the following additional parameters:

type_method_definition_level: this allows one to specify that the type method should
be defined at the 'base' type level; this is because in the case of 'split' and 'chunk'
(and probably most/all other native functions that don't directly dispatch to TH/THC)
we don't need type-specific implementations.  Currently it is enforced that 'base' is
specified for native functions, but this is easy to remove later.

type_method_definition_dispatch: this defines the function to dispatch to.  For split,
this is at::native::split; this is just to avoid having a magic namespace and allowing
one to dispatch to a function with a different name.
2017-11-02 19:53:36 -04:00
b5d3edfd7f Update DLPack tensors enum to avoid binary issues and expose one function 2017-11-02 19:53:36 -04:00
bacca0eba1 Change softmax and log_softmax to take int64_t dim rather than int.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-02 19:53:36 -04:00
6849554ac6 Squash ATen warning
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-02 19:53:36 -04:00
8257f3e6e2 Update log_softmax and softmax signatures to include dim (#106) 2017-11-02 19:53:36 -04:00
164a8fbaf5 nit: move ATenDLMTensor to cpp file since it doesn't need to be in the header 2017-11-02 19:53:36 -04:00
3bc54bf2d9 [dlpack] Memory management for dlpack 2017-11-02 19:53:36 -04:00
30bbeb8b87 Relax Scalar::toXXX conversions to only check for overflow
Currently, the toXXX functions on Scalar check that the conversions are
exact. This will cause an exception in code like:

  auto t = CPU(kFloat).ones({1});
  t *= M_PI;

Or the equivalent in Python:

  t = torch.ones(1)
  t *= math.pi

This changes the checks to only throw an exception in the case of
overflow (positive or negative).
2017-11-02 19:53:36 -04:00
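A sketch of an overflow-only check in the same spirit (hypothetical helper, not the actual Scalar code): lossy-but-in-range conversions pass, out-of-range ones throw.

  #include <cassert>
  #include <cmath>
  #include <limits>
  #include <stdexcept>

  float to_float_checked(double v) {
      if (std::isfinite(v) && std::fabs(v) > std::numeric_limits<float>::max())
          throw std::overflow_error("value out of float range");
      return static_cast<float>(v);  // inexact results are allowed
  }

  int main() {
      assert(to_float_checked(3.141592653589793) > 3.14f);  // lossy but fine
      bool threw = false;
      try { to_float_checked(1e100); } catch (const std::overflow_error&) { threw = true; }
      assert(threw);
      return 0;
  }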
56a241f97a Every argument controlled by the output_mask may be null 2017-11-02 19:53:36 -04:00
e9be595081 Add additional erf, erfinv, and additional nn functions 2017-11-02 19:53:36 -04:00
2295a15b8c Add bindings to additional NN functions 2017-11-02 19:53:36 -04:00
8e312ab2e6 Expose is_nullable in Declarations.yaml
Some parameters can be null but do not have default values.
2017-11-02 19:53:36 -04:00
112f183dc2 Support broadcasting in copy
a.copy_(b) will now broadcast b to the shape of a. Note that this means
that copies between tensors of the same number of elements but
incompatible shapes are not allowed. For example, the following will
throw an exception:

  Tensor a = type.rand({4, 3});
  Tensor e = type.rand({3, 4});
  a.copy_(e);  // throws: same element count, but {3, 4} does not broadcast to {4, 3}
2017-11-02 19:53:36 -04:00
c9889d934f Use pointer equality to compare types 2017-11-02 19:53:36 -04:00
a6335b54d6 Combine comparison methods and functions
The methods were separate because PyTorch supports multiple output types
for comparison methods. For example, for FloatTensors 'a' and 'b' both
calls are valid:

   torch.lt(a, b, out=<ByteTensor>)
   torch.lt(a, b, out=<FloatTensor>)

ATen only supports ByteTensor outputs because the overloads have the
same static signature and would conflict. It would be nice to fix this
in the future like with the bernoulli function.

In the meantime, the separate function and method definitions with
different argument names make implementing VariableType more difficult.
2017-11-02 19:53:36 -04:00
54addcf0af Support wrap_dim in nn.yaml 2017-11-02 19:53:36 -04:00
dc9c5806a3 Expose the THGenerator* via unsafeGetTH on at::Generator 2017-11-02 19:53:36 -04:00
5cfa890926 pass values with flags 2017-11-02 19:53:36 -04:00
9f5c0a02a7 add a deleter callback to tensorFromBlob 2017-11-02 19:53:36 -04:00
004cd36efe Add additional comments 2017-11-02 19:53:36 -04:00
0243338603 Generate PyTorch-style NN bindings
This generates NN bindings with a similar interface to PyTorch's
torch.nn.functional package. The file nn.yaml specifies function
signatures and THNN implementations.

Each NN operation generates three functions. For example:

  - conv2d
  - conv2d_forward
  - conv2d_backward

The conv2d and conv2d_forward functions differ in how they handle
buffers that need to be passed to the backward function. conv2d_forward
takes the buffers as parameters. conv2d creates the buffers internally
and discards them.
2017-11-02 19:53:36 -04:00
c8c967fa43 Improve Declarations.yaml: (#81)
* Improve Declarations.yaml:

 - translate defaults to C++ values
 - include names of returned values
 - mark keyword-only arguments

* Add comment to translate_default
2017-11-02 19:53:36 -04:00
37d9ad748b Refactor out TensorBase from Tensor
Use TensorBase in Scalar class
2017-11-02 19:53:36 -04:00
25b97aebdf Fix copy and move constructors 2017-11-02 19:53:36 -04:00
43fbe58dc0 Remove has_full_argument_list 2017-11-02 19:53:36 -04:00
986c577e93 Fix lint 2017-11-02 19:53:36 -04:00
9b0b26d037 Add check that tensor is defined in Scalar constructor 2017-11-02 19:53:36 -04:00
937950e064 Move default arguments to function declaration
* Make alpha, beta in addmm kwarg_only
 * Move kwarg_only arguments to the end
 * _out variants now have output arguments at the beginning
2017-11-02 19:53:36 -04:00
3d80bd31d8 Fix build for MSVC 2017-11-02 19:53:36 -04:00
32057edbf3 Fix build (#75) 2017-11-02 19:53:36 -04:00
9a6334fead Implement _unnarrow (backwards of narrow) in ATen.
Note this is currently prefixed with an underscore because it may go away
(can be implemented via index).
2017-11-02 19:53:36 -04:00
f3e2d6669e Enable wrap_dim in Local.cwrap.
This includes torch.cat, which is a TensorList argument, which wasn't supported before.
2017-11-02 19:53:36 -04:00
211c717e53 Make all dim arguments int64_t 2017-11-02 19:53:36 -04:00
7d1c01a86f Converting dlpack tensor to aten tensor 2017-11-02 19:53:36 -04:00
6826a5c467 adding a simple class for converting atensor to dlTensor 2017-11-02 19:53:36 -04:00
6b61d72eec Test stub for dlconvertor 2017-11-02 19:53:36 -04:00
21d98db9b8 adding dlpack header 2017-11-02 19:53:36 -04:00
99141e62a6 Fix build failure in MSVC 2017-11-02 19:53:36 -04:00
0acaf1ee6b Update generated docs for post-const Type changes. 2017-11-02 19:53:36 -04:00
dec470797b Mark all (non-static) Type methods as const. 2017-11-02 19:53:36 -04:00
73a31cfed2 add merge_all script for subtrees 2017-11-02 19:53:36 -04:00
9ed7ab82de Win64 support for lib/ATen 2017-11-02 19:53:36 -04:00
aba1bb1d46 Micro optimizations in ATen
* Compare typeid instead of using dynamic_cast
* Mark derived TensorImpl classes as final
* Use tensor->nDimension instead of THTensor_(nDimension)
2017-11-02 19:53:36 -04:00
9a01d3f374 add support for custom python 2017-11-02 19:53:36 -04:00
19770db681 Make 's_' functions on Type public 2017-11-02 19:53:36 -04:00
33e94adaa9 Mark unsafeGetTH as const 2017-11-02 19:53:36 -04:00
ec539abc6e Move wrap_dim code to Utils function to minimize generated code. 2017-11-02 19:53:36 -04:00
054a9719f1 Generate wrap_dim code on derived type rather than base type.
Either should work, but code feels more natural this way.
2017-11-02 19:53:36 -04:00
e33d154bcc Support wrap_dim specifications from cwrap. 2017-11-02 19:53:36 -04:00
21df48f7b4 Use cast instead of literal as a temporary fix 2017-11-02 19:53:36 -04:00
709dfba95a Fix default constructor argument 2017-11-02 19:53:36 -04:00
2d5764539f force NO_CUDA to be specified to disable cuda. add pytorch's FindCUDA so that it is possible to get ccache to work for nvcc. make excluded notification more concise. 2017-11-02 19:53:36 -04:00
752ebc58cc Handle scalars that are not backed by tensors 2017-11-02 19:53:36 -04:00
d23a83add4 Add accessor to underlying Tensor 2017-11-02 19:53:36 -04:00
2a2c989e4b zero_dim_to_one and empty_to_null can't both be specified 2017-11-02 19:53:36 -04:00
af184b562b Rename 'canonical' to 'has_full_argument_list' 2017-11-02 19:53:36 -04:00
463fb29710 Include non-canonical functions in Declarations.yaml 2017-11-02 19:53:36 -04:00
efbc1ad2a8 Make Scalar default constructible 2017-11-02 19:53:36 -04:00
bfeacce4ff fix static linkage and make THD statically linked 2017-11-02 19:53:36 -04:00
bfc85dbe0f Handle default arguments in base Type class 2017-11-02 19:53:36 -04:00
3e960b759f Use CWRAP_FILES_BASE if defined 2017-11-02 19:53:36 -04:00
b4900260ef Add missing const qualifiers 2017-11-02 19:53:36 -04:00
adc9cf15ed Fix typo. 2017-11-02 19:53:36 -04:00
e5f6057f86 Remove unnecessary early conversion to IntList and make expand functions inline. 2017-11-02 19:53:36 -04:00
f2168578f0 Remove scalar expansion tests. 2017-11-02 19:53:36 -04:00
3ca164a6cc Address review comments. 2017-11-02 19:53:36 -04:00
8b049e1c46 Support broadcast specifications from cwrap.
This respects all the broadcast cwrap specifications except for 'fallback';
i.e. pointwise functions operating on tensors where the number of elements
match but the sizes are different and not broadcastable.  This behavior is
currently deprecated in PyTorch.  Note that this is a breaking change in ATen,
because ATen just passes through to TH/THC, where the fallback behavior is
actually implemented.

This also changes expand semantics wrt Scalars (as tensors).  Previously,
one could 'expand' a 1-dimensional tensor with size 1 to a 'scalar' (i.e.
empty size initializer list).
2017-11-02 19:53:36 -04:00
1f0461e76c elementSizeInBytes for types 2017-11-02 19:53:36 -04:00
e84634e4d6 provide more information in Declarations.cwrap 2017-11-02 19:53:36 -04:00
c23dfe5ddb update generated code in documentation to match changes 2017-11-02 19:53:36 -04:00
2d018cc24e sync Declarations.cwrap with pytorch 2017-11-02 19:53:36 -04:00
11807f99b4 Add rudimentary support for calling a few sparse tensor functions. 2017-11-02 19:53:36 -04:00
a1438bad5f fix issues where scale gets reported as 0.0000 in output 2017-11-02 19:53:36 -04:00
8fd8cf7b24 Small readme fix 2017-11-02 19:53:36 -04:00
b57f82a2cb made the repository available for embedding into other projects 2017-11-02 19:53:36 -04:00
3fc3289745 add some asserts to basic.cpp 2017-11-02 19:53:36 -04:00
fefc2a2c9b add valgrind to CI 2017-11-02 19:53:36 -04:00
187c4ffdd9 allow retain to be specified for unsafeTensorFromTH 2017-11-02 19:53:36 -04:00
99b94fe73f fix osx build errors related to long/int64_t 2017-11-02 19:53:36 -04:00
8c427b7715 Note [Undefined-dim versus 0-dim]
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-02 19:53:36 -04:00
c0ad9380c0 fix a bug where some scalars were getting truncated to integers incorrectly. 2017-11-02 19:53:36 -04:00
e3322069ec Fix build for CPU only machines 2017-11-02 19:53:36 -04:00
78820919a5 return a sentinel value when THTensor has undefined dimensions. 2017-11-02 19:53:36 -04:00
8380a1a110 fix lint 2017-11-02 19:53:36 -04:00
6fe5126a0a Static linking against libstdc++ in Binary Build mode 2017-11-02 19:53:36 -04:00
7a11627a13 Make clang shut up about class/struct mismatch.
Makes us -Werror clean again, I think.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-02 19:53:36 -04:00
7af130f6a4 still generate multiple versions 2017-11-02 19:53:36 -04:00
b7f793c618 add support for Null Tensors to functions 2017-11-02 19:53:36 -04:00
070b2ce33c lint fixes 2017-11-02 19:53:36 -04:00
96b4cfdba0 produce a Declarations.yaml file that describes Functions/Type/Tensor methods that framework produced. 2017-11-02 19:53:36 -04:00
01b8f07624 basic travis script (build + pylint) 2017-11-02 19:53:36 -04:00
7b9b538aa5 operator== for type 2017-11-02 19:53:36 -04:00
95e4c4ff87 allow type inference to work on TensorList 2017-11-02 19:53:36 -04:00
f6ac105d9a Fix handling of if_true/if_false in ATen 2017-11-02 19:53:36 -04:00
f4149b7d0e Half fixes for ATen and CUDA 9.0 2017-11-02 19:53:36 -04:00
e4e960a42b lint fixes 2017-11-02 19:53:36 -04:00
782be20ac2 fix bug in method declarations 2017-11-02 19:53:36 -04:00
3281be7e3c add isCUDA() on Type 2017-11-02 19:53:36 -04:00
1da1a50ee8 write generated_cpp. to a file rather than as output to make error reporting clearer. 2017-11-02 19:53:36 -04:00
5136bf2dee dont clobber gen.py error, fix for old versions of python 2017-11-02 19:53:36 -04:00
38681b47e7 fix lint 2017-11-02 19:53:36 -04:00
afdc44c73e match PyTorch syntax 2017-11-02 19:53:36 -04:00
2fff1dd056 checked cast does it all 2017-11-02 19:53:36 -04:00
2cd8bbd2bc basic cat implementation in ATen 2017-11-02 19:53:36 -04:00
52a561e583 Fix ATen build for debug python 2017-11-02 19:53:36 -04:00
052ad8bf04 Fix a few C++ warnings
1) Type needs a virtual dtor
2) Tensor move ctor should be noexcept
3) Make constructors from Context* and Type* explicit
2017-11-02 19:53:36 -04:00
a19b9d0a1c add some documentation to Tensor 2017-11-02 19:53:36 -04:00
e73ab1c4c4 add basic gitignore, thpp -> at doc fix 2017-11-02 19:53:36 -04:00
67dba8144d always use a custom default float 2017-11-02 19:53:36 -04:00
0c224445d1 python style fixes 2017-11-02 19:53:36 -04:00
f5a57e0f7e support unsafe functions for getting/constructor tensors from TH objects for backward compat. 2017-11-02 19:53:36 -04:00
e7a64b8e95 lazily initialize cuda so that we behave similar to PyTorch 2017-11-02 19:53:36 -04:00
70b1401be5 osx build issues and clang warnings 2017-11-02 19:53:36 -04:00
fc429af20a remove Sparse from dispatch for now, will add dispatch variants later 2017-11-02 19:53:36 -04:00
dbba384bd6 Always include THNN in the build, don't check for CUDA twice
As a result, the project builds on MacOS with gcc-6 (without CUDA).
2017-11-02 19:53:36 -04:00
b8152eba8d fix build issue when cuda does not exist 2017-11-02 19:53:36 -04:00
23bda6a36e bind THS THCS, leaving all operators unimplemented. This is required because THPP can represent Sparse tensors even though the wrapper doesn't implement any operators. 2017-11-02 19:53:36 -04:00
eed5e2d143 adding build for sparse libraries 2017-11-02 19:53:36 -04:00
d2345ff1af enable warnings in build and fix warnings 2017-11-02 19:53:36 -04:00
650f24b569 update readme and add assign_(Scalar) variant 2017-11-02 19:53:36 -04:00
328a250b64 fix a bug with scalar handling by simplifiying the maybeScalar check. 2017-11-02 19:53:36 -04:00
ec4cb72a0d handle select and operator[] style operations 2017-11-02 19:53:36 -04:00
622350d3e9 add checks for scalars on output 2017-11-02 19:53:36 -04:00
d460f6725d start adding rules to propagate scalar to results 2017-11-02 19:53:36 -04:00
dadc23cafb Scalar objects can now be backed by 0-dim Tensors. 2017-11-02 19:53:36 -04:00
4b2ea3ff2f missing fixed allocator files 2017-11-02 19:53:36 -04:00
6100191fff scalar flags added, and used to dispatch when there is a scalar variant of a function. broadcast annotations are used to figure out when a scalar s + A should also be converted. 2017-11-02 19:53:36 -04:00
f62def0701 set TH_INDEX_BASE to 0 2017-11-02 19:53:36 -04:00
e7a316e1ee update with tensorFromBlob doc 2017-11-02 19:53:36 -04:00
70e3951eca allow tensors to be constructed from views of external data. Support creating new tensors that already have a size/stride 2017-11-02 19:53:36 -04:00
6b285cb37d improve error reporting for undefined tensors passed as arguments. 2017-11-02 19:53:36 -04:00
7f376a2c46 tensor.data<> also has toLongData() variants. Scalar now also has .to<T>() variants 2017-11-02 19:53:36 -04:00
e7436022f4 document accessors 2017-11-02 19:53:36 -04:00
e32210658d add readme and generated files for Type/Tensor/Functions to a doc folder to make it possible to view headers without building the library 2017-11-02 19:53:36 -04:00
7a5987123f rename TensorLib -> ATen 2017-11-02 19:53:36 -04:00
2c2648ea38 split Local.cwrap from Declarations.cwrap so local ones can be modified without regenerating declarations from pytorch 2017-11-02 19:53:36 -04:00
4e3b1c46d9 adding xt makefile 2017-11-02 19:53:36 -04:00
37f5e3ff78 import xt data/meter directories 2017-11-02 19:53:36 -04:00
f6f6fa2464 add operator [] to do select 2017-11-02 19:53:36 -04:00
56f1019fc7 add overloaded operators for tensor object 2017-11-02 19:53:36 -04:00
288fd61c0b add accessor object for fast(er) access to tensor data when the dim and scalar type are known. 2017-11-02 19:53:36 -04:00
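In modern ATen the accessor pattern looks roughly like the following (the exact spelling at the time of this commit may have differed): fixing the dtype and rank up front lets element access skip per-call dispatch.

  #include <ATen/ATen.h>

  int main() {
      at::Tensor t = at::zeros({3, 4});
      auto a = t.accessor<float, 2>();  // dtype and rank fixed once, up front
      for (int64_t i = 0; i < a.size(0); ++i)
          for (int64_t j = 0; j < a.size(1); ++j)
              a[i][j] = static_cast<float>(i * 10 + j);  // raw strided access
      return 0;
  }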
927ac2bb1a add script that can collect all the cwrap declarations for external use 2017-11-02 19:53:36 -04:00
3976333bc6 fix build paths and allow for cwrap_files to be externally specified 2017-11-02 19:53:36 -04:00
9879566a3b switch dispatch to function 2017-11-02 19:53:36 -04:00
92c2aad894 more flake8 2017-11-02 19:53:36 -04:00
0ba09a843a disable tests from cmake for tensorlib 2017-11-02 19:53:36 -04:00
5d330de56e autopep8 2017-11-02 19:53:36 -04:00
a83f62e36f remove PUBLIC from target_link_libraries in CMake 2017-11-02 19:53:36 -04:00
e65bef39df fix handling of methods that allocate returns 2017-11-02 19:53:36 -04:00
8f9c222fc5 make copy copy_out and add copy_ to be consistency with argument/output order for the rest of the library 2017-11-02 19:53:36 -04:00
eb9e6165be fix error messages 2017-11-02 19:53:36 -04:00
76a2d7bff8 port optional argument declaration handling to shared code 2017-11-02 19:53:36 -04:00
8454b87034 reuse declaration option sorter from common_with_cwrap in ArgcountSortPlugin 2017-11-02 19:53:36 -04:00
595ff0d3ed move set_declaration_defaults to a common location 2017-11-02 19:53:36 -04:00
e88ae5eb49 port xt basic.cpp 2017-11-02 19:53:36 -04:00
7897aac109 get rid of lt_t variants for now. These will not be exposed in the C++ library yet. 2017-11-02 19:53:36 -04:00
424d5d1faf fix generator bug, begin porting tests 2017-11-02 19:53:36 -04:00
d69f2e4ff9 import xt print code, implementing copy and type conversion 2017-11-02 19:53:36 -04:00
eb20c8daa2 initial binding of TH(CU)NN 2017-11-02 19:53:36 -04:00
16c3b7e3f4 return references when the returns are actually just one of the arguments. 2017-11-02 19:53:36 -04:00
ea77e3ddef auto-generate const mark on tensor based on in-place 2017-11-02 19:53:36 -04:00
285a820877 remove TensorRef. Instead correctly mark const Tensor & and Tensor & in arguments depending on use. 2017-11-02 19:53:36 -04:00
53eafff042 addressing comments from pull request: processors codemodded to backend and other minor changes 2017-11-02 19:53:36 -04:00
6a3e5510dc fix context initialization to use https://stackoverflow.com/questions/12302057/c11-safe-double-checked-locking-for-lazy-initialization-possible/12302355#12302355 2017-11-02 19:53:36 -04:00
415449470e autopep8 2017-11-02 19:53:36 -04:00
cbb798fbca add generator for out-of-library dispatch macro 2017-11-02 19:53:36 -04:00
e178b5c9c6 fix duplicate symbol issue 2017-11-02 19:53:36 -04:00
514b31c5e5 support multiple returns 2017-11-02 19:53:36 -04:00
095cd734fd resize and zero handling 2017-11-02 19:53:36 -04:00
79bb2d842a fix a few before_call cases, and annotate the resize info in cwrap 2017-11-02 19:53:36 -04:00
ab7517d888 changes to make cuda parts of wrapper compile. 2017-11-02 19:53:36 -04:00
5d1fd0cab1 add TensorRef so that we don't refcount++ on argument passing 2017-11-02 19:53:36 -04:00
07aacec83d example things 2017-11-02 19:53:36 -04:00
29c0dadfaa implement size() stride() and formatting for IntList 2017-11-02 19:53:36 -04:00
b3b61d6596 long -> int64 to avoid hack in Scalar 2017-11-02 19:53:36 -04:00
18095c713d inline the static methods/functions so they can be optimized 2017-11-02 19:53:36 -04:00
7e98cabf25 switch Tensor to ref counting, using pImpl pattern 2017-11-02 19:53:36 -04:00
17b88322b0 make type statically dispatched 2017-11-02 19:53:36 -04:00
1beb1732bb switch Storage/Generator to be returned as unique_ptr 2017-11-02 19:53:36 -04:00
be6ec51140 mod to use references rather than pointers to make API look correct 2017-11-02 19:53:36 -04:00
4496398eee add a default type to make the library more ergonomic 2017-11-02 19:53:36 -04:00
af791fbc83 handle strings as bools, now compiles for CPU classes 2017-11-02 19:53:36 -04:00
d38b4e97c2 more progress getting it to compile, now makes it through a few CPU types and fails on Double 2017-11-02 19:53:36 -04:00
1420566199 fix some ambiguity problems 2017-11-02 19:53:36 -04:00
1ce4a51885 checked casting for scalars 2017-11-02 19:53:36 -04:00
e48a14fecc logic fix for result allocate things 2017-11-02 19:53:36 -04:00
d06ffcc5ca array ref and storage views for THSize/THStride 2017-11-02 19:53:36 -04:00
d1ef531b09 to env type 2017-11-02 19:53:36 -04:00
cdb06e17a4 more fixes to handle a lot of cwrap 2017-11-02 19:53:36 -04:00
8df54be9d7 some changing before generalizing to more types 2017-11-02 19:53:36 -04:00
4738577036 integrate checked_cast 2017-11-02 19:53:36 -04:00
4936ce0c2f generating code for neg 2017-11-02 19:53:36 -04:00
fe1c286f48 add cast that may or may not work 2017-11-02 19:53:36 -04:00
4f1e04b615 fix utils (oops), also add prints 2017-11-02 19:53:36 -04:00
bf07aec920 listen to variants 2017-11-02 19:53:36 -04:00
ff89a39d41 add assert function 2017-11-02 19:53:36 -04:00
34ee792c11 more scaffolding for emitting derived functions 2017-11-02 19:53:36 -04:00
f94a145bfa more scaffolding to generate. still need to generated derived 2017-11-02 19:53:36 -04:00
9f0ce2666e add stuff to process each option in the right place 2017-11-02 19:53:36 -04:00
33dab3c593 fix cuda build 2017-11-02 19:53:36 -04:00
acbd569710 add places in templates where we will put generated methods 2017-11-02 19:53:36 -04:00
ae0a749258 add flags to be able to build without CUDA 2017-11-02 19:53:36 -04:00
486a606d0d rename types and processors to match naming in gen.py, allow for [[CPU,floating_point], [GPU,all]] style pair listings so that we can simplify the logic for elaborating pairs 2017-11-02 19:53:36 -04:00
92aee309fd process types and processors 2017-11-02 19:53:36 -04:00
bf83194db7 sanitize names 2017-11-02 19:53:36 -04:00
8e29bb52e4 add sort... 2017-11-02 19:53:36 -04:00
365dfee37d Option elaboration 2017-11-02 19:53:36 -04:00
842f94b320 initial sanitize 2017-11-02 19:53:36 -04:00
20441712d1 add declarations 2017-11-02 19:53:36 -04:00
ef37e9d9ad infra to load yaml from cwrap 2017-11-02 19:53:36 -04:00
d9783e2293 fix header files to be in TensorLib 2017-11-02 19:53:36 -04:00
0902ec3df3 put a fake example op in to understand how dispatch will propagate. 2017-11-02 19:53:36 -04:00
8729715051 add tensor skeleton 2017-11-02 19:53:36 -04:00
cb0366c6ca adding Type object which will handle dispatch 2017-11-02 19:53:36 -04:00
ca3dd74c55 generate cmake outputs using script 2017-11-02 19:53:36 -04:00
0e0c0ef89e add storage to generator 2017-11-02 19:53:36 -04:00
c64b031fbf Initial commit of framework for TensorLib 2017-11-02 19:53:36 -04:00
3003ebe67a Replace None grad_inputs with zero tensors in some cases (#3433)
Replace None grad_inputs with zero tensors in some cases

In Python-implemented autograd functions, we sometimes return None as
the grad_input if the output is marked "non-differentiable". This
replaces those None values with zero-filled Variables if the
corresponding input has requires_grad=True.

C++ implemented autograd functions expect the input (grad_outputs) to
be defined if they're executed. They always return non-null grad_inputs
if should_compute_output(i) is true. This could lead to segfaults if a
subsequent Python-implemented function returned None.

See #3412, #3241
2017-11-02 17:23:25 -04:00
b07a9e1219 Fix dropout state restoring
Summary:
cc akyrola

Fixes a few issues:
1. Performance issue related to regeneration of rng states every time the input size changed - this was unnecessary, now states should be initialized once only.
2. States were being overwritten between fprop and bprop operators, causing silent wrong results. This required use of the new `cudnnRestoreDropoutDescriptor` API, requiring a new gating behind cuDNN v7
3. Random seed was not being inherited from the `operator_def.device_option()`
Closes https://github.com/caffe2/caffe2/pull/1418

Differential Revision: D6222081

Pulled By: akyrola

fbshipit-source-id: 021067b95bcf0a16db8f4a73d3ed70e21b54bc9f
2017-11-02 14:17:41 -07:00
b1ea066836 Remove duplicate Docker dependency
Summary: Closes https://github.com/caffe2/caffe2/pull/1396

Differential Revision: D6224487

Pulled By: Maratyszcza

fbshipit-source-id: 79b5641e9d8a7e5bc487f76ea931cf431e341707
2017-11-02 14:01:27 -07:00
8b1b06d723 add CUDA_DEBUG build flag (#3419) 2017-11-02 15:35:18 -04:00
48fe5d4622 Move select and permute to ATen/C++ (#3421)
Move select and permute to ATen/C++
2017-11-02 15:17:36 -04:00
066db5dea3 Don't rely on squeeze_out in THD. (#3446)
We don't currently generate _out functions for ATen native functions and may not
(they don't work with Variables currently).  Also, the existing code was wrong
as the argument orders were swapped in the two squeeze variants.
2017-11-02 15:12:39 -04:00
dfaccc96b7 add dockerfile with cuda9 volta support (#3445) 2017-11-02 15:08:39 -04:00
d0accb85e0 Send/Recv C++ portion
Summary:
Implements send/receive calls in C++. This includes both a C2 independent
library in async/comm as well as the C2 operations in the c2 sub-directory

There are still several items to be addressed in future diffs:
  - multiple channels per pair to alleviate the issue with small message latency
  - re-add statistics per comm-client and per-op
  - continue adding test cases as usage patterns diversify

Reviewed By: akyrola

Differential Revision: D6095219

fbshipit-source-id: 6d72770dbac693d2b7035f03ce8c6df5ce03706e
2017-11-02 11:25:50 -07:00
8d377617e7 Fix MKLMemory::CopyTo for case where shapes don't match'
Summary:
There were cases where the direct copy succeeded, but the
dimensions didn't match. Now, we check dimensions and reset if they
don't match before issuing the copy.

Reviewed By: salexspb

Differential Revision: D6103325

fbshipit-source-id: 602605d8b119cae74e006c792bc42f355a5a9b4e
2017-11-02 11:25:49 -07:00
7244d27220 Add a EmptyDeviceScope (i.e. allow setting CurrentDeviceScope() to None)
Summary:
See comments for where this can be useful (disabling the
OperatorDef::DeviceOption(...) so we can control the scope at the
NetDef::DeviceOption(...) level).

Reviewed By: viswanathgs

Differential Revision: D6103412

fbshipit-source-id: 75a9be54275760132f6d1e71acbe9190e7099289
2017-11-02 11:25:48 -07:00
d96a5ddb1b Load/Save/Reshape in MKL via fallback
Summary: TSIA

Reviewed By: viswanathgs

Differential Revision: D6103278

fbshipit-source-id: dc3d2754bed5bf54f2ab8f0a9a9cc0d5d15502af
2017-11-02 11:25:47 -07:00
1fb68fd371 Fix FC op invariant
Summary: TSIA

Reviewed By: viswanathgs, pietern

Differential Revision: D6103249

fbshipit-source-id: c4563dd6900f23a0ee1d1bf1386238f19b4ba7bd
2017-11-02 11:25:46 -07:00
14f95c2782 Updated brew SpatialBN to use initializers
Summary: Updated brew SpatialBN to use initializers similar to other brew ops such as conv and fc instead of initializing all of its parameters itself within the brew call.

Reviewed By: asaadaldien

Differential Revision: D5840359

fbshipit-source-id: 9f3d688d4957605eaf7ecd2488bc26bfb1da3f78
2017-11-02 11:25:45 -07:00
e4af5e4e04 Update the sample rate function call since the API is changed
Summary:
With the update of the sample rate API, caffe2_benchmark needs to be changed as well.

Tested building the caffe2_benchmark and running the program on an android phone. See the delay metrics reported in adb.
Closes https://github.com/caffe2/caffe2/pull/1419

Reviewed By: Maratyszcza

Differential Revision: D6221101

Pulled By: sf-wind

fbshipit-source-id: 77a06ecce55b54cff8b9fa0aef857bc542a5f371
2017-11-02 10:32:12 -07:00
afdf50cafe Move jit/assert.h to csrc/assertions.h (#3442)
I've kept JIT_ASSERT as an alias to TORCH_ASSERT, which we can use throughout the C++ code.
2017-11-02 13:26:51 -04:00
bed30c1582 long* -> int64_t* 2017-11-02 13:13:10 -04:00
9b2117ed87 Fix MSELoss docs (#3443) 2017-11-02 13:08:36 -04:00
fc7a68d147 fix lint 2017-11-02 07:36:58 -04:00
4108feb27d fix OSX cuda build 2017-11-02 07:15:24 -04:00
2ed64d13db Generate globally unique workspace id for hogwild threads on all trainers
Summary: As title

Reviewed By: azzolini

Differential Revision: D6150329

fbshipit-source-id: 5102fb7605b889a54d6654017f452cad6be78ef3
2017-11-01 23:37:34 -07:00
9ca8b321f5 Skip cpp tests if CUDA not available.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-02 02:21:10 -04:00
5388948b59 CreateLocalBlob for workspace
Summary: Adds the ability to create a local blob in the workspace even if the blob exists in the parent workspace. This is to support cases where a user wants to create a local copy of the blob and hide the blob from the parent workspace.

Reviewed By: akyrola

Differential Revision: D6194386

fbshipit-source-id: 92c064159ac635ee76c211abc013b72bd8752447
2017-11-01 21:32:47 -07:00
ef48ab0bb3 Update observer sample rate
Summary:
We'd like to sparsely sample the net execution, but after the net is sampled for the first time, we'd like to densely sample the following few iterations so that we can have some meaningful data for a short period of time.

Change the observer sample rate to the following:
skipIter: skip the first few iterations.
netInitSampleRate: the sample rate for the first iteration after the skipIter or immediately after reset.
netFollowupSampleRate: the sample rate after the netInitSampleRate is hit.
netFollowupSampleRate: the number of iterations that use the netFollowupSampleRate. After this number is hit, use netInitSampleRate (reset)
operatorNetSampleRatio: whenever the net is sampled, if the random number also hit operatorNetSampleRatio, collect operator metrics instead.

Reviewed By: Maratyszcza

Differential Revision: D6205657

fbshipit-source-id: da0c048f77fc4dc64f3fb71b6072429a57e9d2f0
2017-11-01 18:59:42 -07:00
7c2804ee90 Add support for doing broadcast with single elem dimensions at both ends
Summary: Closes https://github.com/caffe2/caffe2/pull/1413

Reviewed By: jamesr66a

Differential Revision: D6201556

Pulled By: bddppq

fbshipit-source-id: 1d443e895dbb3f5b67a5a0e027977b7807df3de1
2017-11-01 18:33:11 -07:00
2c10b13eeb Pass CUDA_NVCC_EXECUTABLE to NCCL build
Summary:
If this variable is set to a ccache symlink then the NCCL build will
also use the cache. The NCCL build is the slowest component of a cached
build without this change
Closes https://github.com/caffe2/caffe2/pull/1416

Reviewed By: Yangqing

Differential Revision: D6214008

Pulled By: pietern

fbshipit-source-id: e0a90e27de9b1c5a1fdc0e5bad5fb61f9fa924c3
2017-11-01 15:32:22 -07:00
53b01527f4 Improve NYI error message to point to VariableType 2017-11-01 23:18:17 +01:00
6e17e73701 Register VariableType methods in autograd profiler 2017-11-01 23:18:17 +01:00
72a5bb3c09 Remove possible static initialization order fiasco
Summary: CAFFE2_ENFORCE accesses a global variable in a separate compilation unit.

Reviewed By: romain-intel

Differential Revision: D6200236

fbshipit-source-id: a501b05bd23afec2ef4a23dd482a4dc4cfc196f1
2017-11-01 14:28:35 -07:00
b5c053b1c4 fix fp16 issues with resnet trainer
Summary:
My commit bab5bc broke things with fp16 compute, as I had tested it only with the null input, which actually produced fp32 data (even though dtype was given as float16). Also, I had confused the concepts of "float16 compute" and fp16 data. Issue #1408.

This fixes those issues, tested with both Volta and M40 GPUs. Basically restored much of the previous code and fixed the null input to do FloatToHalf.

Reviewed By: pietern

Differential Revision: D6211849

fbshipit-source-id: 5b41cffdd605f61a438a4c34c56972ede9eee28e
2017-11-01 13:30:08 -07:00
66d24c5067 Update the ONNX doc 2017-11-01 15:43:08 -04:00
0e38d3bbb3 remove thpp library (#3405) 2017-11-01 11:57:09 -04:00
df0bf06385 move type enum into THD (#3403) 2017-11-01 10:41:55 -04:00
b544882335 ATen in THD (Part I) (#2288)
* enable size from ATen type

* temp commit aten thd

* port copy, math

* port random

* changes after rebase

* lapack bind

* thd and csrc compile

* fix min/max reductions in DataChannelTCP

* clean up changes

* re-enable tensor constructors

* port MPI to at::Tensor

* fix storage methods to not cast to thpp storage ptrs
2017-11-01 09:59:02 -04:00
b7f5bc506e Make inputs/outputs return an ArrayRef.
Some knock on effects:

- at() is not supported on ArrayRef.  I fixed this by adding a new
  overload for input() to access a specific input.  I also filed
  https://github.com/zdevito/ATen/pull/152

- Need new overloads for fmap/filter, because template deduction won't
  attempt an implicit constructor in attempt to match the argument.

- New overload in ir.cpp for printing ArrayRef.

- When we pybind11 an ArrayRef, we convert it into an iterator.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-01 09:49:53 -04:00
d4abaa4b9e Move ONNX broadcast fusion into separate ONNX pass, fixes verbose printing.
This breaks a lot of the onnx-pytorch tests because the abstraction
barriers are not respected.  I'll spin up a patch for that separately.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-01 09:49:53 -04:00
247d50e2ad Improve const-correctness of JIT.
This started off as a minor fix based on Adam's question, "why is printing
a graph not const" and snowballed into a giant yak shaving exercise.

- The Graph and Node APIs now uniformly enforce deep constness; e.g., if you
  get a const Node* or const Graph*, it is not possible to get a non-const
  Node*/Graph* somewhere else in the graph (even though the member variables
  of these are non-const.  Hooray for private access specifier.)

- A big pile of functions got const versions, most notably the printing
  functions, and functions for accessing inputs().

- REALLY IMPORTANT, BC-BREAKING CHANGE: inputs() now returns a COPY of the
  inputs, rather than a reference to the underlying.  I was forced to do this
  because there is no way to portably turn a std::vector<Node*> into a
  std::vector<const Node*>, which is necessary to provide a const-correct
  version of inputs() that enforces deep const-correctness.  I then justified
  this choice to myself with the observation that outputs() returned a
  copy (by necessity), so this makes the API more uniform.

  But making this change uncovered two very subtle bugs:

    1. If you change functions from returning a reference to returning a copy,
       the idiom node->inputs().begin() is no longer valid, because the memory
       the iterator points to immediately becomes invalid.  THIS SUCKS.
       Honestly, we should add a lint rule rejecting calling begin()/end() on
       temporaries because this is very dangerous.  To excise this pattern from
       the codebase, I added begin() and end() methods to Graph, so that we got
       rid of the graph->nodes().begin() idiom, which happens to be sound,
       despite not returning a reference, because graph_node_list is a
       non-owning reference.

    2. pybind11 doesn't handle std::vector<Node*> cast out of the box.
       Fortunately, I found a simple fix in the GitHub issues tracker
       that involved adding an extra type converter.  And yes, this
       does mean that outputs() in Python never worked correctly.

- New const_graph_node_list, which is a graph_node_list that gives you const
  Node*

There are some more miscellaneous improvements:

- Applied CR comment fixes on export.cpp; using replaceInput, and renaming
  variables for clarity.

- assertValidInput helper method added, and applied to replaceInput

- Use an explicit function to print THPObjectPtr, otherwise we get
  the wrong overload.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-01 09:49:53 -04:00
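The begin()-on-a-temporary hazard called out above is plain C++ and easy to reproduce without the JIT:

  #include <vector>

  std::vector<int> make_inputs() { return {1, 2, 3}; }  // returns by value

  int main() {
      // BAD: the temporary dies at the end of the full expression,
      // leaving the iterator dangling (undefined behavior):
      //   auto it = make_inputs().begin();
      //   int x = *it;

      // GOOD: keep the copy alive for as long as the iterator is used.
      auto inputs = make_inputs();
      auto it = inputs.begin();
      return *it - 1;  // 0
  }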
b043a74919 fix softmax doc (#3337) 2017-11-01 08:47:51 -04:00
638f0b5d78 Prevent numerical issues with poisson_nll_loss when log_input=False (#3336)
* Prevent numerical issues with poisson_nll_loss when log_input=False

Evaluation of the logarithm of the input variable in the Poisson negative log likelihood leads to NaN loss if the variable being evaluated is zero. A small epsilon is added to prevent this. See the equivalent Keras epsilon here: https://github.com/fchollet/keras/blob/master/keras/losses.py#L68

* PEP8 fix

* Add epsilon support to PoissonNLLLoss in nn.modules.loss
2017-11-01 08:47:19 -04:00
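A minimal sketch of the stabilization described above, assuming the simplified Poisson NLL without the Stirling term (the eps value is illustrative):
```python
import torch

def poisson_nll(input, target, log_input=False, eps=1e-8):
    # log_input=True: input is already log(rate), so the loss is exp(input) - target * input.
    if log_input:
        return (torch.exp(input) - target * input).mean()
    # log_input=False: input is the rate itself; eps keeps log() finite at input == 0.
    return (input - torch.log(input + eps) * target).mean()

loss = poisson_nll(torch.zeros(3), torch.ones(3))  # finite instead of NaN/inf
```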
91af122d43 add no-as-needed for THRTC 2017-11-01 04:25:42 -07:00
ae48a394b7 Count hits/misses, add statistics printing. (#3369)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-01 06:35:48 -04:00
6214487fa7 Add reduce keyword to L1Loss (#3366)
* Add reduce keyword to L1Loss

* Fix legacy test for abscriterion

* Address comments
2017-11-01 06:33:18 -04:00
bf4c269bee Implement reduce keyword for SmoothL1Loss (#3382)
* Implement reduce keyword for SmoothL1Loss
2017-11-01 06:29:34 -04:00
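A sketch of what the reduce keyword added in these commits enables, written with today's equivalent spelling reduction='none' so the example stays runnable:
```python
import torch
import torch.nn as nn

# reduce=False (now reduction='none') returns one loss value per element
# instead of averaging them into a single scalar.
per_elem = nn.L1Loss(reduction='none')(torch.randn(4), torch.randn(4))  # shape (4,)
scalar = nn.L1Loss()(torch.randn(4), torch.randn(4))                    # 0-dim mean
```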
3cb34744db adaptive pooling supports only specifying size in certain dimension (#3127)
* adaptive pooling supports only specifying size in certain dimension
2017-11-01 06:11:30 -04:00
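For illustration, passing None leaves that dimension's size unchanged while the other is pooled to the requested size:
```python
import torch
import torch.nn as nn

pool = nn.AdaptiveMaxPool2d((None, 7))   # keep height as-is, pool width to 7
out = pool(torch.randn(1, 8, 5, 32))
assert out.shape == (1, 8, 5, 7)
```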
d77b94495d Pass -DOMPI_SKIP_MPICXX=1 when building C code (#3378) 2017-11-01 06:09:03 -04:00
88d9ebc850 lazy-load nvrtc and libcuda (#3408) 2017-11-01 06:07:03 -04:00
fa5efab669 comments and case where not all sparse (#3370) 2017-11-01 06:05:17 -04:00
7c0b16c140 Add torch.take and Tensor.put_ (#3263)
* Add torch.take and Tensor.put_

These are similar to numpy.take and numpy.put. The take function allows
you to linearly index into a tensor without viewing it as a 1D tensor
first. The output has the same shape as the indices. The put function
copies value into a tensor also using linear indices.
2017-11-01 06:04:44 -04:00
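A short sketch of both functions:
```python
import torch

x = torch.arange(1, 10).view(3, 3)
idx = torch.tensor([0, 4, 8])                # linear (flattened) indices
diag = torch.take(x, idx)                    # tensor([1, 5, 9]); shape follows idx
x.put_(idx, torch.tensor([-1, -2, -3]))      # writes back through the same linear indices
```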
d905a90f0b Clear out eigenvector tensor when eigenvector=F for symeig (#3411) 2017-11-01 05:51:42 -04:00
cf256ee268 Added tensor op check for cudnn rnns (#3409) 2017-11-01 05:51:23 -04:00
e0e4b3a3b5 Fix strides 3D bias descriptor
Reviewed By: ajtulloch

Differential Revision: D6206759

fbshipit-source-id: 7bace218593ded8b854921eaa9811a7ffb49eb69
2017-10-31 22:47:50 -07:00
397793d61c simplify beam search code
Summary: This cleans up the _hack_get_slice_end() using the Conditional operator.

Reviewed By: jmp84

Differential Revision: D6177797

fbshipit-source-id: 5ce0b76b8472123415bba39488aa2c69aad96111
2017-10-31 16:59:20 -07:00
f8cc285e37 Add explicit build dependency on NNPACK
Summary:
Caffe2 fails to build with some old CMake versions because it doesn't figure out that the build implicitly depends on NNPACK build.
This commit adds this dependency explicitly.
Closes https://github.com/caffe2/caffe2/pull/1414

Differential Revision: D6203486

Pulled By: Maratyszcza

fbshipit-source-id: 86f6d9d88976656820f44e3416c57ddf22350362
2017-10-31 16:29:51 -07:00
81b995514e Make THTensor_(var) and THTensor_(std) more numerically stable (#3410) 2017-10-31 18:36:26 -04:00
3c00c0169d Make mm grads column major when the input is column major. (#3406) 2017-10-31 17:55:38 -04:00
db25f8602f Remove order by clause if it is not needed. Increasing timeout from 10mins to
Reviewed By: asaadaldien

Differential Revision: D6167599

fbshipit-source-id: 3e6bdd55d0aa5b497cc1871f237074b3b9ef6f29
2017-10-31 14:51:39 -07:00
6fef6f6dee fix upsample1d (#3407) 2017-10-31 17:49:24 -04:00
d4a0ec62dc Typo fix in torch.median (#3399) 2017-10-31 17:19:40 -04:00
00567a14fc Clarify Slice operator documentation
Summary: Updating the documentation to clarify the behavior of negative end indices.

Reviewed By: jamesr66a

Differential Revision: D6169058

fbshipit-source-id: f14f7cb8b30c26b1cccce104eba8c957a444657f
2017-10-31 12:55:43 -07:00
8cc30e4895 Fix the Fusion Pass (#3362)
* update fuser to match ATen-formatted JIT ops

* fix concat optimizations and add test

* allow onnx export to work with single-export functions

* fix onnx handling of multi-return nodes.

* nits, format, vision test update

* fix add constant

* fix driver init issues

* Add missing Neg symbolic.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-31 13:44:13 -04:00
690256c18c Remove MSELoss test module in favor of wrap_functional 2017-10-31 17:56:40 +01:00
c10898f8ab Revert "ATen expand symbolic"
This reverts commit 47f999b814515a6fac03be9d242d8f24917d3a80.
2017-10-31 08:27:53 -07:00
7b5ac333ad Update README.md (#3392)
Getting started HyperLinks match up now.
2017-10-31 10:48:33 -04:00
3f6fccd1a8 fixes for torch.nn.Hardtanh (examples and CPU implementation) (#3391) 2017-10-31 14:29:42 +01:00
dce525ab6b adds sample_n function (#3249)
* adds sample_n function

* fixes style issues

* uses more efficient api calls

* fix bug where transpose applied to 1 dimension
2017-10-31 09:04:05 -04:00
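Sketch of the added call; sample_n(n) was later superseded by sample((n,)), which is used here so the example runs on current PyTorch:
```python
import torch
from torch.distributions import Bernoulli

d = Bernoulli(torch.tensor(0.3))
# Historically: d.sample_n(5); the modern equivalent is:
draws = d.sample((5,))   # five independent samples, shape (5,)
```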
bd8bf4a86e enable remaining tests and fix a subtle issue in ConvBackwardBackward around sizes being a reference 2017-10-31 09:02:16 -04:00
b46ee946d9 add double backward for ConvTranspose 2017-10-31 09:02:16 -04:00
429d66549e If available try to use requests instead of urllib for model_zoo.load_url (#3280)
* try to use requests instead of urllib for load_url if available

* add noqa to silence flake warnings

* remove urllib import into the except
2017-10-31 08:53:58 -04:00
e4a3747cd8 Add unit tests for casting onto scalars 2017-10-31 08:51:55 -04:00
c2e8b7aafe Allow casting Variables onto Python scalars 2017-10-31 08:51:55 -04:00
54bfa88eec Allow casting one-element Tensors onto Python scalars 2017-10-31 08:51:55 -04:00
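What these three commits enable, sketched:
```python
import torch

t = torch.tensor([3.5])
assert float(t) == 3.5    # one-element tensor -> Python float
assert int(t) == 3        # truncating cast to Python int
# float(torch.ones(2)) raises: only one-element tensors can be cast
```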
f7b15c52ff partial revert of D6155510 to fix a race condition
Summary:
I (actually by mistake) included some premature optimization in D6155510 for the threaded RNN executor. Unfortunately, it introduced a subtle race condition when some ops were run out of order, because I had made the countdown only count down in the last timestep. Hard to explain.

Out of caution, revert D6155510's changes to recurrent_network_executor.cc, excluding one assertion and the setting of the debug flag.

Differential Revision: D6195544

fbshipit-source-id: 24a275e185e5a80835401a8cdcb162dbc2411789
2017-10-31 00:12:18 -07:00
cec27b8134 AddDistributedBlobsSync
Summary: Added a simple function to synchronize a blob across machines (but not across devices), i.e., blobs that are not synced over devices.

Reviewed By: yqwangustc

Differential Revision: D6192922

fbshipit-source-id: a4d653c9fb09f06b0c42330bdae07b42f5e6346c
2017-10-30 22:33:29 -07:00
3bfabb4d5f support float16 input for operator SparseAdagrad
Summary:
Implemented a new CUDA class for operator SparseAdagrad. The param and moment inputs can now be float or float16.
The functions for mixed-precision add/mult/store are defined in a separate header file ("caffe2/core/float16_util.h") for reuse.

Reviewed By: azzolini

Differential Revision: D5880200

fbshipit-source-id: dca227f38629a03a9d771f42efe2c0b673075c4d
2017-10-30 19:32:30 -07:00
47f999b814 ATen expand symbolic 2017-10-30 20:24:46 -04:00
669ec0ccba Added FP16 compute support to FC Op
Summary: Allow the GEMMs in the FC/FCGradient Op to do FP16 compute instead of FP32 if the appropriate op flag is set.

Reviewed By: asaadaldien

Differential Revision: D5839777

fbshipit-source-id: 8051daedadf72bf56c298c1cf830b019b7019f43
2017-10-30 17:03:51 -07:00
3e6e81da46 Dispatch trivial variable operators to C++ aten functions. (#3372)
Implement __comparison_ops__ by calling the VariableBase methods.
2017-10-30 19:46:05 -04:00
8cd0df020c make sparse (new) functions conform that storage is not NULL (#3381) 2017-10-30 18:55:26 -04:00
7d096ff7e6 use CAFFE2_ENFORCE_EQ for more detailed error message
Summary: CAFFE2_ENFORCE(a == b) and CAFFE2_ENFORCE_EQ() are functionally equivalent, though the latter provides a more detailed failure message.

Reviewed By: salexspb

Differential Revision: D5991775

fbshipit-source-id: 52e4d6d559c933de5b33d791b20223effe9d4f66
2017-10-30 15:44:52 -07:00
eac0942f6d Add more nn docs (#3374) 2017-10-30 18:37:36 -04:00
a5dbc254f8 if git is not installed at all, no subprocess exception will be raised (#3379) 2017-10-30 18:37:12 -04:00
d38fccc586 Debian/Ubuntu comes with GCC 4.9.2 and it does require -D_FORCE_INLINES (#3380) 2017-10-30 18:36:35 -04:00
2be8bd1880 Add docs for ByteTensor any()/all() 2017-10-30 16:00:48 -04:00
1ae10a4831 add test to check zero_strided tensors in blas level 2 and 3 functions 2017-10-30 16:00:21 -04:00
d04574b1fc ensure BLAS/MKL is not used if stride values are not supported 2017-10-30 16:00:21 -04:00
86e3e008e0 optimize RNN executor subnet construction for forward-only models
Summary:
The RNN executor had a disadvantage compared to plain nets when running in forward-only mode: for plain nets, we only create two workspaces and two nets and alternate between them. With the RNN executor, we had only four workspaces (4 > 2 because it was faster in some cases), but the nets (or rather the ops) were created for each of the timesteps. This has significant overhead. This diff changes this so that if the executor is in forward-only mode (i.e. has a limited parallelism setting), it will use the same operators as the t - 4'th net -- excluding the ops that require the timestep blob. The latter exception is required because the RNN executor needs a different timestep blob for each timestep, since it cannot modify the value of the timestep blob like when running nets in a loop.

Also removed redundancy in the dependency computation and added a debug flag to the executor that outputs the description of the rnn contents.

Reviewed By: salexspb

Differential Revision: D6155510

fbshipit-source-id: c47f727d2128649b081270d15020a08d41e5748d
2017-10-30 12:24:12 -07:00
acb73c729b Space is missing in __repr__ of conv (#3229)
* - Remove spaces in `__repr__` of layers
- Replace `size` by `kernel_size` in `__repr__` of a pooling layer

* Fix flake8 errors
2017-10-30 13:45:37 -04:00
28f3d50f9d doc: Replace nclasses with C 2017-10-30 12:06:20 -04:00
71d731fb57 Fix documentation inconsistencies for some loss classes
- The actual parameter is weight not weights
- Unify all mentions about batch_size -> N
- Unify all mentions about n_classes -> C
2017-10-30 12:06:20 -04:00
b7a9f51de3 In BatchMatMul, add support for accepting inputs >=2d
Summary: Closes https://github.com/caffe2/caffe2/pull/1399

Differential Revision: D6183083

Pulled By: bddppq

fbshipit-source-id: 5c8f17c2de212fbc39a66c90aa2599b714f5ceb4
2017-10-29 23:38:33 -07:00
8fbe003d4e Miscellaneous ONNX fixes and behavior changes.
- Deleted Addmm/Concat Function class, as this is now native ATen operator

- Resurrected ONNX operator for Concat (now called 'cat')

- Add a "fake" Expand ONNX operator, which we now do the optimization on;
  this helps prevent us from emitting a warning that 'expand' is not supported.
  We still fail if any of these Expand operators make it to the final model,
  until we actually formalize Expand in ONNX.  This also simplifies the
  fuseBroadcast code, because single-return ONNX nodes don't get select nodes.

- New error reporting strategy.  If we fail to export an operator because of
  something, we emit a warning, but otherwise keep going.  At the very end,
  in export.cpp, we now check if there are any ATen operators left over.  If
  there are, we bug out.  This assumes that ATen is lower case and ONNX is upper
  case.  You're now supposed to 'return _unimplemented(msg)' in these cases.

- New toString() method on Graph, for getting the string graph (useful for
  slapping it into error messages.)

- Some of the legacy symbolics (still in Python symbolic method of Function
  subclass) have been cleaned up for clarity.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-29 23:50:34 -04:00
40f7f6e095 Improve handling of 'expand' (broadcasting) in JIT and ONNX
The pieces:

- I improved the lint / asserts to catch some bugs which I
  committed while working on my export.  There are two new
  properties which the linter checks now:

    (1) "Anticipated uses".  If a node says that is used by
    M, M better appear later in the topsort.  Previously,
    we only checked if it was in all_nodes.

    (2) If you are a select node, you better be a multi-type node;
    if you're not a select node, you better not be!  And you
    should never have an input that is multi-type.

- There is a new peephole optimization pass, for simple, local
  transformations to graphs.  Right now, it implements a simple
  optimization: remove 'expand' invocations that are no-ops
  (the size before matches the size after), but we can add other
  things to it later.  I needed this for ONNX because no-op expands
  show up in the left-hand argument, which we don't support.

- There is now a broadcast fuser, which fuses ATen expand ops
  into broadcastable ONNX ops (Add, Div, Mul, Pow, Sub, Gemm.)
  It only fuses when the original size is a suffix of the new
  size, as per the ONNX spec.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-29 23:50:34 -04:00
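A hypothetical helper sketching the no-op check and the suffix rule described above (the function names are illustrative, not the actual pass code):
```python
def is_noop_expand(orig_size, new_size):
    # The peephole pass drops expand() calls whose output size equals the input size.
    return list(orig_size) == list(new_size)

def can_fuse_expand(orig_size, new_size):
    # Fusion into a broadcasting ONNX op is legal only when the original
    # size is a suffix of the expanded size, per the ONNX spec of the time.
    k = len(orig_size)
    return k == 0 or list(new_size)[-k:] == list(orig_size)

assert is_noop_expand((2, 3), (2, 3))
assert can_fuse_expand((3,), (2, 3))
assert not can_fuse_expand((3, 1), (3, 4))
```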
2e42272cc1 Make DataParallel a no-op when CUDA not available (#3318) 2017-10-29 13:47:36 +01:00
bbafd4fa90 Fix native compilation on ARM/Linux (update NNPACK)
Summary:
Brings in Maratyszcza/NNPACK@5974985f99 which fixed native compilation issues on 32-bit ARM Linux systems
Closes https://github.com/caffe2/caffe2/pull/1398

Differential Revision: D6182438

Pulled By: Maratyszcza

fbshipit-source-id: f5b5b96acebf075dddbe89f4e4979e00a50b340f
2017-10-28 16:40:35 -07:00
4f33b136d8 add tests for the previously failing coalesce case 2017-10-28 18:52:35 -04:00
0b89a68111 fix sparse tensor coalesce 2017-10-28 18:52:35 -04:00
bb7b630953 fix pynew gpu_guards 2017-10-28 18:52:35 -04:00
91a8d3325e test sparse dp, broadcast_coalesced, reduce_add_coalesced 2017-10-28 18:52:35 -04:00
01be4d6b20 sparse broadcast_coalesce and reduce_add_coalesced 2017-10-28 18:52:35 -04:00
3a0aee71f3 fix sparse tensor .cpu() 2017-10-28 18:52:35 -04:00
618026e999 implements operator + for Dataset class (#3180)
* implements operator + for Dataset class

* check for exact equivalent
2017-10-29 01:19:59 +05:30
e0d7de5b61 Fix bug introduced in a recent commit
Summary:
This is introduced in 8539a1e78b - vector<float> should not be used in Tensor shape inference.
Closes https://github.com/caffe2/caffe2/pull/1393

Reviewed By: akyrola

Differential Revision: D6181075

Pulled By: Yangqing

fbshipit-source-id: 002144a137148b5b16118d0c123132890e8d325a
2017-10-28 10:11:20 -07:00
a0ce84e476 fix triplet margin loss documentation (#3339) 2017-10-28 17:15:58 +02:00
6d2e39559a Replace Variable constructor with a static_cast in tracer 2017-10-28 17:10:38 +02:00
a381fa10a5 Add a hack for RNN export to ONNX 2017-10-28 17:10:38 +02:00
9107110d3a Add sparseTensor.new wrapper bindings (#3329) 2017-10-28 16:34:08 +02:00
820ac0df2b fix mathjax notation on softmax/softmin (#3338) 2017-10-28 18:10:35 +05:30
42ffb1ae07 support non-normalized weights
Reviewed By: akyrola

Differential Revision: D6158290

fbshipit-source-id: 4d54e5c0d0f91f23deab18da047df4d209d4c312
2017-10-27 23:18:25 -07:00
7b00adf5d3 Add CUDNN_LIB_DIR in rpath (#3255)
* Add CUDNN_LIB_DIR in link -rpath

* insert CUDNN_LIB_PATH in front of rpath
2017-10-28 00:13:53 -04:00
2d0667233a Add .dockerignore. (#3333)
.gitignore should have uninteresting files listed, so acts as a good
.dockerignore. Reduces the build context sent to the docker daemon from
2.927GB (after building locally) to 66.66MB (:O).
2017-10-28 00:11:11 -04:00
204044a522 Symbolic representation for unfold using ATen (#3334) 2017-10-28 00:08:45 -04:00
ac8f56656d Adapt ONNX Slice op changes (#3316) 2017-10-28 00:03:29 -04:00
dc6c9e8df8 Fix compilation without numpy.
Fix this and related errors:

    Tensor.cpp:309:47: error: ‘PyArray_Check’ was not declared in this scope
2017-10-28 00:50:57 +02:00
e4752518a6 Tiny optimization to AsyncDagNet: wait on fewer events
Summary:
Just noticed while reading the code.

We can wait only on the tails of the DAG, not on every execution chain node.

Reviewed By: akyrola

Differential Revision: D5861078

fbshipit-source-id: f4f6296fed1ccc96b1ab99b4272b82c8bf764ca9
2017-10-27 14:49:45 -07:00
de1f4e69dd raw text (#3327) 2017-10-28 01:24:02 +05:30
f3078dec64 Add cuDNN handles to CUDAContext
Summary:
Add CUDAContext::cudnn_handle() for easier integration of single
cudnn routines into operators without requiring the weight
of CuDNNWrapper or similar, or needing to spin out a separate CuDNN*Op
version of an operator.

It was necessary to split out the cuDNN wrapper code from the base cuDNN helpers in order to resolve a circular dependency between context_gpu.h and common_cudnn.h when handles and cuDNN `#define` were added.
Closes https://github.com/caffe2/caffe2/pull/1376

Reviewed By: pietern

Differential Revision: D6162034

Pulled By: akyrola

fbshipit-source-id: 95687e55b3e1e921e1f5e0f016f43b586f5f3350
2017-10-27 12:03:11 -07:00
86dc6e0837 Added inverted FP16 Initializer
Summary: Added an initializer which sets up the ParameterInfo object in the opposite format from the pFP16Initializer. This is needed when the op requires the initialized blob to be FP32 but an FP16 copy of the weights is needed.

Reviewed By: wesolwsk

Differential Revision: D5840832

fbshipit-source-id: 439e87f41a1dbc58bf63a5c0e7f7fc4cb00b4d65
2017-10-27 10:20:04 -07:00
d8f3c601e4 Add reduce keyword to CrossEntropyLoss 2017-10-27 19:19:52 +02:00
9735ddd899 check_env_flag now ignores case (#3317) 2017-10-27 15:15:02 +05:30
ee3baa2ed4 Add shape checks and print more info in parameter sharing
Summary: As titled.

Reviewed By: kittipatv

Differential Revision: D6145747

fbshipit-source-id: 39a212bb6bebbbf3164cade2f95db22ddb2d2c87
2017-10-27 01:22:06 -07:00
7b7dcaf269 Initialize presence tensor if data is empty.
Summary: See https://fb.facebook.com/groups/811605488888068/permalink/1645450575503551.

Differential Revision: D6116836

fbshipit-source-id: 3072643eaf6f134bda7d224af3d5f8339da1f39d
2017-10-27 01:05:42 -07:00
0b0d5b2b1d Add tensor output that gives the sampled values
Summary: Given an additional tensor containing the values corresponding to the weighted samples, add tensor output that contains the values selected by the sampled indexes.

Reviewed By: akyrola

Differential Revision: D6050094

fbshipit-source-id: 1eccc641b99e30d36ae83d49f630b018a53e4147
2017-10-26 16:04:57 -07:00
879e39ea5c Distill loss with SigmoidCrossEntropyWithLogits
Summary: Sigmoid + CrossEntropy has a numerical stability issue. The gradient of sigmoid is `dx = dy * y * (1-y)`. When `label=0` and `x` is large, `1-y` can round to (near) 0 and we lose `dx`. Switching to `SigmoidCrossEntropyWithLogits` solves the issue because its gradient does not depend on `y`.

Reviewed By: chocjy

Differential Revision: D6086950

fbshipit-source-id: f990ae726802aa5c56fa62cf5e23f2e61ee047fa
2017-10-26 15:18:34 -07:00
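A small numpy illustration of the vanishing gradient the summary describes:
```python
import numpy as np

x = 40.0                          # large logit
y = 1.0 / (1.0 + np.exp(-x))      # sigmoid saturates: y == 1.0 exactly in float64
print(y * (1.0 - y))              # 0.0 -> the chained gradient dx = dy*y*(1-y) is lost
# SigmoidCrossEntropyWithLogits computes dx = y - label directly,
# so for label = 0 the gradient stays ~1.0 instead of underflowing.
print(y - 0.0)
```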
d56713680d Fix const modifiers on VariableImpl 2017-10-26 14:31:29 -07:00
a762fe0b0d Merge commit 'cbe7b8b636ea840fa9e02608011572936fb5a2b3' 2017-10-26 14:31:22 -07:00
86d0c24b6a Dynamically find min log scale #3289
* dynamically find min scale

* compute only once per _number_format call
2017-10-27 02:42:16 +05:30
fa0f3cf98a Re-enable and fix most JIT tests 2017-10-27 02:40:09 +05:30
61afb0d519 Autogenerate ATen dispatch for JIT nodes 2017-10-27 02:40:09 +05:30
869bdeb936 Symbolic implementation of Index supporting tuple of slices. (#3294) 2017-10-27 02:39:38 +05:30
e0fa72455d Fixes the checkpoint test.
Summary:
We need to use Cluster to isolate the definition of the nodes.
Otherwise, the contexts are polluted and the run becomes
stateful.

Reviewed By: Yangqing

Differential Revision: D6140404

fbshipit-source-id: 09d1c86ef12bb01eaa16b1dade4d2e1e93be287a
2017-10-26 13:18:21 -07:00
545c0937fb Making a module option for Caffe2
Summary:
This will help with releasing models that use Caffe2 but have their own operator implementations and extensions. More detailed docs to arrive later. Let's see what contbuild says.
Closes https://github.com/caffe2/caffe2/pull/1378

Differential Revision: D6155045

Pulled By: Yangqing

fbshipit-source-id: 657a4c8de2f8e095bad5ed5db5b3e476b2a877e1
2017-10-26 12:33:58 -07:00
c3a4bc5d73 fix asan-error by removing SHOULD_NOT_DO_GRADIENT from .cu file
Summary:
For some reason, having SHOULD_NOT_DO_GRADIENT in a .cu file (this is for a CUDA-only operator) will cause a double-free error detected by ASAN. This is why the innocent-looking D5837837 caused automatic ASAN tests to fail (at least on Xray).

Removing these entries makes the error go away, and is ok because we don't really need these tags. But it would be nice to understand what causes the double-free. I don't have time to investigate myself now.

Reviewed By: Maratyszcza, salexspb

Differential Revision: D6161559

fbshipit-source-id: a52cb2a9cc62f2ec54ed866846f2bd1ccb0ae90f
2017-10-26 11:55:03 -07:00
3853d5da97 Add reduce keyword to NLLLoss and NLLLoss2d (#3080)
* API changes

* Implement reduce for THNN ClassNLLCriterion

* Implement reduce keyword for THCUNN ClassNLLCriterion

* Implement reduce for THNN SpatialClassNLLCriterion

* Implement reduce for THCUNN SpatialClassNLLCriterion

* Make legacy NLLLoss work

* Docs for NLLLoss reduce

* reduce keyword for double backwards NLLLoss

* reduce=False tests

* Addressed comments

* Fix trailing whitespace

* Fix test failures in legacy nn

* Rebase: add reduce keyword to aten declarations of NLLLoss

* Add reference functions for all NLLLoss and NLLLoss2d test cases

* Replaced slow get/set fns. Don't use int64_t in kernels.

* Use TH_INDEX_BASE in NLLLoss for consistency

* Fix legacy ClassNLLCriterion tests
2017-10-26 13:54:19 -04:00
0664b30612 Update NNPACK submodule
Summary:
CMake scripts in NNPACK use enum34 polyfill for PeachPy to support pre-3.4 Python interpreters, which do not have built-in enum module. This polyfill was found to be conflicting with built-in enum module on Python 3.6, and I updated NNPACK CMake scripts to only use polyfill for Python < 3.4. This commit propagates this change to Caffe2, so Caffe2+NNPACK can be built on systems with Python 3.6.
Closes https://github.com/caffe2/caffe2/pull/1389

Reviewed By: bddppq

Differential Revision: D6161663

Pulled By: Maratyszcza

fbshipit-source-id: c8aa07def6abe252a0a2ab927f6c49ccd846ab93
2017-10-26 10:48:07 -07:00
bdeee47d33 Add zero, zeros_like, _dimI and _dimV for sparse tensors (#3271) 2017-10-26 18:28:04 +02:00
5760b036fb Fix pack_padded_sequence to accept inputs of arbitrary sizes 2017-10-26 17:40:03 +02:00
cbe7b8b636 Adds permute and as_strided to ATen (#137)
Permute transposes multiple dimensions at once. The as_strided function
changes the sizes and strides of a tensor without changing the Storage.
It's a subset of Tensor::set_.
2017-10-26 11:35:29 -04:00
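For illustration, both operations via the Python surface:
```python
import torch

x = torch.randn(2, 3, 4)
y = x.permute(2, 0, 1)             # moves several dims at once -> shape (4, 2, 3)
z = x.as_strided((3, 2), (1, 4))   # same storage, reinterpreted sizes/strides
assert y.shape == (4, 2, 3) and z.shape == (3, 2)
```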
21ff182809 improve padding code 2017-10-26 12:03:17 +02:00
a99506f2fc fixed error: namespace "std" has no member "min" 2017-10-26 09:24:13 +02:00
6e33ae79df Add gradient op for WeightedSum op
Reviewed By: dzhulgakov

Differential Revision: D6149163

fbshipit-source-id: 0e8cf400323233d001243bc5cb25a0025115a564
2017-10-26 00:16:51 -07:00
63297e1a1f RunNetOnce->RunNet (removes rnn_executor overhead)
Summary:
seq2seq/translate.py was running much slower on the RNNExecutor. This was because the RNNExecutor has significant init overhead (I have another diff to reduce, but not completely eliminate, it), and translate was calling the decoder with RunNetOnce -- thus always recreating the net and the ops. Changing this to RunNet() makes translate run faster than without the executor. RunNet uses the net name and reuses the already-created net, while RunNetOnce passes the whole protobuf.

Noticed a similar bug in the seq2seq ensemble beam model, which also calls CreateNet() but uses RunNetOnce() instead of RunNet().

Reviewed By: jhcross

Differential Revision: D6156566

fbshipit-source-id: a933453e36a0d8fd163d0584186fda427a680687
2017-10-25 22:06:02 -07:00
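A minimal sketch of the difference, assuming the Caffe2 Python workspace API (the net name and op are illustrative):
```python
from caffe2.python import core, workspace

net = core.Net("decoder_step")
net.ConstantFill([], "x", shape=[1], value=1.0)

workspace.RunNetOnce(net)           # ships the whole proto, recreates ops every call

workspace.CreateNet(net)            # build the net (and its ops) once...
workspace.RunNet(net.Proto().name)  # ...then run it by name, reusing the built net
```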
b3b7203b40 Symbolic representation for mm (#3290)
* Symbolic representation for mm

* Fix whitespace issues
2017-10-26 00:29:57 -04:00
8afbdd8dcf Make toScalarType and toBackend virtual
This allows VariableType override them to return instances of
VariableType. Combined with the change to Formatting.cpp, this lets us
print Variables to std::cout.
2017-10-25 20:46:34 -07:00
2bcca48a62 Make size, strides, dim functions const. 2017-10-25 20:45:07 -07:00
c03799e8eb Change is_same_size to a native function.
For one thing, we will want a different implementation from TH because
we need to differentiate between scalars and 1-dim tensors.

Also, we don't really want to expose the THS/THCS function; in addition to
checking the shapes are the same, it checks that the dimensions which
are sparse are the same (because various THS/THCS operators only work if this
is true); it should really be called "is_congruent" or similar.
2017-10-25 20:44:29 -07:00
715ca3a2c8 Add unsqueeze of scalar to wrapdim_test. 2017-10-25 20:44:06 -07:00
fcdd394f66 bind newWithTensor in ATen (#129) 2017-10-25 20:43:44 -07:00
5bb8ed67e3 Compute GLU for an arbitrary axis
Summary: As in title

Differential Revision: D6151804

fbshipit-source-id: bd0fa08be1676ebd1abd9720711c221c61c11ad1
2017-10-25 19:49:55 -07:00
817eaf6b1f Build NNPACK using its own CMake scripts
Summary:
NNPACK now supports building with CMake, and its build scripts have advantages over the ones in Caffe2:
- They automatically download all dependencies, no need to keep them in submodules anymore
- They automatically download and setup PeachPy for x86-64 build
- The same scripts are used for server/desktop (Linux, macOS) and mobile (Android/iOS)
- They unblock Caffe2 build with Ninja
Closes https://github.com/caffe2/caffe2/pull/1382

Reviewed By: Yangqing

Differential Revision: D6150723

Pulled By: Maratyszcza

fbshipit-source-id: 7c3e4e3406f60d4cc059e1c8112cb10aa3d75ece
2017-10-25 18:48:06 -07:00
4819197a40 fix merge problems 2017-10-25 17:54:32 -07:00
3b26b48d90 missing code from pytorch 2017-10-25 17:32:17 -07:00
699e47d380 missing entry 2017-10-25 17:27:32 -07:00
39359afc84 Add rank loss for retrieval models with random negative sample
Summary:
In order to reproduce the StarSpace model using the architecture of the Two Tower model, we need to implement the ranking loss that is used in StarSpace as well as the Filament model. In both the StarSpace and Filament models, all negative samples come from random negative sampling, thus the number of negative samples per positive record is fixed (say 64). To calculate the total loss, for each positive record, the hinge distance between the positive score and the negative scores (the 64 scores in the example) is calculated. This diff implements this loss in the Dper framework.

The main idea is to add an option so that negative_sampling.py can output random negative samples as an independent field rather than merging them with the original input_record. In this way, we can calculate the positive and negative scores separately, which will eventually be used when calculating the ranking loss.

(Note: this ignores all push blocking failures!)

Reviewed By: kittipatv

Differential Revision: D5854486

fbshipit-source-id: f8a5b77be744a6cc8a2b86433282b3b5c7e1ab4a
2017-10-25 16:19:41 -07:00
a7c5be1d45 Document CUDA best practices (#3227) 2017-10-25 22:38:17 +02:00
837f933cac remove 'path' from key_averages header
path appears to be unused
2017-10-25 21:34:59 +02:00
a65db4e956 Use ATen for torch.cat, torch.addmm, and friends on Variables. (#3286)
This includes some changes to the dispatch code for torch.xxx functions:

 - Since Variable.addmm is an instance-method, the self argument has to
   come first. The dispatch code swaps the first two arguments if
   necessary to suppor the deprecated signatures where 'alpha' or 'beta'
   comes before the 'self' tensor.
 - Delete IMPLEMENT_STATELESS_REVERSED. These functions require output
   arguments to be passed in using the keyword 'out'. They were meant to
   handle torch.gt(out, a, b), but we haven't allowed that for a while.
2017-10-25 14:27:45 -04:00
d67624173b Change RowWiseSparseAdagrad assertion message
Summary: Made the assertion message clearer to let people know that rowwise is not supported for dense adagrad.

Differential Revision: D6135363

fbshipit-source-id: d706135a335305627310c69a2a6d7721b0a47f0e
2017-10-25 10:54:33 -07:00
43f0c74461 Merge commit '48911e116d43ab2b887fb714e30de09676a765a3' 2017-10-25 09:34:29 -07:00
b46ced4aab clarification in docstring of Module.register_forward_hook() (#3279)
* made it explicit in the docstring of Module.register_forward_hook() that the hook(s) will be called AFTER calling forward().

* added "every time" in docstring of Module.register_forward_pre_hook()
2017-10-25 15:36:00 +02:00
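For illustration, a hook registered this way fires after each forward():
```python
import torch
import torch.nn as nn

def hook(module, inputs, output):
    print("forward() already ran; output shape:", output.shape)

m = nn.Linear(3, 2)
m.register_forward_hook(hook)  # called AFTER every m(...) invocation
m(torch.randn(1, 3))
```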
b3642b3e65 Softmax/LogSoftMax refactor (wrapped up) (#3245)
* Unify CUDA kernels for SoftMax and LogSoftMax

* Improve SoftMax and LogSoftMax kernels performance

Added a new instantiation of the spatial kernel for
low inner_size and larger dim_size.
2017-10-25 14:47:56 +02:00
e43a63a968 tensor: Ensure that the tensor is contiguous before pinning (#3266) (#3273)
* tensor: Ensure that the tensor is contiguous before pinning (#3266)

pin_memory() was producing an out-of-order tensor when the given
tensor was transposed, i.e. in column-major order.
This commit fixes this by calling contiguous() before pinning.

* test: add contiguous test for pin_memory (#3266)
2017-10-25 13:17:54 +02:00
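A sketch of the pattern the fix applies internally (note that pinning requires a CUDA-capable build):
```python
import torch

t = torch.randn(3, 4).t()            # transposed: column-major, non-contiguous
p = t.contiguous().pin_memory()      # copy to row-major first, then pin
assert p.is_pinned() and p.is_contiguous()
```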
b5170c8bf1 improves pack padded sequence operation runtime #1788 (#3278)
* improves pack padded sequence operation runtime #1788

* error message
2017-10-25 13:16:32 +02:00
9989bb1a43 Export index constants as long, not int (onnx-caffe2 needs it.) (#3274)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-25 09:50:33 +02:00
241b9f6c14 disable rnn executor for beam search
Summary:
The RNN executor has significant overhead from creating the timestep nets the first time, and this is especially bad with beam search, which is complex.
So disable the RNN executor for now until the perf regression is fixed (I have a pending diff for it).

Reviewed By: salexspb

Differential Revision: D6138878

fbshipit-source-id: ce63ab9ce9cc1c0f67097aea1e370494ca98c680
2017-10-24 20:49:56 -07:00
48911e116d The at::cat should default to dim=0 2017-10-24 18:30:31 -07:00
e760e63244 Handle remainder=0 case correctly
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-24 19:33:37 -04:00
df71f2aef5 ONNX export for split.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-24 19:33:37 -04:00
4b1e85d266 Remove split/chunk python autograd. 2017-10-24 19:33:37 -04:00
fe0ac0f7d0 Support native functions in C++ autograd automatically. 2017-10-24 19:33:37 -04:00
59e0472af1 Merge commit '9f6a41d63d37d5108f62d8499de4e691adac09e6' 2017-10-24 15:30:01 -07:00
5fc122bf39 Fix to #2236 - tensor.numpy() checks that no positional arguments are passed. (#3224)
* tensor.numpy() checks that no arguments are passed

* tensor.numpy() checks that no arguments are passed

* Improve .numpy() argument checking performance
2017-10-24 23:54:28 +02:00
3de3ac31cc Merge commit 'd131219742d6efd31b4986342e22c0184a6d4340' 2017-10-24 16:06:56 -04:00
2e4d8aa530 Added FP16/FP32 MomentumSGD + WeightDecay Update Ops
Summary:
Added two new ops, FP16MomentumSGDUpdate and FP32MomentumSGDUpdate, which perform both the momentum sgd and weight decay updates to a given parameter in a single op -- thus being more efficient.

Also updated the standard momentum sgd test to test if nesterov momentum works.

Reviewed By: asaadaldien

Differential Revision: D5837837

fbshipit-source-id: 5ad487b9c59434491d3a4fcfdeed820db6083f57
2017-10-24 12:28:16 -07:00
9f6a41d63d Support default parameters for native functions. 2017-10-24 12:16:05 -07:00
f9d002d9f7 perf improvements for depthwise convolutions (#3265)
The biggest performance improvements are due to templating kernels.  See PR for some numbers.
2017-10-24 14:57:47 -04:00
a0aa6d0e24 expose flop annotation to python
Summary: expose the flop annotation framework to python functions

Reviewed By: Maratyszcza, Yangqing

Differential Revision: D6135705

fbshipit-source-id: 2eed80b6cbda7b3ee3fe0e019a0f1fc4b0aa320b
2017-10-24 11:35:24 -07:00
0b8b9cf928 update 2017-10-24 20:30:49 +02:00
d131219742 smarter backend option 2017-10-24 11:16:42 -07:00
5691b0b8d2 Fix the Slice changes in ONNX (#3216) 2017-10-24 14:12:54 -04:00
388a1b1e66 Added FP16SgdOptimizer
Summary:
Added FP16SgdOptimizer to optimizers. The optimizer updates the params using the FP16MomentumSGDUpdate and FP32MomentumSGDUpdate ops. To determine which update op to call the optimizer expects either the fp32_update flag to be set, or that the blobs are in a recognized format created by initializers.py.

These requirements can be loosened if the blob DataType can be queried in python, though I am unsure of how to do this.

It also forces FP32 updates to SpatialBN as CuDNN does not support FP16 params for SpatialBN.

Reviewed By: asaadaldien

Differential Revision: D5840806

fbshipit-source-id: 84ab8dc11a6e91a198ed72c00287f4809607079d
2017-10-24 10:44:04 -07:00
ed08533a1e Add CUDA version of ScatterAssign
Reviewed By: houseroad

Differential Revision: D6128352

fbshipit-source-id: ea59f4bc723ef929b0f6ed15797df776d8054422
2017-10-24 10:20:03 -07:00
cc5a948e62 Fix clang-802.0.42 tuple overload bug, fixes #3234. (#3252)
* Fix clang-802.0.42 tuple overload bug, fixes #3234.

Originally, my plan for emit_record_trace was to keep it as
simple as possible, if at the expense of some somewhat ugly
overloads.  So this meant we had a 'recordTrace' function
with overloads like this:

  recordTrace(..., const Variable& out)
  recordTrace(..., const std::tuple<Variable, Variable>& out)

Unfortunately, this triggers a bug in clang-802.0.42
(widely used in macOS Sierra 10.12.6) wherein a Variable is
implicitly convertible into a std::tuple<Variable, Variable>;
a minimal repro can be seen below here:

  #include <tuple>
  struct T {};
  void f(const std::tuple<T, T>&) {}
  void g(T& x) { f(x); }

To work around this bug, the code generator is a bit more
complicated, and is taught how to handle this situation.

Previously the generated code looked like:

  jit::tracer::recordTrace( "min", { self }, ret );

Now it looks like:

  jit::tracer::recordTrace( "min", { self }, { std::get<0>(ret), std::get<1>(ret) } );

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* CR comments

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-24 13:13:38 -04:00
1b71bf1d36 Updated resnet50_trainer and resnet for more FP16 support
Summary: Added FP16SgdOptimizer to resnet50_trainer

Reviewed By: wesolwsk

Differential Revision: D5841408

fbshipit-source-id: 3c8c0709fcd115377c13ee58d5bb35f1f83a7105
2017-10-24 09:19:06 -07:00
1a0e4e1b00 sparse cuda and get device (#122) 2017-10-24 17:31:01 +02:00
0748ea56eb Change size by kernel_size in __repr__
Probably, __repr__ should return `MaxPool2d (size=(3, 3), stride=(2, 2), dilation=(1, 1)))` -> `MaxPool2d (kernel_size=(3, 3), stride=(2, 2), dilation=(1, 1)))`
2017-10-24 12:46:19 +02:00
25ed9aba03 Remove C exports and rename AT_API 2017-10-23 20:30:55 -07:00
c6671f4379 Fix missing <functional> and export decorations in lib/ATen 2017-10-23 20:30:55 -07:00
8e58135a26 Fix E722 ('do not use bare except') (#3239)
The new version of flake8 includes a check for not using bare except. We
should avoid this since it catches things like KeyboardInterrupt.
2017-10-23 23:03:37 -04:00
9cca84a96f Remove dead code 2017-10-23 19:31:53 -07:00
512a8015b8 Gated Linear Unit implementation
Summary: As titled

Differential Revision: D6117600

fbshipit-source-id: 84b0154dc4cf77cc9c9146e9a534c7485989346b
2017-10-23 18:14:57 -07:00
d02ca80613 Revert the enum changes as discussed 2017-10-23 17:55:27 -07:00
7660c4cfe5 Fix linguist detection with gitattribute overrides
Summary:
Before:
```
34.34%  Jupyter Notebook
34.06%  C++
20.95%  Python
4.84%   Cuda
2.36%   C
1.78%   Objective-C++
1.05%   CMake
0.27%   Metal
0.19%   Shell
0.05%   Objective-C
0.04%   Batchfile
0.04%   HTML
0.02%   CSS
0.01%   Makefile
```
After:
```
51.87%  C++
31.91%  Python
7.37%   Cuda
3.59%   C
2.72%   Objective-C++
1.59%   CMake
0.42%   Metal
0.30%   Shell
0.08%   Objective-C
0.06%   Batchfile
0.06%   HTML
0.02%   CSS
0.01%   Makefile
```
Closes https://github.com/caffe2/caffe2/pull/1375

Differential Revision: D6130054

Pulled By: Yangqing

fbshipit-source-id: e98383381cbde636473e017f204eb6afedb5a34a
2017-10-23 17:03:07 -07:00
c6ef04db04 Add "dtype" parameter for GivenTensorOp
Summary: Adding "dtype" parameter for the GivenTensorOp. Also, providing backwards compatibility for the existing code, byt supporting the templating if "dtype" is not provided.

Reviewed By: bddppq

Differential Revision: D6090049

fbshipit-source-id: f5deaa57b49f2280289975f4583aba5bc064a2bc
2017-10-23 16:06:37 -07:00
f0ca857e6b Explicitly use Eigen MPL2 in builds.
Summary:
Closes #1371
Closes https://github.com/caffe2/caffe2/pull/1372

Reviewed By: asaadaldien, houseroad, akyrola, bwasti

Differential Revision: D6125792

Pulled By: Yangqing

fbshipit-source-id: 5fd7ee9a5d77381fe9afbe899ef18465ecd1ceea
2017-10-23 15:06:38 -07:00
f2f057af99 Support MetaNetDef model specs on mobile
Reviewed By: ajtulloch

Differential Revision: D6010327

fbshipit-source-id: 5f1b81fc9ba92889044b89ae766c45c2d3c090d8
2017-10-23 14:53:03 -07:00
5795b173de Fix LogSoftMax (#3244) 2017-10-23 22:40:42 +02:00
50049168a6 Pybind v2.2.1
Summary:
Bumps the pybind version from v1.8.1 to v2.2.1, resolving all compile & runtime issues that arose.

The upgrades to the API used https://github.com/pybind/pybind11/blob/master/docs/upgrade.rst as the point of reference.

This also solves a long-standing bug we had, where a type would spontaneously and intermittently change in the C++ -> Python boundary.

\cc Yangqing
Closes https://github.com/caffe2/caffe2/pull/1308

Differential Revision: D6125152

Pulled By: pietern

fbshipit-source-id: 67839a9654c655d143820c6686c311beba64eff2
2017-10-23 11:32:49 -07:00
5afc166769 Fix lint build (#3237)
The flake8 package was upgraded to include new errors which cause the
build to break.
2017-10-23 14:04:56 -04:00
e870f569db Fix core_overhead_benchmark building issues
Summary:
The GPU version of core_overhead_benchmark needs CUDA_curand_LIBRARY.
Closes https://github.com/caffe2/caffe2/pull/1365

Reviewed By: Yangqing

Differential Revision: D6125248

Pulled By: houseroad

fbshipit-source-id: de6ffcbd1f5b685b06560cae860ff0b26cb86ddc
2017-10-23 11:03:06 -07:00
b92e06e50e Fix reference counting bug in python_nn_functions.cpp (#3236)
Py_InitModule returns a borrowed reference. PyModule_AddObject steals
the reference, so we need to incref the `_nn` object.

(The Python 3 function PyModule_Create returns a new reference.)
2017-10-23 12:35:59 -04:00
a806d1ad69 make softmax test name unique case-insensitive 2017-10-23 06:29:58 -07:00
dc6510f7ed fix copy elison warnings / get rid of an std::move 2017-10-23 01:18:34 -07:00
d5604aea0b Don't create grad_fn if requires_grad=False (#3212)
Don't create grad_fn if requires_grad=False

 - Check that arguments without derivative definitions have
   requires_grad=False
 - Pass all tensor arguments to the tracer, including ones without
   derivative definitions
2017-10-22 18:41:04 -04:00
891f41c14b Upgrade to 2.2.1
Summary:
Update pybind from 1.8.1 to 2.2.1
aarch64 platform updates pending.

Reviewed By: houseroad, kmatzen

Differential Revision: D6089712

fbshipit-source-id: 80ce09c381717f4317e2e698479ff604cf28c709
2017-10-22 13:26:56 -07:00
0989889251 Fixing lib/THNN build for Windows (#3217) 2017-10-22 12:19:00 +02:00
6a4182eead weighted sample op cuda
Summary: CUDA version of weighted sampling operator; minor changes for CPU version

Reviewed By: asaadaldien

Differential Revision: D6106668

fbshipit-source-id: 42d7607bd845a4a39cf5b89d7476904cb5928431
2017-10-21 18:49:59 -07:00
67839ce7bc Delete unused Softmax code (#3220)
Softmax and LogSoftmax are automatically bound and dispatched through
VariableType.
2017-10-21 20:51:27 +02:00
129336cb06 [dlpack] Memory management for dlpack 2017-10-21 20:19:51 +02:00
6b5f57b397 Make make_image_db multi threaded
Summary:
While waiting for the single threaded version to complete I noticed it
was doing an awful lot of waiting, so decided to make it multi
threaded. Creating a 150GB DB is now ~4x faster on an AWS EBS volume.
Closes https://github.com/caffe2/caffe2/pull/1334

Reviewed By: romain-intel

Differential Revision: D6045259

Pulled By: pietern

fbshipit-source-id: 43f9392a0a383355660a3ead217ab38939dd2bc2
2017-10-20 16:03:24 -07:00
ba1dba45f7 Finish #1358
Summary: Closes https://github.com/caffe2/caffe2/pull/1362

Differential Revision: D6115853

Pulled By: Yangqing

fbshipit-source-id: 581713e328f778fe916114f4f52d7089bc25bc3c
2017-10-20 15:47:58 -07:00
d89d9d74bd Fix Python 3 portability problem. (#3209)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-21 00:44:29 +02:00
ed9c43774c Don't resize output in cpu torch.gels (#3204)
* Don't resize output in cpu torch.gels when m > n
2017-10-21 00:43:42 +02:00
39729aa55c Add GPU support to operator RowWiseSparseAdagrad
Summary: Previously, the CPU version of operator RowWiseSparseAdagrad had been implemented. Here, the GPU version of the operator is implemented and tested.

Reviewed By: azzolini

Differential Revision: D6082828

fbshipit-source-id: 74befd495666c357d5ab425a698c5880cd8f927c
2017-10-20 15:20:50 -07:00
53fe804322 Make ONNX work with new C++ autograd world.
The general strategy is there is a new module, torch.onnx.symbolic, which
contains a function for every ATen method name with the ONNX translation.
While implementing this, I took the opportunity to expunge all references
of 'g' from the public API; instead, it is managed by a global variable in
torch.onnx which tracks the "current graph".

Other changes:

- If you pass a Tensor to op as an argument, it will now automatically be
  converted into a Constant ONNX node.  This lets us remove needing to
  implement ONNX

- Rename value to other, wherever there is both a Scalar and Tensor overload.
  This way, keyword dispatch can work uniformly in both cases.

- Deleted any autograd Function classes that both had a symbolic and were ported
  to the new C++ autograd implementation.  There may still be some straggling
  classes that didn't have symbolic.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-20 15:38:01 -04:00
e64f40ae5b Add tracing to the new ATen style API.
The generated tracing code looks like this:

    if (jit::tracer::isTracing({ self })) {
      jit::Node *n = jit::tracer::recordTrace( "mean", { self }, ret );
      n->rawSet(jit::stringToSymbol("dim"), dim);
      n->rawSet(jit::stringToSymbol("keepdim"), keepdim);
    }

A few design decisions I made:

  - Instead of making the assignment of 'n' conditional on whether or not
    attributes are present, I just add (void)n if it would not be used
    otherwise.  This modestly simplifies code generation.

  - Tracing of operations that involve Generator or Storage are not supported.
    This is fine because such ops don't take any Variable arguments anyway,
    so they couldn't trigger tracing.

  - Unfortunately, at::ArrayRef is not covariant, so there is some faffing about
    to support conversions from at::ArrayRef<Tensor> (aka TensorList) to
    at::ArrayRef<Variable>.  In the case of 'recordTrace' (slow path), I just
    allocated an intermediate std::vector to get the types correct; in the case
    of isTracing (fast path) there's three overloads to avoid refcount bumping
    when possible.

  - Tracing is all in one place, rather than spattered between the beginning
    and end of an ATen function, as Sam suggested.

  - This commit doesn't actually enable ATen definitions.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-20 15:38:01 -04:00
0589dfab81 nested_dict documenting comment.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-20 15:38:01 -04:00
5989b05ecc Enable ATen implementation of some NN functions and Variable methods 2017-10-20 15:38:01 -04:00
a385979677 Guard against executing the Hardshrink on CUDA
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-20 15:38:01 -04:00
507319ca39 Revert "Speed up norm_backward"
This reverts commit 17a817190c72e8ee48919c9c326e74b385764f5a.
2017-10-20 15:38:01 -04:00
5a0ded4dad PyTorch fixes for latest ATen:
1) softmax, log_softmax backwards now have int64_t dim argument
2) chunk/split in autograd/functions/tensor.cpp conflict with new
   ATen implementations, just delete them and use the ATen ones.
3) div/mul with Scalar now use the "other" parameter rather than "value".
2017-10-20 11:05:11 -07:00
96cb3f7c80 Merge commit '10df3496cbe392fa06648600aff3682a490e43c5' 2017-10-20 10:57:32 -07:00
10df3496cb Fix typos in orgqr and orgmqr
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-20 10:56:00 -07:00
9560540084 Add missing string include, fixes https://github.com/pytorch/pytorch/issues/3192
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-20 10:56:00 -07:00
16095b5737 Rename value to other, wherever there is both a Scalar and Tensor overload. (#115)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-20 13:54:30 -04:00
02f4303749 Use own benchmark and not any system pre-built ones:
Summary:
(1) use the cmake files of the corresponding libs
(2) allow static linkage of gtest and gbenchmark.
(3) helps remove the temp solution in #1112

We are yet to disable the installation of the benchmark library, and I have an open pull request at https://github.com/google/benchmark/pull/463 - once it is merged I will do submodule update.

cc lukeyeager pietern who had this issue before - hopefully this makes the solution cleaner.
Closes https://github.com/caffe2/caffe2/pull/1358

Differential Revision: D6111404

Pulled By: Yangqing

fbshipit-source-id: 17468d32cef27f96e9445d119eb869c9c7913118
2017-10-20 10:37:44 -07:00
25bfffeafe Swish Activation Function
Summary:
Swish: A self-gated activation function.
https://arxiv.org/pdf/1710.05941.pdf

Reviewed By: ajtulloch

Differential Revision: D6100424

fbshipit-source-id: 0103d6d82e9ffb50106c98a8785e62b8808e9af1
2017-10-20 10:37:43 -07:00
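The activation itself is one line; a sketch following the cited paper:
```python
import torch

def swish(x):
    # Self-gated activation: x * sigmoid(x).
    return x * torch.sigmoid(x)

y = swish(torch.linspace(-3, 3, 7))
```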
0c0c9e743e Fix dimensions check
Summary:
To match CPU implementation [here](https://github.com/caffe2/caffe2/blob/master/caffe2/operators/segment_reduction_op.h#L323)
Closes https://github.com/caffe2/caffe2/pull/1360

Differential Revision: D6111071

Pulled By: Maratyszcza

fbshipit-source-id: ba0019ff483ff28f4aa452103c3bad5d9294af96
2017-10-20 10:28:01 -07:00
246701df81 Separate out native processing into procecss_native; remove (TH)Type specific logic. 2017-10-20 10:12:26 -07:00
90e396f6bb Support 'native' ATen functions with Tensor, (base) Type, NS impls.
This adds the ability to specify 'native' functions in NativeFunctions.h and specifies
'split' and 'chunk' in this manner.  The function arguments, returns, variants, etc. are
specified as if they were processed via other parsing mechanisms (e.g. cwrap_parse) with
the following additional parameters:

type_method_definition_level: this allows one to specify that the type method should
be defined at the 'base' type level; this is because in the case of 'split' and 'chunk'
(and probably most/all other native functions that don't directly dispatch to TH/THC)
we don't need type-specific implementations.  Currently it is enforced that 'base' is
specified for native functions, but this is easy to remove later.

type_method_definition_dispatch: this defines the function to dispatch to.  For split,
this is at::native::split; this is just to avoid having a magic namespace and allowing
one to dispatch to a function with a different name.
2017-10-20 10:12:26 -07:00
5b3931c119 logic fix for repeated sequence masking
Summary: Logic fix for sequence masking repeated along data dimensions.

Reviewed By: jamesr66a

Differential Revision: D6109418

fbshipit-source-id: 1a006e863a26e627039d7a88c922625d50bde8e3
2017-10-20 10:09:22 -07:00
fea60da92e Update DLPack tensors enum to avoid binary issues and expose one function 2017-10-20 10:08:54 -07:00
0b0f24a71b disable test_cudnn_weight_format when CuDNN not available (#3200) 2017-10-20 19:06:53 +02:00
76abc06b1f Fix nvprof mode in autograd profiler 2017-10-20 10:22:54 -04:00
17a817190c Speed up norm_backward 2017-10-20 10:21:28 -04:00
634c8315a4 isContiguous problems (#3148)
* with the size=1 case, impossible to do single point check, replace with isContiguousRange

* fix stride in desc; fix undef scope

* add test for this case for cudnn

* assertTrue
2017-10-20 10:20:33 -04:00
2797c8005b Update THDTensor.cpp 2017-10-20 10:17:27 -04:00
50de9160aa Update THDTensor.cpp
Add `__STDC_FORMAT_MACROS` to fix gcc issues
2017-10-20 10:17:27 -04:00
ee62a595fc ScatterAssign int types
Summary: Closes https://github.com/caffe2/caffe2/pull/1357

Reviewed By: dzhulgakov

Differential Revision: D6107036

Pulled By: bddppq

fbshipit-source-id: 9278dae988c3c0656b4e4fd08bf7ca1e2eec3348
2017-10-19 23:22:54 -07:00
d8ad5de560 Fix intermittent segfault on Python 2.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-19 23:04:19 -04:00
6ebfa20ab9 Include math.h for M_PI.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-19 23:04:19 -04:00
147287a33c Fix the build on clang.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-19 23:04:19 -04:00
7d95127a4f Squash ATen warning
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-19 23:04:19 -04:00
67612cba09 Add -Wno-missing-braces
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-19 23:04:19 -04:00
8faffef321 Make flags overloads compile.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-19 23:04:19 -04:00
3696300fcf Include Python.h less using a new stub header.
In many "non-Python" headers, we include Python.h because we need
to declare a pointer to PyObject, and solely because of that.  It
would be a lot better if we had a simpler version of Python.h that
just declared PyObject available for pointers, without anything
else.  This is what torch/csrc/utils/python_stub.h does.

The good thing about not including Python.h is that it is easy to
be warning-less; no more ugly insertions of Python.h on headers
where it has no good reason to be.

This makes PyTorch warning clean again.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-19 23:04:19 -04:00
8b3acd7d7b Check that type_str is in the type_map (#3191) 2017-10-19 22:54:25 -04:00
623f2bf815 Add GivenTensorInt64Fill on gpu
Summary: Before we fix it properly with 'type' argument.

Reviewed By: bddppq

Differential Revision: D6103973

fbshipit-source-id: 8c00a93c373dd0ad0bbfe59944495f6574223ab6
2017-10-19 18:32:41 -07:00
0da15f913c Change softmax and log_softmax to take int64_t dim rather than int.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-19 17:50:17 -07:00
357f9b6f01 Squash ATen warning
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-19 17:40:22 -07:00
f7ad13694c support model init
Summary:
a parameter can be initialized multiple times in init_net if parameter sharing is enabled. With the original implementation, only the first parameter init will be replaced by pre-trained parameters and the rest remain unchanged. This change overwrites every initialization with the pre-trained parameters.
This diff fixes this issue and also supports model init for the ads-intent project

Reviewed By: dragonxlwang

Differential Revision: D5991291

fbshipit-source-id: 36173f6239c56bd0d604a77bd94e36072f32faa7
2017-10-19 15:56:37 -07:00
e5e6c71743 include memory and map from observer.h
Summary: include memory and map from observer.h

Reviewed By: ajtulloch

Differential Revision: D6094338

fbshipit-source-id: f39b27cb76dae3b06816bb9ae37c2c1f96eaa8ba
2017-10-19 15:19:25 -07:00
e970d35091 Make VariableVersion refcounting thread-safe (#3184)
I've also made the version counter and the "live" reference count
atomics.

Note that it's not safe to set the version counter (operator=) from
multiple threads, because shared_ptr assignment isn't thread safe.
Currently, the only call sites to these functions are on newly created
variables before they can be accessed from other threads.

See #3111
2017-10-19 17:22:01 -04:00
db6a9d2ae4 Fixes type inference for Slice and GivenTensor*Fill operators
Summary:
Currently, the type inference infers FLOAT as the type for all GivenTensor*Fill operators. However, the inferred type should match the actual operators.

Also, for the `Slice` operator, there is a corner case where type inference fails.

Reviewed By: azzolini

Differential Revision: D6096813

fbshipit-source-id: d65b7c0f42436138cbc49d8a5a62374fa5e927e1
2017-10-19 14:02:21 -07:00
7b30436201 remove Alias in SparseFeatureHash
Summary: remove Alias in SparseFeatureHash

Reviewed By: kennyhorror

Differential Revision: D6094663

fbshipit-source-id: f313aeb17bf6cfdacae62b2c1ad6b4175d0882dd
2017-10-19 13:24:20 -07:00
d9b89a352c Replace StochasticFunctions v2 (#3165)
This removes the StochasticFunctions for bernoulli, multinomial, and
normal and replaces them with classes in the torch.distributions
package. Each distribution supports the differentiable log_prob function
that returns the log of the pdf/pmf of the samples.

The current StochasticFunction implementation has a few problems: it can
be painful to use when there are multiple stochastic outputs which need
to be back-propagated through. It also requires that we store grad_fns
on Variables that have requires_grad=False in order to find stochastic
nodes.
2017-10-19 15:05:07 -04:00
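A minimal REINFORCE-style sketch of the replacement pattern (the reward values are illustrative):
```python
import torch
from torch.distributions import Bernoulli

probs = torch.tensor([0.3, 0.7], requires_grad=True)
d = Bernoulli(probs)
actions = d.sample()                  # non-differentiable draws
rewards = torch.tensor([1.0, -1.0])   # illustrative per-sample rewards
loss = -(d.log_prob(actions) * rewards).sum()
loss.backward()                       # gradients reach probs via log_prob
```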
f1f64c8d07 Generate autograd functions for NN / more refactors (#3136)
Generate autograd functions for NN and implement more derivatives in derivatives.yaml

A big refactor of gen_variable_type.py
2017-10-19 15:03:26 -04:00
98e67448fa Large Softmax and LogSoftmax refactor
- Cleaned up THNN and THCUNN code and kernels
- Improved THCUNN kernel performance 5x, making it match cuDNN performance
- Added support for computing softmax over arbitrary dims
  NOTE: The default dim for 3D inputs is now 1 (used to be 0)
- Both functions now accept inputs with arbitrarily many dimensions
- Autograd functions no longer save the input (it's unnecessary)
- Added cuDNN bindings for softmax, but they are unused as THCUNN
  matches or even exceeds cuDNN performance
2017-10-19 19:51:10 +02:00
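For illustration, the dim argument selects the normalization axis:
```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 5)
p = F.softmax(x, dim=1)                                # normalize over axis 1
assert torch.allclose(p.sum(dim=1), torch.ones(2, 5))
```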
3a4ca7a269 Add support for saving the output in autogenerated functions 2017-10-19 19:51:10 +02:00
a1518b7801 CMake changes to make Caffe2 more friendly for dependent libraries
Summary:
This introduces a few things:

- It enables us to create Caffe2Config.cmake that can be used down the road for building dependent libraries, so they do not need to explicitly write FindCaffe2.cmake.
- The config file will automatically figure out transitive dependency of Caffe2 as well as compiler flags.
- This diff also disables the RPATH setting since it is kind of a mess right now. In principle, we should figure out a clearer rpath setting following the typical rpath setting choices (https://cmake.org/Wiki/CMake_RPATH_handling) - I can send a follow up PR to clean this up.
- Minor: removed old gflags ang glog files.
Closes https://github.com/caffe2/caffe2/pull/1354

Reviewed By: dzhulgakov

Differential Revision: D6098014

Pulled By: Yangqing

fbshipit-source-id: cb06c41a7ef60fddb78b24887b6b3e82684b7c6b
2017-10-19 10:05:32 -07:00
f9ee52efa9 Update DLPack bindings 2017-10-19 10:06:53 -04:00
76071cfbac Merge commit '99cbf24b8b5a9d769b9794e447e0b740bcdd99c8' 2017-10-19 10:06:41 -04:00
99cbf24b8b Update log_softmax and softmax signatures to include dim (#106) 2017-10-19 12:19:20 +02:00
8d8cebd6be Fixes the net-rewriting pipeline for model with rowwise adagrad
Summary: Model with rowwise RMSProp does not work in net-rewriting pipeline (fbl 29841194). This diff solves the issue by changing the way Slice op is used in the model and adds a rule to `parallelize.py` to cover for needed cases.

Reviewed By: azzolini

Differential Revision: D6096022

fbshipit-source-id: c4f615b2ba99da9f77a1d49c9fb898e0e59401f8
2017-10-18 20:05:37 -07:00
03bfd7a873 In Predictor interface allow real model inputs to be fed in run* functions
Summary: https://github.com/caffe2/caffe2/issues/1294

Reviewed By: jerryzh168

Differential Revision: D6086990

fbshipit-source-id: 7d21269055d91cc223a72f6352cdb45584f5b56b
2017-10-18 20:05:36 -07:00
43b303bfc0 Expose Predictor::run_map to Python
Reviewed By: jerryzh168

Differential Revision: D6087316

fbshipit-source-id: d90e20429645391f17f0c56c8a8a60685097f801
2017-10-18 19:32:56 -07:00
96c6212513 repeat sequence mask for data dims
Summary: Allow the application of sequence-length masking to be replicated along one or more minor axes. See task for details.

Reviewed By: jamesr66a

Differential Revision: D6090835

fbshipit-source-id: 9064232aa9b93246c582b6e0bae73be5dbe09e98
2017-10-18 18:08:08 -07:00
7fd6fd6d80 Output more useful error message when exporting FeatureDropout in train mode (#3156)
* Output more useful error message when exporting FeatureDropout in train mode

* Update the comment
2017-10-18 20:27:18 -04:00
3631cd71b1 nit: move ATenDLMTensor to cpp file since it doesn't need to be in the header 2017-10-18 16:29:14 -07:00
424390bc96 [dlpack] Memory management for dlpack 2017-10-18 16:25:30 -07:00
9eb9615a6b fix build error when nnpack is enabled (#3167) 2017-10-18 17:50:31 -04:00
57ffe64cbe Embedding related fixes (#3128)
* Fix docs for nn.Embedding and F.embedding.
  - add description of 'sparse' argument (#3104)
  - fix F.embedding example (resulted in RuntimeError)
* Make EmbeddingBag a New Style Function.
* Add a functional interface for EmbeddingBag
* Fix failing tests: add max_norm and norm_type to context,
and fix typo in backend call.
* Docfix: remove torch.manual_seed from example code.
* Add a note about using the sparse keyword in the Embedding function (see the sketch after this list).
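A minimal sketch of the sparse keyword noted above, assuming the nn.Embedding module API of this era:

  import torch
  import torch.nn as nn
  from torch.autograd import Variable

  # sparse=True makes the weight gradient a sparse tensor, which can be much
  # cheaper when only a few rows of a large table are touched per step
  emb = nn.Embedding(10, 3, sparse=True)
  out = emb(Variable(torch.LongTensor([[1, 2, 4, 5]])))
  print(out.size())  # (1, 4, 3)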
2017-10-18 23:38:07 +02:00
9ec9acc0cd Fix bug with 'coalesced' calculation in 'cadd'. (#3162)
Apparently, the algorithm only guarantees the output is coalesced if
the inputs are coalesced.

I'm planning to do another PR that does much more stringent correctness
testing for the 'coalesced' bit shortly, but y'all should merge
this one first.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-18 23:20:56 +02:00
f8cb5d6437 Trying to construct a Tensor from a Variable fails more appropriately (#3163) 2017-10-18 23:13:22 +02:00
51c2075f16 Relax Scalar::toXXX conversions to only check for overflow
Currently, the toXXX functions on Scalar check that the conversions are
exact. This will cause an exception in code like:

  auto t = CPU(kFloat).ones({1});
  t *= M_PI;

Or the equivalent in Python:

  t = torch.ones(1)
  t *= math.pi

This changes the checks to only throw an exception in the case of
overflow (positive or negative).
2017-10-18 13:53:09 -07:00
4348c9c8f8 Disable logging by default
Summary: By default, do not log anything to reduce the runtime overhead

Reviewed By: Maratyszcza

Differential Revision: D6082490

fbshipit-source-id: 35fd09ea439925139d66b4623211e01af46e18f2
2017-10-18 12:54:28 -07:00
dcb457fdd9 add support for using nnpack when installed via conda (#3155)
* add support for using nnpack when installed via conda

* unify nnpack discovery between conda and user
2017-10-18 20:11:13 +02:00
7680601659 Spatial Depthwise Convolution on the GPU (#3057)
* THCUNN Skeleton for Depthwise Convolution port

* implement Depthwise Convolution CUDA Kernels (handles weight parameter only, not bias)

* working kernels and bindings for forward + backward for base conv, and integration

* add support for padding

* strides for weight kernel

* dilation for weight gradient, enable for others

* add support for depthwise multiplier

* remove old depthwise conv

* rename to SpatialDepthwiseConvolution

* clean up depthwise code, add shape asserts, more constrained thread count for accgradparams

* add bias for forward for depthwise conv

* add grad_bias, move bias for forward to CUDA

* fix eligibility test to guard against transposed, properly identify depth multiplier

* add basic unit test; make depthwise conv take priority over cudnn when appropriate

* add tests for depthwise permutations

* make cuda kernels calculate positions using mul instead of div

* remove unnecessary samegpu requirement

* use accreal, test for double type

* use THAssert instead of assert

* rename to is_depthwise

* half prec support for depthwise

* make certain computation more pythonic

* flake8
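For context, from the Python side this code path corresponds to grouped convolution with groups equal to the number of input channels; a minimal sketch, assuming a CUDA machine and the Variable-based API of this era:

  import torch
  import torch.nn as nn
  from torch.autograd import Variable

  # groups == in_channels gives each input channel its own filter, which is
  # exactly the depthwise case the CUDA kernels above accelerate
  conv = nn.Conv2d(in_channels=8, out_channels=8, kernel_size=3, groups=8).cuda()
  out = conv(Variable(torch.randn(1, 8, 32, 32).cuda()))
  print(out.size())  # (1, 8, 30, 30)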
2017-10-18 14:16:02 +02:00
95556f4075 add ignored_keys param to load_state_dict (#3159)
* add ignored_keys param to load_state_dict

* remove ignored_keys in favour of a strict param

* raise KeyError only if strict is enabled (see the sketch after this list)
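A minimal sketch of the resulting behavior (the partial state dict here is illustrative):

  import torch.nn as nn

  model = nn.Linear(4, 2)
  state = {'weight': model.weight.data}  # 'bias' deliberately missing

  model.load_state_dict(state, strict=False)  # loads matching keys, skips 'bias'
  # model.load_state_dict(state)              # default strict=True raises KeyError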
2017-10-18 14:14:19 +02:00
23a3f78988 Reverse the order of checks in torch.gather (#3130)
* Reverse the order of checks in torch.gather

* Remove unnecessary comment

* Add missing check for indexing dimension
2017-10-18 12:30:05 +02:00
6647475bc2 Lazily create Variable.data PyObject* (#3149)
Previously, we created the Variable.data PyObject* in THPVariable_Wrap. For many
Variables, we don't access their data directly. Instead, they are passed
from one Variable computation to another.

This reduces the overhead of ATen-implemented Variable methods by
~200ns.
2017-10-17 11:54:55 -04:00
75bb50be0a Remove THHeapUpdate (#3143) 2017-10-17 11:07:40 +02:00
3109e4ad6a add common terminology to BatchNorm docs 2017-10-17 11:03:31 +02:00
f176c864f0 minor autograd reference change in readme (#3144) 2017-10-17 10:16:06 +02:00
923bcfdd27 Gate engine=NNPACK with nnp_initialize
Summary:
Somehow we're observing mysterious test failures for some nnpack-related tests with gcc5 only on Travis: https://travis-ci.org/caffe2/caffe2/jobs/288804879

Marat suggested that maybe the machine doesn't have avx2 support.

Right now gating is happening for FB-internal only. I think it makes sense to make gating generic. Calling `nnp_initialize` seems like the right way to do so. It returns failure if the hardware is not supported and is a noop after the first call.

Reviewed By: Maratyszcza

Differential Revision: D6073808

fbshipit-source-id: e684668628b5c635368351114b6c502d2cc81fe4
2017-10-16 20:17:39 -07:00
4ac8ecb76e Some bug-fixs in mpscnn backend
Summary: att

Reviewed By: ajtulloch

Differential Revision: D6037723

fbshipit-source-id: d7405b27089210abfd48a33ecee47a87f67ae9a0
2017-10-16 18:33:28 -07:00
6ac393a32b WeightedSigmoidCrossEntropyWithLogits
Summary:
Op for computing SigmoidCrossEntropyWithLogits with per-label, per-sample weights. Can be used for addressing class or label imbalance.

Doc:
Given three matrices: logits, targets, weights, all of the same shape,
(batch_size, num_classes), computes the weighted sigmoid cross entropy between
logits and targets. Specifically, at each position r,c, this computes
weights[r, c] * crossentropy(sigmoid(logits[r, c]), targets[r, c]), and then
averages over each row.
Returns a tensor of shape (batch_size,) of losses for each example.
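A NumPy sketch of that formula (weighted_sigmoid_xent is a hypothetical helper, not the Caffe2 kernel itself):

  import numpy as np

  def weighted_sigmoid_xent(logits, targets, weights):
      logits, targets, weights = map(np.asarray, (logits, targets, weights))
      p = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
      # per-position weighted cross entropy, then a per-row average
      xent = -(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))
      return (weights * xent).mean(axis=1)  # shape: (batch_size,)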

Reviewed By: stephenyan1231

Differential Revision: D5997723

fbshipit-source-id: f3172325f1c98b6f26e1700131ef897b743a72fc
2017-10-16 17:34:38 -07:00
9d4d0640f2 Support MNIST in ONNX (#3100)
* Support MNIST in ONNX

* Add train mode check in FeatureDropout symbolic, add todo mark in logsoftmax_symbolic

* export FeatureDropout as a simple identity op

* turn x = x or y to if-checks.
2017-10-16 19:51:40 -04:00
58bcf76ba3 Have model downloading as a separate plan
Summary:
For distributed offline training, downloading parameters from trainer_0 is part of the epoch plan. However, for distributed realtime training, we publish the model at a specific time interval, so we need to run multiple iterations of the epoch plan before publishing the model.

In this diff, I split downloading parameters out of the epoch plan into a separate plan, so we can explicitly execute it before model publishing for distributed online training.

Reviewed By: boryiingsu

Differential Revision: D5995122

fbshipit-source-id: 47d61d7b8c57cfae156e79b7ec32068ef579d7c3
2017-10-16 16:03:48 -07:00
fce3ed19e5 Change device_id to device in python land (#3133)
* change device_id to device in python land

* cuda/random.py
2017-10-17 00:54:26 +02:00
ba05dc5549 dense buffer (#3139) 2017-10-17 00:51:37 +02:00
17d68f824d Fix typo. (#3140) 2017-10-17 00:50:33 +02:00
569bdb4b77 Refactor executor test
Summary:
Travis treats test_settings/test_model_names as tests, moving them into
executor_test_util

Reviewed By: bddppq

Differential Revision: D6068920

fbshipit-source-id: 01c5bf962b985398414f44a7849c0f6344fd7e1d
2017-10-16 15:17:16 -07:00
3261e1337a Use 0D (1-element) tensor instead of 1D tensor 2017-10-16 17:47:36 -04:00
00996006d1 Remove type inference from value 2017-10-16 17:47:36 -04:00
93e1749c85 Add ONNX support for AddConstant and SubConstant 2017-10-16 17:47:36 -04:00
da7aa3a12f Add helper function _constant in onnx.py 2017-10-16 17:47:36 -04:00
7d16d320d5 expose observers to python, add multiple observers per observable
Summary: observer framework can now be used in python + a small writeup of how to use it. This is D6035393 with a fix for ct-scan

Reviewed By: salexspb

Differential Revision: D6066380

fbshipit-source-id: 896c4c580d4387240b81ac2dbbc43db51d4bfeb9
2017-10-16 14:32:56 -07:00
e92246fffa Visit hooks in C++ implemented autograd functions (#3138)
Once mul uses ATen, this is necessary for TestAutograd.test_hooks_cycle
to pass.
2017-10-16 17:30:09 -04:00
36895e2dd2 update the comments, move the expect check logic into the helper function 2017-10-16 16:57:16 -04:00
a1deb2d47f Move the exception logic to the helper function 2017-10-16 16:57:16 -04:00
cad9438bb9 Add unit tests for onnx helper functions 2017-10-16 16:57:16 -04:00
1735c5f6c7 Add Filler op for double
Summary: Closes https://github.com/caffe2/caffe2/pull/1344

Reviewed By: dzhulgakov

Differential Revision: D6065137

Pulled By: bddppq

fbshipit-source-id: 1849beeaa4fee8cc056b685664f91daca71764b8
2017-10-16 13:48:15 -07:00
f6f51129ce Fix SparseToDenseMask for int64 indices
Summary: that's what made the tests fail :)

Reviewed By: xianjiec

Differential Revision: D6067037

fbshipit-source-id: 0194f082feed87b0502170683c6773e07db3ff44
2017-10-16 13:17:31 -07:00
3c144e3872 Relax CopyToMPSCNN dimension requirement
Summary: Enable CopyToMPSCNN to accept 1 <= ndim <= 4.

Reviewed By: ajtulloch

Differential Revision: D6021320

fbshipit-source-id: e76222b41a0c7b19b38df2ef8be5a4bb24843419
2017-10-16 12:18:05 -07:00
0f4ae13f05 Better cudnn version checking (#3132) 2017-10-16 20:59:18 +02:00
47beb64b5c Use ATen generator as default CPU generator (#3135)
ATen has its own default CPU RNG. Use this as the default in PyTorch so
that random functions called through ATen have the same behavior as
random functions called through TensorMethods
2017-10-16 14:22:58 -04:00
0c8aaabce8 disable share dir by default
Summary: until we have an internal build test for this directory, we should not have it enabled by default in open source

Reviewed By: salexspb

Differential Revision: D6060577

fbshipit-source-id: 25f5c2d30adf274620cd8ec2e2db9565b98cfa7c
2017-10-16 11:21:20 -07:00
28ed514bfe Add additional resizes to ClassNLLCriterion (#3134) 2017-10-16 12:30:45 -04:00
c0c3162c1a Support NVIDIA Tegra
Summary:
makes the necessary changes to support Caffe2 OpenGL ES backend on NVIDIA Tegra devices
- Remove no_bounds global because Tegra GLES driver doesn't recognize it as a constant. Define BOUNDS_CHECK_MODE macro instead.
- Recognize "NVIDIA Tegra" as a supported GL_RENDERER

Reviewed By: hlu1

Differential Revision: D6030760

fbshipit-source-id: e3655467612469d69c70b3fee35edb2d6774a793
2017-10-15 10:18:52 -07:00
a0ac72e84e Use template instead of sphinx-contrib for google analytics 2017-10-15 18:40:05 +02:00
a7a81351f2 Revert D6035393: [caffe2] expose observers to python, add multiple observers per observable
Summary:
This reverts commit 4563cf0203095fa979bb2160621cd16dd22ff830

bypass-lint

Differential Revision: D6035393

fbshipit-source-id: 090fba774ce433904f7ef769dda75c2fbbf784a8
2017-10-14 21:47:34 -07:00
58fe66e337 expose observers to python, add multiple observers per observable
Summary: observer framework can now be used in python + a small writeup of how to use it

Reviewed By: sf-wind

Differential Revision: D6035393

fbshipit-source-id: 4563cf0203095fa979bb2160621cd16dd22ff830
2017-10-14 13:09:29 -07:00
490d5c2f13 improve torch.load documentation (#3118) 2017-10-14 18:54:53 +02:00
75665ca6db Suggest NO_CUDNN=1 as alternative when CuDNN is too old.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-14 12:04:40 -04:00
f709199c49 Make test_jit more robust about compilation.
It's pretty easy to accidentally fail to actually compile
a JITed region, which means that we have accidentally failed
to have test coverage for a number of features.  This adds
a secret _assert_compiled kwarg, which will raise an error
if we don't actually hit the compiled codepath.

This is not intended to be user visible; we have some other
ideas for handling this case.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-14 12:04:40 -04:00
6dc67aef17 doc (#3110) 2017-10-14 10:44:35 +02:00
38f87cc9c4 Limit print scale by sys.float_info (#3113)
* limit print scale by sys.float_info

* test print tiny/huge values in test_print

* fix lint
2017-10-14 08:52:01 +02:00
f11fb319bd Fixes for new ATen
- names of convolution and batch normalization functions changed
 - at::Type copy now uses broadcasting
 - at::Type storageFromBlob takes a deleter
2017-10-13 22:04:21 -07:00
5a195d9dc6 Merge commit '88b5bf8ec08f19c817019fa229eef0b1c6c92431' 2017-10-13 22:04:17 -07:00
864bd934b0 Add a helper function to check broadcasting (#3115) 2017-10-13 23:22:16 -04:00
4f81eff2eb Perform gradient checks on masked_scatter and masked_fill
We weren't doing gradient checks on these functions because the tests
were in-place only. We also incorrectly classified __magic__ functions
as inplace.
2017-10-14 00:02:22 +02:00
8666be05f5 Raise runtime error in setup.py if cudnn version is not supported 2017-10-13 23:58:25 +02:00
1322f9a272 Add cudnn version to torch.version 2017-10-13 23:58:25 +02:00
123cb5dd07 use non-cudnn transpose for int tensors
Summary: Turns out CuDNN's tensor transform only supports floats. The previous implementation pretended it would work with ints by casting to floats and indeed passed tests for some reason. But rgirdhar found a case where it returned nonsensical results. So rewire int transposes to use the non-cudnn version. Had to refactor a bit for that. Also added a test for the case.

Reviewed By: asaadaldien

Differential Revision: D6043284

fbshipit-source-id: cc3b14f9fbbdeff421b01da453a1d3c7c5ffd4ac
2017-10-13 14:02:48 -07:00
88b5bf8ec0 Every argument controlled by the output_mask may be null 2017-10-13 14:01:05 -07:00
4c3b02f314 Enable Flatten operator to take an arbitrary axis argument
Summary:
input dimensions up to "axis" will be flattened to the outer dim of output and the remaining input dims will be the inner dim
Closes https://github.com/caffe2/caffe2/pull/1330

Reviewed By: dzhulgakov

Differential Revision: D6039560

Pulled By: bddppq

fbshipit-source-id: e92c30b49a9288feeefc4a639522406e97e149e1
2017-10-13 12:28:22 -07:00
c3a9423c7f Fix: ClearField only accepts string as field name
Summary: Closes https://github.com/caffe2/caffe2/pull/1336

Reviewed By: akyrola

Differential Revision: D6050556

Pulled By: bddppq

fbshipit-source-id: 19809912ff0a1252d6054372debd6d77eea917a6
2017-10-13 10:57:34 -07:00
8cfb23529b Add additional erf, erfinv, and additional nn functions 2017-10-13 10:39:26 -07:00
9b5371df1c Add bindings to additional NN functions 2017-10-13 10:39:26 -07:00
f444bd72b2 Don't free interned Python strings held in global variables (#3107)
This pulls in @gchanan's fix for some crashes on exit in PyArgParser
from #2997
2017-10-13 18:31:03 +02:00
5a96037810 skip ncclCommDestroy if CUDA driver is already unloaded 2017-10-13 08:50:00 -07:00
8f26d6aabc More shape checking for ConvNd (#3052)
* check conv weight & bias dims

* address comments
2017-10-13 16:56:19 +02:00
4831e478e1 Expose cmake version as env variable and scipy test 2017-10-13 16:54:35 +02:00
4c6c4c513a fix grad_bias calculation for nnpack 2017-10-13 16:16:31 +02:00
dd494091b2 remove std::move in profiler 2017-10-13 07:14:07 -07:00
cb011410b8 fix warning in THD 2017-10-13 04:03:11 -07:00
5f5270d4bf raise AttributeError from __getattr__ for hasattr to work
Summary:
- hasattr is misbehaving in python 3
- python2: `This is implemented by calling getattr(object, name) and seeing whether it raises an exception or not`
- python3: `This is implemented by calling getattr(object, name) and seeing whether it raises an AttributeError or not.` (see the sketch below)
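A minimal illustration in plain Python (Net is a stand-in class, not the Caffe2 code):

  class Net(object):
      def __getattr__(self, name):
          # Raising AttributeError lets hasattr() return False on both Python 2
          # and 3; any other exception would propagate out of hasattr() on
          # Python 3 instead of being swallowed.
          raise AttributeError(name)

  print(hasattr(Net(), 'missing'))  # False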

Reviewed By: azzolini

Differential Revision: D5973797

fbshipit-source-id: 0b6a413e6ebacd9bdd197c46feab256ab383ace2
2017-10-12 23:25:15 -07:00
2972a6ca02 Revert D6026557: [caffe2][PR] Fix "No handlers could be found for logger"
Summary:
This reverts commit 95c634872ac02be721257169e38c8fead04cd66b

bypass-lint

Differential Revision: D6026557

fbshipit-source-id: 663c28583ce3b01070ff5449115ed7e222f71776
2017-10-12 20:21:52 -07:00
4b12d9d1b2 Expose is_nullable in Declarations.yaml
Some parameters can be null but do not have default values.
2017-10-12 19:53:29 -07:00
3366654fd4 Support broadcasting in copy
a.copy_(b) will now broadcast b to the shape of a. Note that this means
that copies between tensors of the same number of elements but
incompatible shapes are not allowed. For example, the following will
throw an exception:

  Tensor a = type.rand({4, 3});
  Tensor e = type.rand({3, 4});
  a.copy_(e);
2017-10-12 19:52:43 -07:00
de7e1a9a82 Use pointer equality to compare types 2017-10-12 19:50:24 -07:00
3bc94bf02c Combine comparison methods and functions
The methods were separate because PyTorch supports multiple output types
for comparison methods. For example, for FloatTensors 'a' and 'b' both
calls are vaild:

   torch.lt(a, b, out=<ByteTensor>)
   torch.lt(a, b, out=<FloatTensor>)

ATen only supports ByteTensor outputs because the overloads have the
same static signature and would conflict. It would be nice to fix this
in the future like with the bernoulli function.

In the meantime, the separate function and method definitions with
different argument names make implementing VariableType more difficult.
2017-10-12 19:49:54 -07:00
92c9848c04 Support wrap_dim in nn.yaml 2017-10-12 19:45:15 -07:00
5d689989ec Expose the THGenerator* via unsafeGetTH on at::Generator 2017-10-12 19:43:17 -07:00
998a1b6d74 fix memonger after D5994548
Summary: memonger.cc's support for RNNs was broken in D5994548, because it changed a .n argument to a .s argument. That made data_parallel_model_test fail (but tests were not run for the blame diff, so this was not noticed).

Reviewed By: kennyhorror

Differential Revision: D6043948

fbshipit-source-id: d29abd6927c519227a28b41c1ef70fb1756904bf
2017-10-12 17:36:27 -07:00
d748c43f71 for dpm.GetLearningRateBlobNames
Summary:
I broke dpm.GetLearningRateBlobNames() when adding a new nodename param in optimizer.
Fixing it.

Reviewed By: asaadaldien

Differential Revision: D6043828

fbshipit-source-id: b3a79dd0dfae144187bcb359e2374eab6b32c485
2017-10-12 17:20:33 -07:00
2675ff73fd Resize output argument total_weight in ClassNLLCriterion 2017-10-13 01:32:07 +02:00
61bb0d2954 Remove unused parameter 'input' from Tanh 2017-10-13 01:31:48 +02:00
66bb3d6dec Remove incorrect comment that join_with is symmetric.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-13 01:31:22 +02:00
191224b6e6 Suggest key_averages by default, it's more useful.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-13 01:31:22 +02:00
94c1fdd254 Typofix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-13 01:31:22 +02:00
86c1842701 More detailed docs for Graph.op
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-13 01:31:22 +02:00
b9cd45adcf Add note about inplace status in ONNX and JIT.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-13 01:31:22 +02:00
2c01afd2a6 DoOp reuse workspace and test
Summary: Adding ability to reuse workspace in Do op and unit tests

Reviewed By: akyrola

Differential Revision: D6037992

fbshipit-source-id: 73d6a14001f667f7ca5e1e02ff39911dc65e4cd1
2017-10-12 13:37:34 -07:00
9575364d30 Update protobuf detection
Summary:
The scripts/build_local.sh script would always build protoc from the
third_party protobuf tree and override the PROTOBUF_PROTOC_EXECUTABLE
CMake variable. This variable is used by the protobuf CMake files, so
it doesn't let us detect whether the protoc was specified by the user
or by the protobuf CMake files (e.g. an existing installation). This
in turn led to a problem where system installed headers would be
picked up while using protoc built from third_party. This only works
if the system installed version matches the version included in the
Caffe2 tree. Therefore, this commit changes the variable to specify a
custom protoc executable to CAFFE2_CUSTOM_PROTOC_EXECUTABLE, and
forces the use of the bundled libprotobuf when it is specified.

The result is that we now EITHER specify a custom protoc (as required
for cross-compilation where protoc must be compiled for the host and
libprotobuf for the target architecture) and use libprotobuf from the
Caffe2 tree, OR use system protobuf.

If system protobuf cannot be found, we fall back to building protoc
and libprotobuf in tree and packaging it as part of the Caffe2 build
artifacts.
Closes https://github.com/caffe2/caffe2/pull/1328

Differential Revision: D6032836

Pulled By: pietern

fbshipit-source-id: b75f8dd88412f02c947dc81ca43f7b2788da51e5
2017-10-12 11:48:50 -07:00
1e8a16224f PackSegments: return value presence.
Summary:
Optionally return a blob of shape [batch size, max length] that is
false only in locations where the output tensor was padded.
One can separately convert lengths to segment ids and cast, but
this is more convenient, and possibly more efficient.

Differential Revision: D6006073

fbshipit-source-id: af6c4ea31972566e7d059dcd3fdd8afba97a88e9
2017-10-12 11:17:34 -07:00
c6f96c1d7b Add GPU support for LengthsTile
Reviewed By: kittipatv

Differential Revision: D5999171

fbshipit-source-id: cd0e305488f05c20d1925745fca0c4b4eef23071
2017-10-12 11:17:34 -07:00
7cf4529a82 add a deleter callback to tensorFromBlob 2017-10-12 11:06:40 -07:00
ca392b7c76 remove timeout from RNN executor
Summary:
I had a 30 sec timeout in the RNN executor to flush out deadlock bugs, but it looks like people are occasionally bumping into it in the course of normal business: perhaps when the CPU is heavily used, the threads don't get enough time and run past the timeout.

Removed the timeout but retained the warning logging.

Reviewed By: salexspb

Differential Revision: D6001960

fbshipit-source-id: 5b2293359ee68c1c24f0d9e0406d88391e531280
2017-10-12 10:59:41 -07:00
6b22f64d2c fix_im2col_nd_gpu_kernel
Summary:
The Im2colNd GPU version was not correctly implemented, due to 1) the lack of a unit test and 2) the fact that it is not actually exercised by any use case.

A little more background: We are working on implementing a conv-deconv 3D operator, which takes 3D volume data (e.g. video) as input, does conv in the spatial domain to reduce resolution, and does deconv (a.k.a. conv transpose) in the temporal domain. We first implemented a conv transpose 3D op in D6035108 and spotted the buggy GPU implementation.

Reviewed By: asaadaldien

Differential Revision: D6035081

fbshipit-source-id: b76dea2e44bcb73d202441bb246249c4481973e1
2017-10-12 10:05:27 -07:00
f964105b56 Update generated ffi wrapper to consider other variable types (#3087) 2017-10-12 18:54:31 +02:00
4908351212 Do not propagate gradients for GatherRangesToDense
Summary: as title

Differential Revision: D5997854

fbshipit-source-id: a4f1cadfbd8057b01517e49f23f61b2029fa6099
2017-10-11 22:33:11 -07:00
9ef39a50ee Fix the broadcast in Addmm's symbolic (#3063)
* Fix the broadcast in Addmm's symbolic

* fix the non-matching dimension cases

* Add exception for non-supported case, remove onnx test cases (moved to onnx-pytorch repo)

* remove the test_onnx.py in run_test.sh

* lint the code
2017-10-11 22:23:11 -04:00
14c1e19c73 Consolidate the observer implementation
Reviewed By: bwasti

Differential Revision: D6013605

fbshipit-source-id: 661e11c2e35c4ecaaf6e2fdc67c44a75859c6b36
2017-10-11 18:53:36 -07:00
bfae95043d Self register the observer reporter when the file is included in the source
Summary:
This way, we can choose to include a file and the reporter it contains is registered in the ObserverConfig. We can have different targets with different reporters without exposing the dependency to all clients.
Closes https://github.com/caffe2/caffe2/pull/1320

Reviewed By: bwasti

Differential Revision: D6024096

Pulled By: sf-wind

fbshipit-source-id: c6eabd7f9ca51b88ea4b268612355ca60809c0a2
2017-10-11 18:53:31 -07:00
790941d6a0 Add additional comments 2017-10-11 18:28:07 -07:00
8d19116508 Generate PyTorch-style NN bindings
This generates NN bindings with a similar interface to PyTorch's
torch.nn.functional package. The file nn.yaml specifies function
signatures and THNN implementations.

Each NN operation generates three functions. For example:

  - conv2d
  - conv2d_forward
  - conv2d_backward

The conv2d and conv2d_forward functions differ in how they handle
buffers that need to be passed to the backward function. conv2d_forward
takes the buffers as parameters. conv2d creates the buffers internally
and discards them.
2017-10-11 18:28:07 -07:00
7bc154f8ea Remove unused argument 'input' to Sigmoid_updateGradInput (#3079) 2017-10-11 23:52:50 +02:00
23c4152b41 Resize outputs in criterions (#3074)
Most NN functions size their outputs appropriately. This makes the
criterions used in PyTorch consistent with the other NN functions.
2017-10-11 23:52:31 +02:00
2000ba0b26 Add random_ for cuda, fix random_ for cpu (#3042) 2017-10-11 23:45:17 +02:00
5b10ad255b Use EMBEDDING feature type instead of FLOAT_TENSOR
Summary: create a special type for embeddings

Differential Revision: D5997808

fbshipit-source-id: 9a5ad8ecc019d10536705d3b25f2436ca8a56454
2017-10-11 13:50:03 -07:00
3e9f0092eb Remove Redundant CMAKE_BUILD_TYPE
Summary: Closes https://github.com/caffe2/caffe2/pull/1323

Differential Revision: D6031534

Pulled By: Yangqing

fbshipit-source-id: de75523b17f67d092d45edb91fbb4e83c67b04be
2017-10-11 12:49:24 -07:00
57863e4e79 Remove CAFFE2_CPU_FLAGS
Summary:
Since this is only a duplicate of CMAKE_CXX_FLAGS we should simplify the set of options.
Closes https://github.com/caffe2/caffe2/pull/1327

Differential Revision: D6031544

Pulled By: Yangqing

fbshipit-source-id: 5c610a70118089b4d96be30ab028ef1d5efdb019
2017-10-11 12:49:23 -07:00
c97e78715d Revert D6028262: [caffe2][fix] update observer api in perf_observer
Summary:
This reverts commit 3dd99649473b9fe30493aa9306907e05b434d0d4

bypass-lint

Differential Revision: D6028262

fbshipit-source-id: 7fed6e0948a4199b429fbd28cfcc1ae9ff0c145a
2017-10-11 11:21:38 -07:00
cc3058bdac Fix macOS build (with CUDA) (#3071) 2017-10-11 19:04:15 +02:00
bd9b4df6e9 Add support for exporting MulConstant, DivConstant and Softmax to ONNX (#2923)
* Add support for exporting MulConstant and Softmax

* Add support for MulConstant in autograd execution

* Also add support for DivConstant
2017-10-11 13:03:33 -04:00
9260f0e5ee Fix a typo in optim.rst (#3069) 2017-10-11 16:47:14 +02:00
72f6b5a03b Make DtoH and HtoD transfers respect the current stream (#3067) 2017-10-11 09:49:14 -04:00
4fb7600fcb update observer api in perf_observer
Summary: fix observer API

Reviewed By: sf-wind

Differential Revision: D6028262

fbshipit-source-id: 3dd99649473b9fe30493aa9306907e05b434d0d4
2017-10-11 00:16:53 -07:00
25b35a3f62 Fix broken MPI tests
Summary:
Broken since e16871d87d06f3ae1adfc90bd43410c00cc4a330
Closes https://github.com/caffe2/caffe2/pull/1315

Differential Revision: D6026591

Pulled By: Yangqing

fbshipit-source-id: 0569128bb4df6c912d5d00239f6d70cdb72d3a15
2017-10-10 22:32:14 -07:00
75bece6ede Fix "No handlers could be found for logger"
Summary: Closes https://github.com/caffe2/caffe2/pull/1316

Differential Revision: D6026557

Pulled By: Yangqing

fbshipit-source-id: 95c634872ac02be721257169e38c8fead04cd66b
2017-10-10 22:32:13 -07:00
b1508e8e86 Revert D5905002: [caffe2] expose observers to python
Summary:
This reverts commit e40ec24a55e08fb73beea9b4f3b68e71fc66ffb1

bypass-lint

Differential Revision: D5905002

fbshipit-source-id: 4f1b79d9a318978f6b74565f633f34b9701a9d5c
2017-10-10 22:12:00 -07:00
e13f199452 Switch RNNOp to use NetDef argument for step representation.
Summary: Before this diff, RNNOp was using TextFormat for representing steps. This diff changes RNNOp to prefer a NetDef argument instead. To be backward compatible it supports TextFormat for existing models, though we can compile RNNs without TextFormat as well.

Reviewed By: salexspb

Differential Revision: D5949330

fbshipit-source-id: 9336a8f5ccf30ad8d8e3a7067b9437e1704b1c9f
2017-10-10 22:01:51 -07:00
169ed0cd4b remove torchvision docs from pytorch repo. Moved to vision repo (#3024) 2017-10-10 23:59:55 -04:00
828048f578 Add document on how Module.cuda() and optims should work together (#3056) 2017-10-10 22:55:23 -04:00
f2809a5259 Fix Python lint. (#3061)
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-10-10 22:44:33 -04:00
1dbbef6b48 Fix crash in blob deallocation
Summary: We have to use the copy constructor in Concat when copying non-primitive types

Reviewed By: Yangqing

Differential Revision: D6002883

fbshipit-source-id: 0aebc955079975bb6423291589ed09ce0660acf3
2017-10-10 19:03:01 -07:00
66b8cb95e9 Add int64 support to sparse_to_dense_mask_op
Summary: [CAFFE2] Add int64 support to sparse_to_dense_mask_op

Reviewed By: ender-wieczorek

Differential Revision: D6022278

fbshipit-source-id: 489b6df4d43a64c743ee278d94929ca50259f7b8
2017-10-10 17:19:44 -07:00
b4dfadcfa2 Fix OOM in Travis in executor test
Summary: Use only MLP model and re-enable test

Reviewed By: bddppq, Yangqing

Differential Revision: D6013471

fbshipit-source-id: 0cb4a9346c62a739ee6259832181f71e60eef311
2017-10-10 17:19:43 -07:00
18790639ed Rename library name to lower
Summary:
In the past we called our libraries libCaffe2_CPU.so and libCaffe2_GPU.so, which don't really match the usual Linux shared-library naming conventions. This diff changes them to libcaffe2.so (old Caffe2_CPU) and libcaffe2_gpu.so (old Caffe2_GPU).

This might affect existing building scripts that explicitly use Caffe2_CPU and Caffe2_GPU: what do you guys think? pietern bwasti slayton58
Closes https://github.com/caffe2/caffe2/pull/1300

Differential Revision: D6025973

Pulled By: Yangqing

fbshipit-source-id: 6243de4e7af8924f737bb74f3936015f4c91fa26
2017-10-10 17:02:21 -07:00
1b892ea295 Enable axis argument for MatmulOp
Summary: att

Reviewed By: ajtulloch

Differential Revision: D5523365

fbshipit-source-id: b7a379c9c4326cd642e7b4768cc590b5e1b94b6d
2017-10-10 16:47:37 -07:00
09d6b6fd00 update intel script
Summary:
TSIA - this would allow us to auto-sync the up to date version with intel's repo.
Closes https://github.com/caffe2/caffe2/pull/1319

Reviewed By: pietern

Differential Revision: D6023739

Pulled By: Yangqing

fbshipit-source-id: 79bd91aa3a193c266acccdeb682519a49e028bae
2017-10-10 16:41:07 -07:00
63caca89db expose observers to python
Summary: observer framework can now be used in python + a small writeup of how to use it

Reviewed By: salexspb

Differential Revision: D5905002

fbshipit-source-id: e40ec24a55e08fb73beea9b4f3b68e71fc66ffb1
2017-10-10 16:10:41 -07:00
246a382610 Simplify PReLU binding (#3055)
* Simplify PReLU binding

 - Remove internal buffers from function signature
 - Compute nOutputPlane internally

* Fix legacy PReLU
2017-10-10 17:50:13 -04:00
f74665f0c4 remove gcc install suggestion 2017-10-10 14:45:19 -07:00
d66549d27c remove files from botched merge 2017-10-10 14:42:41 -07:00
f11ff5befb Fix mismatched input shape in ATen sample script
Reviewed By: akyrola

Differential Revision: D6023015

fbshipit-source-id: b210e9e8f213d416abf9c9ddbb28bca3bd35c512
2017-10-10 14:21:08 -07:00
8d8a99c244 Add ONNX Pad reflect and edge mode support (#3048) 2017-10-10 17:02:08 -04:00
9437644f66 Replace softmin and softsign with simple differentiable expressions 2017-10-10 16:57:47 -04:00
e3a7c78f04 Add shutdown_fun to parallel_workers
Summary:
parallel_workers supports calling a custom function "init_fun" when WorkerCoordinators are started which is passed in as an argument to init_workers.

Adding an analogous argument "shutdown_fun" which gets passed in to init_workers, and gets called when a WorkerCoordinator is stopped.

This allows users of the parallel_workers to add custom cleanup logic before the workers are stopped.

Reviewed By: akyrola

Differential Revision: D6020788

fbshipit-source-id: 1e1d8536a304a35fc9553407727da36446c668a3
2017-10-10 12:02:24 -07:00
ee143d31ef Fix ImageInput op in resnet50_trainer.py
Summary:
Fix #1269 (from fa0fcd4053dd42a4ec3a2a12085662179f0e11df).
Closes https://github.com/caffe2/caffe2/pull/1314

Reviewed By: bwasti

Differential Revision: D6021171

Pulled By: bddppq

fbshipit-source-id: 7d7c45f8b997c25f34530f826729d700a9c522d4
2017-10-10 11:20:52 -07:00
d894a6362f Add missing is_test argument in ImageInput ops
Summary: reported in Github Issue https://github.com/caffe2/caffe2/issues/1269

Reviewed By: salexspb

Differential Revision: D6004461

fbshipit-source-id: 03f4bccfe085010b30109ab7b6fe7325caa160ef
2017-10-10 10:03:13 -07:00
c23ae308f3 Fix build without numpy (#3049) 2017-10-10 18:47:19 +02:00
f7f37306e4 New torch.jit.verify function for verify once-backward.
A few notes about the implementation:
- Need to plumb 'devices' through to the 'fork_rng' calls.  You definitely
  want these; it makes verify run A LOT faster
- New keyword argument for compiled model execution, '_force_trace', which
  forces us to retrace a model.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-10 11:46:40 -04:00
6de2929967 fix TH warnings after explicit types changes 2017-10-10 08:34:55 -07:00
2443fcac0b Deterministic cudnn algorithms 2017-10-10 10:53:34 -04:00
403a533827 Forgot to modify a kernel call 2017-10-10 10:18:50 -04:00
8cc258153d Make VolumetricAveragePooling cuda stream-aware 2017-10-10 10:18:50 -04:00
a47948784d add kwargs_only defaults for sorted and largest 2017-10-10 10:18:28 -04:00
9dd872053f Add possibility to fallback to retrieving MAJOR.MINOR 2017-10-10 10:16:14 -04:00
139aaf65d6 Bugfix plus remove other option that depends on the version.txt file
Restrict search for cudart instead of libcudart
2017-10-10 10:16:14 -04:00
f093545919 Add compiled CUDA version in torch.version.cuda 2017-10-10 10:16:14 -04:00
5e01bc7122 add 'at' helper method 2017-10-10 10:14:29 -04:00
b56098b540 Make parameter names consistent
Use the same name for parameters computed in updateOutput which are used
in updateGradInput or accGradParameters
2017-10-10 10:13:54 -04:00
9455eda57b cast distill loss teacher label to float
Summary: it failed for the case where `prod_prediction` is used as the teacher label, which is double instead of float.

Reviewed By: kittipatv

Differential Revision: D6018163

fbshipit-source-id: cd93fd46996e07c7f762eedbeb67331a4665d4c4
2017-10-10 01:16:07 -07:00
6e12a9c4a4 get around homebrew issue
Summary:
This fixes osx build issues - once those pass I'll merge.
Closes https://github.com/caffe2/caffe2/pull/1310

Differential Revision: D6018394

Pulled By: Yangqing

fbshipit-source-id: 345c74435a78909535fa90e8c908fc06d6dabc36
2017-10-09 23:47:26 -07:00
efe91fb9c1 delete redundant python nccl code 2017-10-09 22:24:18 -04:00
e9dccb3156 implement all_reduce, broadcast, all_gather, reduce_scatter 2017-10-09 22:24:18 -04:00
4d62933529 add initial NCCL C bindings 2017-10-09 22:24:18 -04:00
b7e258f81e link specific versioned System NCCL, rather than generic file 2017-10-09 22:24:18 -04:00
2ff516bf79 Add tutorial describing how to use the ATen Caffe2 operator from PyTorch
Summary:
Also fixes a dependency bug in the cmake file for the ATen Op.
Closes https://github.com/caffe2/caffe2/pull/1309

Differential Revision: D6017166

Pulled By: zdevito

fbshipit-source-id: 3f4d18772f9179367927d4e7a52e51a4580342e9
2017-10-09 19:02:10 -07:00
d5f60b240d Fix distill loss
Summary: The layer should also apply to evaluation as it's needed for feature importance run.

Reviewed By: xianjiec

Differential Revision: D6016125

fbshipit-source-id: e1db1a2eb3d45515e3cdc71b4badaaf738a4afd8
2017-10-09 18:17:31 -07:00
77ae903650 Skip negative indices
Summary: A single negative index can crash the job today.  We want to skip a few of them but not a lot.  If we skip too many then we will force the job to crash.

Reviewed By: kennyhorror

Differential Revision: D6003461

fbshipit-source-id: 7881ed6c2cfa78c7bda90c7aa01e81ca00fd08a6
2017-10-09 16:09:50 -07:00
30dac012e0 change header
Differential Revision: D5887857

fbshipit-source-id: 994002cb1a72d123035667e4b809d6cea1950a5e
2017-10-09 15:41:57 -07:00
803afd58a0 Make MultiLabelMarginCriterion respect the cuda current stream 2017-10-09 14:06:22 -04:00
a0831219cf SqueezeNet ceil_mode not yet supported.
Fixes #2898.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-10-09 11:07:11 -04:00
45c5ac1415 Print net type arguments in net_printer
Summary: This prints the inner net of 'Do' op, for example.

Reviewed By: akyrola

Differential Revision: D6007278

fbshipit-source-id: 459583fe13191b0449982efb7be733c9c01ecf76
2017-10-08 20:02:55 -07:00
6743d59513 Add missing import. Add return to __getstate__ 2017-10-08 11:07:10 -04:00
43adc5ba05 Add nodename to ONE, iteration_mutex etc.
Summary: Similar as with Iter, LR.

Reviewed By: azzolini

Differential Revision: D6005817

fbshipit-source-id: 6d1260791d1acb3df957315eb9156eac183ee25c
2017-10-07 22:06:11 -07:00
463bcd00ea add None check for scope.CurrentDeviceScope()
Summary: add None check for scope.CurrentDeviceScope()

Reviewed By: akyrola

Differential Revision: D6005320

fbshipit-source-id: 05e2515736dcb2bddbb47fa423f892091c4577d7
2017-10-07 17:38:30 -07:00
44a0f6805e fix get_cpu_blob_name()
Summary: add def get_cpu_blob_name(self, base_str) back before D6001124

Reviewed By: akyrola

Differential Revision: D6004994

fbshipit-source-id: 318581d2b2c22878929993160da8edcb7d7a58e6
2017-10-07 11:56:15 -07:00
2aac8f4f82 Add support for NetDef in RNNOp.
Summary:
RNNOp has been using TextFormat for representing nets. This has already caused
some incompatibilities and also pulls in huge dependencies for RNNs on mobile. This
diff adds support for using a NetDef arg instead and adds support for
compiling only this version.

Reviewed By: salexspb

Differential Revision: D5994548

fbshipit-source-id: 6c4ded97b80d7a57ad5a013b79ae917aac777c7d
2017-10-07 04:16:37 -07:00
c62490bf59 Use PyInt in Python 2.7 with small values 2017-10-07 00:41:29 -04:00
f29bcab67e Use Declarations.yaml to generate python bindings 2017-10-07 00:41:29 -04:00
558d26a69e Fix argument indices 2017-10-07 00:41:29 -04:00
dcb8d0f088 Refactor out python binding generation from gen_variable_type.py
- Also includes some prep work for binding NN functions
2017-10-07 00:41:29 -04:00
dc1b4ff74e Fix isContiguousDim (#3011) 2017-10-07 00:40:51 -04:00
c52b3d7524 qr memory leak fix (#3017) 2017-10-07 00:35:05 -04:00
69fb6bee58 Remove the extra fake output in ONNX Concat (#3014) 2017-10-06 22:43:22 -04:00
dcfed49e96 fix multiple issues with multiple PS, learning rates, iter;
Summary: 1. iteration and LR must be node-name specific in optimizer

Reviewed By: azzolini

Differential Revision: D6001124

fbshipit-source-id: 0fa53fb3347e89401f62125865166356ac56796b
2017-10-06 19:21:16 -07:00
aaa74b4929 Fix flaky erfinv autograd test (#3015) 2017-10-06 20:05:47 -04:00
dba92055f3 Update Caffe2 benchmark file to write text output
Summary:
The Caffe2 benchmarking framework can now compare the output of a model with some golden output. In order to do that, and to reduce the coupling between the benchmarking framework and Caffe2, the output is dumped in text format without any schema.

The output is read back in by the benchmarking framework, which performs the comparison.
Closes https://github.com/caffe2/caffe2/pull/1301

Reviewed By: bwasti

Differential Revision: D5992836

Pulled By: sf-wind

fbshipit-source-id: f6b403103949f4b9880c8372bbdc36966475a387
2017-10-06 15:50:55 -07:00
0eec332e14 assert reflection padding in range (#3008) 2017-10-06 17:59:01 -04:00
1605566388 Add map input for predictor
Summary: Added TensorMap input for run function in predictor.cc

Reviewed By: bwasti

Differential Revision: D5847103

fbshipit-source-id: cd9755a0491b50adc35177164ffe7a50e73ff80f
2017-10-06 13:32:59 -07:00
2c44a9f9cd Add BatchBucketOneHotOp
Summary:
Input is a matrix tensor. Its first dimension is the batch
size. For each column, bucketize it based on the boundary values and then do
one hot encoding. The `lengths` specifies the number of boundary values for each
column. The final number of buckets is this number plus 1. This would also be
the expanded feature size. `boundaries` specifies all the boundary values.
Note that each bucket is right-inclusive. That is, given boundary values
[b1, b2, b3], the buckets are defined as (-inf, b1], (b1, b2], (b2, b3], (b3, inf).
For example

If data = [[2, 3], [4, 1], [2, 5]], lengths = [2, 3],
and boundaries = [0.1, 2.5, 1, 3.1, 4.5], then

output = [[0, 1, 0, 0, 1, 0, 0], [0, 0, 1, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0, 1]]
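A NumPy sketch reproducing that example (bucket_one_hot is a hypothetical helper, not the operator binding):

  import numpy as np

  def bucket_one_hot(data, lengths, boundaries):
      data = np.asarray(data, dtype=float)
      out, offset = [], 0
      for col, n in enumerate(lengths):
          b = boundaries[offset:offset + n]
          offset += n
          # side='left' makes each bucket right-inclusive, matching the doc
          idx = np.searchsorted(b, data[:, col], side='left')
          out.append(np.eye(n + 1, dtype=int)[idx])
      return np.concatenate(out, axis=1)

  print(bucket_one_hot([[2, 3], [4, 1], [2, 5]], [2, 3], [0.1, 2.5, 1, 3.1, 4.5]))
  # [[0 1 0 0 1 0 0]
  #  [0 0 1 1 0 0 0]
  #  [0 1 0 0 0 0 1]]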

Reviewed By: xianjiec

Differential Revision: D5976030

fbshipit-source-id: fd746c20b19bcdf5f769451d804c219ad6463f28
2017-10-06 13:25:12 -07:00
d39e519ce2 Merge commit '18eb4bbdf9563c0620bbc93daa045c2258b63bde' 2017-10-06 12:36:34 -07:00
18eb4bbdf9 Improve Declarations.yaml: (#81)
* Improve Declarations.yaml:

 - translate defaults to C++ values
 - include names of returned values
 - mark keyword-only arguments

* Add comment to translate_default
2017-10-06 15:30:25 -04:00
39a82f3e3f Fix triu/tril (#3007) 2017-10-06 15:28:57 -04:00
85d0bfb6f3 Cuda SparseLabelSplitOp
Summary:
This is a brief introduction to what this op is doing. In the multi-label case,
i.e., where each example has more than one label, we want to find out which examples
have values for each label. That is, given a sparse representation with
len = (2,3), ind = (1, 2, 0, 1, 2), val = (10, 20, 5, 8, 15), we want to return
example_id_0 = [1], example_id_1 = [0,1], example_id_2 = [0,1],
value_0 = [5], value_1 = [10,8], value_2 = [20,15].
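A plain-Python sketch of that grouping (sparse_label_split is a hypothetical helper, not the CUDA op):

  import numpy as np

  def sparse_label_split(lengths, indices, values, num_labels):
      # expand lengths into one example id per (index, value) pair, then
      # bucket the pairs by label while preserving example order
      example_ids = np.repeat(np.arange(len(lengths)), lengths)
      out_ids = [[] for _ in range(num_labels)]
      out_vals = [[] for _ in range(num_labels)]
      for ex, lab, val in zip(example_ids, indices, values):
          out_ids[lab].append(int(ex))
          out_vals[lab].append(val)
      return out_ids, out_vals

  ids, vals = sparse_label_split((2, 3), (1, 2, 0, 1, 2), (10, 20, 5, 8, 15), 3)
  print(ids)   # [[1], [0, 1], [0, 1]]
  print(vals)  # [[5], [10, 8], [20, 15]]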

There are two special things here. 1. The size of each output tensor is unknown until runtime;
2. The ordering in each output tensor should be preserved, e.g., example_id_1 = [0,1] instead of [1,0].

What I am doing now is to get the output size and an offset map (see code) on the CPU and then
launch a kernel to take care of the rest. This requires an O(N) copy, which is really not ideal.

Previously I had an implementation that computed the output size on the GPU, but when filling values into
the output tensors it is hard to make sure the ordering is preserved unless I do a sort afterwards.

Reviewed By: azzolini

Differential Revision: D5825104

fbshipit-source-id: 4d987cef0247746998ec1d2acc47fc5ed2302722
2017-10-06 12:06:39 -07:00
4362c4de9c Temporarily disable test in Travis
Summary: Temporarily disable executor test in Travis

Reviewed By: akyrola

Differential Revision: D5997441

fbshipit-source-id: 54f454d99a50a917a950dfd23b1e20fb7fbbc754
2017-10-06 12:06:38 -07:00
5e38345d4a Fix break
Differential Revision: D5997998

fbshipit-source-id: a3937539fe331107f4d2917a2e44e187fa14a8c1
2017-10-06 11:34:54 -07:00
d2195218f6 Build local
Summary:
The build_local.sh script is currently single-threaded, which is really slow. Use the same mechanism as in build_android.sh to parallelize the build.
Closes https://github.com/caffe2/caffe2/pull/1282

Differential Revision: D5992231

Pulled By: sf-wind

fbshipit-source-id: 01ba06b6efcb0f535f974a2dfffbae9ba385d27d
2017-10-06 11:06:29 -07:00
3ae961f062 Release saved variables in generated functions (#3004) 2017-10-06 12:17:07 -04:00
10b42f5d6c Add ONNX support for ConstantPadNd (#2962)
* Add ONNX support for ConstantPadNd

* add comments to explain the order of paddings and pad is guaranteed to have even elements
2017-10-06 11:03:48 -04:00
898c732293 Introduce a reduce keyword argument for MSELoss (#2878)
* Add reduce keyword to MSECriterion API (see the usage sketch after this list)

* Move gradOutput usage from py to backend

* Implement reduce keyword for THNN MSECriterion

* Implement reduce keyword for THCUNN MSECriterion

* Implement reduce keyword for MSE double backwards

* Tests for MSECriterion with reduce keyword

* Documentation for reduce for MSELoss

* Make legacy nn work with reduce keyword by ignoring it

* Apply linter suggestions

* Address comments (small changes)

* Revert "Tests for MSECriterion with reduce keyword"

This reverts commit 1c0be0defa49d336d023d7d9795db4037c92b6fe.

* Undo changes to legacy nn tests

* Reuse module test for MSELoss by creating a wrapper class for MSELoss

* Address comments: refactor MSECriterion.cu to be nicer

* Fix lint & build errors
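A minimal usage sketch of the new keyword, assuming the Variable-based API of this era:

  import torch
  import torch.nn as nn
  from torch.autograd import Variable

  input = Variable(torch.randn(3, 5), requires_grad=True)
  target = Variable(torch.randn(3, 5))

  loss = nn.MSELoss()(input, target)                  # default: averaged scalar
  per_elem = nn.MSELoss(reduce=False)(input, target)  # unreduced, per-element
  print(per_elem.size())                              # (3, 5)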
2017-10-06 10:57:22 -04:00
6a91f556d0 fix a bug in exporter, we forgot to copy type to the new node for index op 2017-10-06 10:40:14 -04:00
7dd74b6a71 Address the introduced types in ONNX PR 57 2017-10-06 10:40:14 -04:00
268fce1073 change encodeType to encodeTypeProtoTensorType 2017-10-06 10:40:14 -04:00
10537ce4ed Support the new proto introduced in onnx/onnx PR 51 2017-10-06 10:40:14 -04:00
b2f5ccf366 lint 2017-10-06 10:39:33 -04:00
0c2957512f Fix two legacy modules clearing input tensor in clearState 2017-10-06 10:39:33 -04:00
ecdb86e733 Update all existing nn tests to new args format; Move all randomness inside tests 2017-10-06 10:39:33 -04:00
b6e1dd2674 Remove top-level seed setting 2017-10-06 10:39:33 -04:00
c76e2900a8 Change TestCase args to accept value, size or fn for constructor_args, input and target 2017-10-06 10:39:33 -04:00
5f8bab47c8 bugfix for 2428 ussue (#3000) 2017-10-06 09:20:12 -04:00
50208c9fd6 Refactor GLConvolution
Summary:
Separate class definition into header file
Remove uniform buffer initialization in the constructor because it's not necessary
Separate tiling and batching code

Reviewed By: jerryzh168

Differential Revision: D5960502

fbshipit-source-id: 5e3bce5192ce6dc69868be1722f490f690d87076
2017-10-05 22:31:47 -07:00
f535700ccc Add weighted_sampling operator to Caffe2
Summary: Add weighted_sampling operator to Caffe2

Reviewed By: akyrola

Differential Revision: D5962199

fbshipit-source-id: ab3f56a1dc7b8eaf4ed4d74af6c6c08dccca5a1e
2017-10-05 20:33:59 -07:00
4af66c4304 Cleanup: remove useCurrentStream function (#2990) 2017-10-05 23:04:59 -04:00
4b3400b249 Added statistics for standard deviation
Summary:
Added an exported statistic that helps in computing
standard deviation. It uses an offset (shifted) mode of computation
to avoid a common numerical pitfall (see the sketch below).
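A sketch of the shifted computation, assuming the standard shifted-data variance algorithm (the pitfall being catastrophic cancellation in the naive sum-of-squares formula):

  def shifted_variance(xs):
      # accumulating around a constant K near the mean keeps the two sums
      # similar in magnitude, so their difference stays numerically stable
      K = xs[0]
      n = len(xs)
      s = sum(x - K for x in xs)
      s2 = sum((x - K) ** 2 for x in xs)
      return (s2 - s * s / n) / n  # population variance; stddev is its sqrt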

Reviewed By: azzolini

Differential Revision: D5977811

fbshipit-source-id: e9f3b99a952e10fb3e3eb18a29b5bdca92f82f4c
2017-10-05 17:06:27 -07:00
db06e91097 Bump gloo
Summary:
Latest version of Gloo takes care of MPI_Init/MPI_Finalize for us, so
this commit removes handling that from caffe2/contrib/gloo. It also
imports CMake NCCL module changes from Gloo to stay consistent and
allow setting NCCL_INCLUDE_DIR and NCCL_LIB_DIR separately.
Closes https://github.com/caffe2/caffe2/pull/1295

Reviewed By: dzhulgakov

Differential Revision: D5979364

Pulled By: pietern

fbshipit-source-id: 794b00b0a445317c30a13cc8f0f4dc38e590cc77
2017-10-05 16:57:59 -07:00
225de6628c Improve NNPACK error message
Reviewed By: dzhulgakov

Differential Revision: D5923278

fbshipit-source-id: 79b03f0cb24a4c71a34b6e8f95f2fc2a709c4afa
2017-10-05 15:43:46 -07:00
9425a2bf19 Fix cudnn grid_sample backward for implicit gradOutput (#2993) 2017-10-05 17:57:35 -04:00
0710a90fa1 Tiled Softmax
Summary: Add tiling support for GLSoftmax

Reviewed By: hlu1

Differential Revision: D5891341

fbshipit-source-id: 38db5f64b3363852b4b650fed0ee1ee425d041a5
2017-10-05 12:47:15 -07:00
2e4de82514 Support more ONNX ops in autograd execution
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-05 15:27:49 -04:00
2861638e8a Add torch.random.fork_rng, which forks the RNG temporarily.
There is a bit of nuance to this function.  If one blindly charges in
and initializes all GPUs, it is going to take a long time.  20sec for
8 GPUs on my dev machine.  But to a user, it is non-obvious that fork_rng
is going to hit all the GPUs by default (which it does, by default, for
safety reasons). So there is a nice warning when we notice we're
hitting more than one GPU.  There is a bit of extra generality
which is going to be used by torch.jit in a subsequent commit.
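A minimal usage sketch (devices=[] is assumed here to restrict the fork to the CPU generator, which sidesteps the GPU-init cost described above):

  import torch

  before = torch.get_rng_state()
  with torch.random.fork_rng(devices=[]):  # CPU only; no GPUs touched
      torch.manual_seed(0)
      noise = torch.randn(3)  # consumes RNG state only inside the fork
  assert torch.equal(before, torch.get_rng_state())  # outer state untouched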
2017-10-05 15:27:49 -04:00
539ae451d2 Move random initialization functions from torch to torch.random.
The motivation is that I wanted to add some more general purpose
utility random functions, but not gunk up torch/__init__.py.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-05 15:27:49 -04:00
b08219b51a Correctly mark a method as override.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-05 15:27:49 -04:00
bfd77e9942 Delete obsolete comment.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-05 15:27:49 -04:00
0ae56ab247 Squash Python.h warning.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-05 15:27:49 -04:00
f9e9c5326b Support for Tanh and Sigmoid in the executor.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-05 15:27:49 -04:00
be04d5a347 Print small tensors in IR.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-05 15:27:49 -04:00
c9f7b1efcc Fix additional deprecated function signatures 2017-10-05 11:47:33 -07:00
da46b9c886 Make cudnn relu op work for empty batches
Summary: As titled.

Reviewed By: azzolini

Differential Revision: D5888868

fbshipit-source-id: fbccf63b6fa1e9b487c81de2ca86488e91f18274
2017-10-05 11:40:24 -07:00
5eb45fb0b4 Add check for Travis in executor test
Summary: Also check whether test runs under Travis

Reviewed By: Yangqing

Differential Revision: D5966311

fbshipit-source-id: 0d72259e194b25cc7477d6e62c6fa8e8d83e5f50
2017-10-05 11:40:23 -07:00
2631ee749a Generate ATen from torch/lib/ATen/Declarations.cwrap
- Update calls to use new ATen ordering
2017-10-05 11:29:53 -07:00
92fdc55aaf Merge commit 'ba1f94b6f59e5cba251d4a4266701f1e72015bc2' 2017-10-05 11:04:52 -07:00
ba1f94b6f5 Refactor out TensorBase from Tensor
Use TensorBase in Scalar class
2017-10-05 14:02:34 -04:00
ef3b7597b7 Fix copy and move constructors 2017-10-05 14:02:34 -04:00
fa812c4511 Remove has_full_argument_list 2017-10-05 14:02:34 -04:00
a18e81ddb8 Fix lint 2017-10-05 14:02:34 -04:00
5e564d6c12 Add check that tensor is defined in Scalar constructor 2017-10-05 14:02:34 -04:00
4a12f70ba1 Move default arguments to function declaration
* Make alpha, beta in addmm kwarg_only
 * Move kwarg_only arguments to the end
 * _out variants now have output arguments at the beginning
2017-10-05 14:02:34 -04:00
d8a0cdc0c5 Adding asan option
Summary:
This would allow one to debug with asan. Known problems:
- only works with new -fsanitizer=address option.
- not tested on clang.

It's turned off in default so existing builds won't be affected.
Closes https://github.com/caffe2/caffe2/pull/1299

Differential Revision: D5987034

Pulled By: Yangqing

fbshipit-source-id: de29cd3b84edaed5db73e33f8f759c5c3271b5b7
2017-10-05 10:55:04 -07:00
cbdbe518e9 If cudnnSetStream is not successful, give error instead of warning (#2988)
* If cudnnSetStream is not successful, give error instead of warning

* Use built-in error reporting
2017-10-05 13:12:29 -04:00
c74f7d8ade Support varags style IntLists in derivatives.yaml and implement view. (#2963) 2017-10-05 11:46:23 -04:00
137b139551 Make cuDNN use the current stream (#2984) 2017-10-05 09:27:04 -04:00
fb8a7679cc preprocs for embeddings
Summary: embeddings

Differential Revision: D5888420

fbshipit-source-id: b293df6444cba49e2feab6ccf8b8346019e5b421
2017-10-04 22:18:21 -07:00
de43326cfc Identify components after sparse layers' tagging
Summary: Given a pair (init_net, train_net) where ops in sparse layers are tagged, this diff detects the components and rename the `node_name` (e.g. tag) to reflect the component name.

Reviewed By: azzolini

Differential Revision: D5948222

fbshipit-source-id: aeda9cfc88bb64922bf7a9942b969e3c5066718a
2017-10-04 21:03:47 -07:00
b649ce3d6d Caffe2 Benchmarking Framework
Summary:
Implement a framework to benchmark the Caffe2 inferencing time. It only contains the observer collecting the delay information for running the net and the operator. The driver of the benchmark is in a separate repository.

It does not interfere with the rest of the Caffe2.
Closes https://github.com/caffe2/caffe2/pull/1263

Reviewed By: bwasti

Differential Revision: D5956861

Pulled By: sf-wind

fbshipit-source-id: ba4f0226066f55d333b27d472e09137d7272d449
2017-10-04 20:02:37 -07:00
20b3918ba8 add cuda support for Topk Gradient
Summary: as title

Reviewed By: azzolini

Differential Revision: D5822303

fbshipit-source-id: 3bc88a9071167c41e3fc717a2b31dceee6fee360
2017-10-04 19:31:56 -07:00
642542ec2d Resolve heap-buffer-overflow problem
Summary:
In the instance norm implementation, the lambda function was causing a heap overflow,
so move it explicitly into the function body itself.

accept2ship

Reviewed By: pietern

Differential Revision: D5981662

fbshipit-source-id: 6901c9cd738de048e3d0308a0a4c52f9c37e524a
2017-10-04 19:02:32 -07:00
b029582655 Merge commit '03d856977ecbaac87e598c0c4bafca96761b9ac7' 2017-10-04 21:57:36 -04:00
8e309c014c Tagging sparse parameters
Summary:
This is the first step on DPER side to use net transformation step (`parallelize_net`).

So far, it tags the sparse parameters (in init_net and train_net) once distributed trainer nets are built.

Next step is to merge the part that creates distributed trainer nets (`create_distributed_trainer_nets`) into the part that creates single-trainer, multi-reader nets ('create_distributed_reader_nets`). This step should get rid of parts of `MixtureStrategyModelBuilder`.

Reviewed By: azzolini

Differential Revision: D5902733

fbshipit-source-id: 85fbddbb6c2704badd82b237f1dd2c7c5790e43a
2017-10-04 18:46:48 -07:00
7e80dc6cbd Remove check that can never be true from RNNOp.
Summary: As desc.

Reviewed By: salexspb

Differential Revision: D5971303

fbshipit-source-id: 4728b4df91e16c151efce48f1987f2e5d109f343
2017-10-04 17:36:02 -07:00
995c83f945 Disable cudnn dropout
Summary: The cudnn version of the DropoutOp was taking a significant (and unwarranted) amount of time in our RNN training. Further investigation showed that setting the cudnn dropout descriptors was an extremely expensive operation (https://pxl.cl/99nT), much more so than the dropout operation itself. This diff adds to the DropoutCell the option to disable cudnn. The non-cudnn version uses a raw curand call that elides all of the expensive descriptor setting.

Reviewed By: jmp84, akyrola

Differential Revision: D5972022

fbshipit-source-id: 6325ec5d6569f8b94d776cbb2554cc8ddb28f699
2017-10-04 17:24:09 -07:00
6a71cfa31e Faster version for RowWiseSparseAdagradOp
Summary: Move common operation out of loop.

Reviewed By: dzhulgakov

Differential Revision: D5962894

fbshipit-source-id: e4f8a5406c870958215cbc1fd366fa87bc381471
2017-10-04 15:46:39 -07:00
a2be56bc34 add GatherRangesToDense operator
Summary: adding an operator with behavior similar to fused GatherRanges and Split.

Reviewed By: kennyhorror

Differential Revision: D5961761

fbshipit-source-id: 616d4668b8901256418004def90d91a0b2041620
2017-10-04 15:18:10 -07:00
964d740ede adding batch support to SequenceMaskOps
Summary:
Added support for batching to SequenceMaskOp.

Let b be the batch dim and k be the axis dim. (We enforce that b < k.) Write the dimensions of the input tensor as [a_1, ..., a_b, ..., a_k, ...]. We first collapse our tensor down to 3D, with dimensions [P, Q, D], where: P = a_1 * ... * a_b, Q=a_{b+1} * ... * a_{k-1}, and D=a_k * a_{k+1} * ... * a_n. Then we mask each slice [i, :, : ] of this 3D tensor (note that each slice is a Q times D tensor w/ dimension 2)
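A NumPy sketch of that collapse (collapse_for_masking is a hypothetical helper; b and k are written 0-based here, and each [i, :, :] slice of the result is what gets masked before reshaping back):

  import numpy as np

  def collapse_for_masking(x, b, k):
      assert b < k  # the op enforces batch dim < axis dim
      s = x.shape
      P = int(np.prod(s[:b + 1]))   # all dims up to and including the batch dim
      Q = int(np.prod(s[b + 1:k]))  # dims strictly between batch dim and axis dim
      D = int(np.prod(s[k:]))       # axis dim and everything after it
      return x.reshape(P, Q, D)

  x = np.zeros((2, 3, 4, 5))
  print(collapse_for_masking(x, b=1, k=2).shape)  # (6, 1, 20)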

Reviewed By: jamesr66a

Differential Revision: D5733382

fbshipit-source-id: e7a314d9fe6e6691a75112edbee8ba6e8ea8e396
2017-10-04 15:18:09 -07:00
ba766ef39a Fix BN size check in eval mode (#2977) 2017-10-04 16:03:20 -04:00
7a809ea6fd Fix build for MSVC 2017-10-04 16:01:30 -04:00
f783a65a5a Merge commit 'bace20a7d446c4e130d49ad47c3370ae00f82c05' 2017-10-04 16:00:15 -04:00
bace20a7d4 Fix build for MSVC 2017-10-04 15:56:57 -04:00
91bb6ce095 Allow explicitly specifying to use operators' default implementation
Reviewed By: dzhulgakov

Differential Revision: D5973635

fbshipit-source-id: 12dccc6332a8dd264ccc9f831a053a3be9b89c56
2017-10-04 12:17:36 -07:00
d2e94d0faa change device enums to be contiguous
Summary: quick change

Reviewed By: ajtulloch

Differential Revision: D5976025

fbshipit-source-id: a5a1538a380edb7c3b0af76e74c2ccee09ecb928
2017-10-04 11:17:57 -07:00
029252fb3b NNPACK bindings for Convolution (#2826)
* skeleton commit for building and linking nnpack library in PyTorch

* first stab at conv forward binding + integration

* bind NNPACK gradient kernels

* move nnpack forward, input gradient calls deeper

* nnpack conv api mimics nn

* fix symbol error; use memory across calls

* clean up warnings, add shape checking, thread safety, configurable thread specification

* add batch size threshold, also bind for single-element batch for the future
2017-10-04 13:48:14 -04:00
42712c677d More user-friendly error messages for indexing with multi-dimensional LongTensors (#2974) 2017-10-04 10:55:55 -04:00
f608208a80 Fix scatter size check (#2960)
* scatter size check

* add comment for size_check macro
2017-10-04 10:21:29 -04:00
b3bcba60c7 Correct padding docs of 3D modules (#2970)
3D modules apply padding on all three sides. "Both" doesn't make sense here.
I used the wording of the AvgPool3d docstring, where it was already correct.
2017-10-04 09:52:37 -04:00
756ab3f24f Adding conversion from python tensor to dlpack tensor (#2933) 2017-10-04 08:35:42 -04:00
5527dd3b08 Expose CMake options in the binary
Summary:
Useful for figuring out which version people built with. We can just ask for the --caffe2_version gflag or read core.build_options from Python (a usage note follows this entry).

Also adds CMAKE_INSTALL_RPATH_USE_LINK_PATH - without it, the build wasn't working on my Mac. How should it be tested?
Closes https://github.com/caffe2/caffe2/pull/1271

Reviewed By: bddppq

Differential Revision: D5940750

Pulled By: dzhulgakov

fbshipit-source-id: 45b4c94f67e79346a10a65b34f40fd258295dad1
2017-10-04 02:33:02 -07:00
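As a usage note for the Python side mentioned in the summary (assuming core.build_options is exposed as described):
```
from caffe2.python import core

print(core.build_options)  # CMake options recorded at build time
```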
acc384183a caffe2 operator logit / logit gradient CUDA implementation
Summary: This is the continuation of T20872698 Implement the gradient operator for element-wise Logit

Reviewed By: asaadaldien

Differential Revision: D5969487

fbshipit-source-id: c9bb4222529f9fd9085aa9048b90eb70a63f41f4
2017-10-03 18:48:25 -07:00
81284c7a0d Translating Crop to Slice
Summary:
Only works for len(offset) == 1 for now.
Also, Slice Op only supports slicing in one dimension;
can we extend it to support slicing in multiple dimensions?

Reviewed By: bwasti

Differential Revision: D5967476

fbshipit-source-id: 6cf9ff510e752ddb3bc9673d47f6a577ae9ccc79
2017-10-03 17:18:32 -07:00
17a92389b3 Remove metal remnants
Summary: Clean up the metal remnants in BUCK now that the metal code has been removed

Reviewed By: bwasti

Differential Revision: D5966095

fbshipit-source-id: 6b022624fe91a6728549d93d2954328c6b4e059e
2017-10-03 15:43:58 -07:00
5f864ca4d2 Support TensorList arguments, torch.cat, and narrow in derivatives.yaml (#2936)
* Generate torch.cat autograd via ATen.

Most of the change is around supporting generation of:
1) TensorList arguments
2) Arguments to "size", "sizes", i.e. "sizes(dim)"
2017-10-03 18:21:10 -04:00
c489445c46 Add ONNX support for Mean (#2956) 2017-10-03 18:16:45 -04:00
faa6fdfa18 Raise error when each channel only has 1 value in batch norm (#2961)
* add error when each channel only has 1 value
2017-10-03 17:56:15 -04:00
6fbdf40284 Translate addmm into Gemm operator / fix alpha-beta mixup / execute in JIT.
The alpha/beta naming in addmm was flipped; this commit fixes that
problem.  It also fixes the ONNX export of alpha/beta parameters.
Finally, it supports executing matmul in the JIT.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-03 17:23:43 -04:00
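For reference, the semantics whose naming was flipped: alpha scales the matrix product and beta scales the added matrix. A quick check, written with the modern keyword-argument form for clarity:
```
import torch

M = torch.randn(2, 3)
a = torch.randn(2, 4)
b = torch.randn(4, 3)

# addmm computes beta * M + alpha * (a @ b); the ONNX Gemm export must
# keep alpha with the matmul term and beta with the added matrix.
out = torch.addmm(M, a, b, beta=0.5, alpha=2.0)
assert torch.allclose(out, 0.5 * M + 2.0 * (a @ b))
```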
76a282d228 Fix resizing of gradInput in BatchNormalization (#2959)
* In C there was a race condition when gradInput was resized within the
   parallel for
 * CUDA was missing the resize for gradInput
2017-10-03 15:38:34 -04:00
9088a940d7 Completed Stride() documentation (#2948) 2017-10-03 15:36:10 -04:00
9b5f70780c Merge commit '9a471b015fc6db86f67833d29114b7307e3b1727' 2017-10-03 10:27:43 -07:00
1512562613 Fix lint 2017-10-03 09:20:55 -07:00
1ff34a0535 generates non-equal random tensor for max pool 2017-10-03 11:56:59 -04:00
fa8044d92f Add tests for array interface 2017-10-03 10:27:56 -04:00
c488a9e9bf Add Numpy array interface to tensors 2017-10-03 10:27:56 -04:00
b6b41c829a Add inplace checks in JIT 2017-10-03 10:20:58 -04:00
82bc97e6be Fix THC exponential to not sample infinity 2017-10-03 10:06:47 -04:00
437d3af7bf Add CUDNN_INCLUDE_DIR before CUDA directories in setup.py 2017-10-03 10:06:47 -04:00
bf82ecd776 Hotpatch THPP compile error 2017-10-03 10:06:47 -04:00
6fbbb1bc4e Limit number of demangler invocations in autograd profiler 2017-10-03 09:55:37 -04:00
7fc7756487 Refactor param initialization from model manipulation to layers logic
Summary: This diff refactors the parameter initialization logic from model manipulation to layers

Reviewed By: azzolini

Differential Revision: D5920225

fbshipit-source-id: 50d230e406bc9ce0b00bdd164802c504cf32ea46
2017-10-02 22:08:40 -07:00
9a471b015f Implement _unnarrow (backwards of narrow) in ATen.
Note this is currently prefixed with an underscore because it may go away
(can be implemented via index).
2017-10-02 21:25:59 -04:00
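A hypothetical sketch of what _unnarrow computes (signature assumed): embed the gradient back into a zero tensor of the pre-narrow size, which is also why it could be expressed via index.
```
import torch

def unnarrow(grad, dim, start, full_size):
    # Inverse of narrow: place `grad` at [start, start + grad.size(dim))
    # along `dim` inside a zero tensor whose `dim` has size `full_size`.
    sizes = list(grad.size())
    length = sizes[dim]
    sizes[dim] = full_size
    out = grad.new_zeros(sizes)
    out.narrow(dim, start, length).copy_(grad)
    return out

g = torch.ones(2, 3)
print(unnarrow(g, dim=1, start=2, full_size=6))  # zeros except columns 2..4
```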
d381efcf3c Enable wrap_dim in Local.cwrap.
This includes torch.cat, which is a TensorList argument, which wasn't supported before.
2017-10-02 21:25:59 -04:00
bf7b11f235 Fix executor test base module
Summary: Fix base module of executor test util

Reviewed By: dzhulgakov

Differential Revision: D5960543

fbshipit-source-id: 4bcaba583a2c8ee4f7544b8000ad60e8d9846936
2017-10-02 17:34:06 -07:00
d1213cc6c2 Include information of the engine for Caffe2 operators.
Summary: Include information of the engine for Caffe2 operators.

Reviewed By: salexspb

Differential Revision: D5876323

fbshipit-source-id: 3b1837ccff098109bdfb0865a4fa3f509496ffdb
2017-10-02 17:24:48 -07:00
49396c6fa1 add openglv2 to experimental
Summary: only changes needing review are in proto_utils.cc and caffe2.proto

Reviewed By: jerryzh168

Differential Revision: D5956743

fbshipit-source-id: e03fffaf5bc8413f2320c20a89a421f1a69b2870
2017-10-02 15:59:25 -07:00
312e0ce3ba fix nn.HingeEmbeddingLoss doc 2017-10-02 18:14:40 -04:00
2c26f4728a fix typo in document of nn.AdaptiveMaxPool1d 2017-10-02 17:54:42 -04:00
e4701e63f6 Fix exporting Reshape with single torch.Size argument 2017-10-02 23:29:49 +02:00
4d605259b9 Fixes after merging ATen:
* Mark all (non-static) Type methods as const.
2017-10-02 13:14:35 -07:00
e99aec9e9e Merge commit '9f4accd5bb99900dfda9ffab110aeb7a4534d629'
* commit '9f4accd5bb99900dfda9ffab110aeb7a4534d629':
  Make all dim arguments int64_t
  Converting dlpack tensor to aten tensor
  adding a simple class for converting atensor to dlTensor
  Test stub for dlconvertor
  adding dlpack header
  Fix build failure in MSVC
  Mark all (non-static) Type methods as const.
2017-10-02 13:01:18 -07:00
6258fc2f15 Executor benchmarks
Summary:
Executor benchmarks to measure QPS for different models (sparse nn hogwild and
dataparallel, resnet50 dataparallel)

Reviewed By: dzhulgakov

Differential Revision: D5950770

fbshipit-source-id: 9aa8e0480468a55a6a97b10589d785c682fae01e
2017-10-02 12:59:21 -07:00
1f3424b78f Adjust test thresholds
Summary: Adjust test thresholds and number of examples

Reviewed By: salexspb

Differential Revision: D5945588

fbshipit-source-id: 7aecb8c642d8775f51dd3c296a28f1faf7ae0c81
2017-10-02 12:59:20 -07:00
4c61cf2a1f Updated functions for benchmark test 2017-10-02 15:01:51 -04:00
00b62db723 Fix scope error
error: ‘getInitConfig’ was not declared in this scope
2017-10-02 14:43:46 -04:00
621603169c initialize new tensor 2017-10-02 09:53:21 -04:00
6ef417ce89 Fix typos 2017-10-02 09:32:25 -04:00
ca644ca204 Add inplace zero to variable (#2212) 2017-10-02 14:02:24 +02:00
3ce6f0a457 turn ModelProto.graph into callback type 2017-10-01 23:09:13 -04:00
9fc86782d7 Fix the breaking changes in ONNX PR #58 2017-10-01 23:09:13 -04:00
a64daf2c59 support dictionary return types in nn.Module's __call__ (#2037) 2017-10-01 20:33:03 -04:00
5d9de014bd Fix typos 2017-10-01 03:09:25 -04:00
21f8ad44e1 put limits on CuDNN BatchNorm codepath 2017-09-30 19:00:44 -04:00
d5a7e304fa added volumetric adaptive max pooling 2017-09-30 16:57:51 -04:00
7ff9e0eb6c fixed test_AdaptiveMaxPool*d_indices testing the non-adaptive classes 2017-09-30 16:57:51 -04:00
9415f84982 spatial CUDA kernel int64_t stride inputs, removed unused parameter 2017-09-30 16:57:51 -04:00
855b7e28ee START_IND & END_IND macros, removed unnecessary computation in updateGradInput 2017-09-30 16:57:51 -04:00
b9c942a7d4 reorder spatial variables BDHW 2017-09-30 16:57:51 -04:00
0685c063bf rename spatial version variables 2017-09-30 16:57:51 -04:00
67b2923a9d Set all GPU state, not just the first one.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-30 16:21:04 -04:00
a8bf73be50 Mention random_ not available on CUDA.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-30 16:21:04 -04:00
2dcaa40425 Add get_rng_state_all and set_rng_state_all.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-30 16:21:04 -04:00
db298618e4 Minor typofix.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-09-30 16:18:03 -04:00
60bff0a5f3 fix nccl version 2017-09-30 16:17:20 -04:00
5cc3aff9ba use nccl deb in Dockerfile, easier to change python version 2017-09-30 16:17:20 -04:00
9b9704e701 Simplify getApplyGrid in THC (#2900) 2017-09-30 16:16:36 -04:00
b3bc5fe302 refactor THCP method defs into cuda/Module.cpp 2017-09-30 13:14:35 -07:00
7190979ab3 fix the script to generate the nanopb files (#2907) 2017-09-30 10:21:06 -04:00
d315c62e72 Kick fbsync
Summary:
fbshipit-source-id: 886ac051235a878b5b0fe294619bb6184d5d24ab

(Note: this ignores all push blocking failures!)

Reviewed By: dzhulgakov

Differential Revision: D5947236

fbshipit-source-id: c3f7d00d5d7faad6366d4c456fffb9387f30b2aa
2017-09-29 16:31:11 -07:00
4acf56cf80 Typo
Summary: Typo in the docstring

Reviewed By: azzolini

Differential Revision: D5943729

fbshipit-source-id: f4c7adfb8d8855ba66ee988868650acbf0f6ccdb
2017-09-29 16:31:11 -07:00
c775b90426 Fix aten submodule
Effectively D5935765
2017-09-29 16:31:11 -07:00
181b2481d3 add error checking to grid sampler (#2902) 2017-09-29 15:18:31 -04:00
d7ee3e0bd0 Fix the memory leak for multiple workers (#2897) 2017-09-29 11:58:28 -04:00
e67c2bc567 Fix detection of NCCL_INCLUDE_DIR (#2877)
* Fix detection of nccl.h when libnccl.so is in /usr/lib/x86_64-linux-gnu and similar paths

* full support for independent NCCL_LIB_DIR and NCCL_INCLUDE_DIR

* lint fix

* add back CUDA_HOME
2017-09-29 10:42:10 -04:00
8c0844f497 Executor test
Summary:
Executor test that checks, across different models, that model params are the same
when using a given executor as when using the simple net

Reviewed By: akyrola

Differential Revision: D5908769

fbshipit-source-id: b6f5a2cf89c5c67b68e8b9be3264f38d5740d897
2017-09-29 02:07:14 -07:00
6a800be748 import lr_scheduler in __init__.py
Fix https://github.com/pytorch/pytorch/issues/2809
2017-09-28 23:38:23 -04:00
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
21707065d2 latest gloo 2017-09-28 15:34:42 -07:00
96b17543a3 Compile with MKL in conda-build
Summary:
Problem:
Without -DBLAS=MKL, conda-build won't include the MKL library in the Caffe2 build, and BLAS performance on CPU is poor.

Solution:
Explicitly add the flag. Add mkl and mkl-include as dependencies.

ezyang Yangqing
Closes https://github.com/caffe2/caffe2/pull/1264

Reviewed By: bddppq

Differential Revision: D5919192

Pulled By: houseroad

fbshipit-source-id: bb51e4fc4015212694404180a610e06ec8ddb424
2017-09-28 15:19:06 -07:00
a92fce1871 fix precision of grid_sample test 2017-09-28 15:11:50 -07:00
b9747af242 Use make_variable instead of VariableImpl.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 17:37:22 -04:00
7d40cce267 Simplify glu symbolic 2017-09-28 16:42:52 -04:00
c72ee3981b Add support for exporting GLU to ONNX 2017-09-28 16:42:52 -04:00
002288c118 Add launch bounds to spatial grid sampler
Needed for CUDA9
2017-09-28 16:33:34 -04:00
b9009df222 Add mask device, fix test
Reviewed By: azzolini

Differential Revision: D5930258

fbshipit-source-id: 16fdc2aeba7d95e815e55ca495118a5129495bb0
2017-09-28 12:33:01 -07:00
642dea487d update inline comment
Summary: as desc

Reviewed By: kennyhorror

Differential Revision: D5930526

fbshipit-source-id: 510388fd66b487410ff748a9e6f546a8ce27bc1d
2017-09-28 10:17:13 -07:00
954e9e370c Uncurry trace.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
bff81a3cbd s/extra/unmatched/
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
91827edd1c Fix initializers off-by-one.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
cdcf09405e Use parent setUp, which also seeds CUDA if necessary.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
600fcf2f04 Delete params.
We have decided we are not going to support it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
fecca48a2c Time how long compilation takes.
Also, still give time even if we throw an error midway.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
0ad6c2d59c Lintfix.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
cfa176b9bd Dump the final trace (redundantly), for ease of use.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
db3349faa3 Support class decorator syntax; remove instance compilation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
1cf24b8d55 Restore enabled/time debug parameters.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
c430501ee5 Timing works again, controlled by PYTORCH_JIT_TIME.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
b1ba6c3ddd Add back trace dumping, fix some syntax errors.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
7bace0a1d9 apaszke review comments
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
0c40305ddd Rewrite torch.jit interface.
torch.jit now contains two user-facing functions: compile and trace
(corresponding to what was previously trace/traced and record_trace).
The non-curried versions of these functions have been eliminated, so
that there is only one function in the API (we *must* have the
curried versions, since these enable their use as decorators).  There is
detailed usage documentation in the docblocks for these methods.

This comes with a complete rewrite of the internals of torch.jit, in the process
fixing a number of bugs.  Key points of the new implementation:

- compile and trace both always return a Module representing the wrapped
  with compilation/tracing underlying function/module.  This makes handling
  of the function/module cases more uniform, as we can think of the function
  case as creating an on-the-fly module with the parameters explicitly
  specified by the user.  For technical reasons, we now *require* any parameters
  in the function case to be honest-to-goodness Parameters (gory details:
  you can't register a Variable as a Parameter to a Module, but you can't
  create a Parameter from a Variable while sharing the same underlying
  identity.)

- Flattening and unflattening is done a lot more uniformly.  We now have
  a _flatten and _unflatten function which are inverses of each other:
  _flatten always returns both the flat, tuple of Variables, *as well as*
  the "proto" (now referred in the code as the "struct") from which we
  can unflatten the variables.  Low level functions like 'raw_trace'
  always work with the flattened inputs/outputs, which keeps their logic
  simple.

- JIT trace keying now also includes the "struct" of the input arguments.
  This is a step towards accepting non-Variable arguments in functions,
  although flatten/unflatten don't currently support it.

- TraceForKey (previously TraceInfo) has had its API reworked to have
  less degrees of freedom when you are interacting with it.

TODO: Verify, timing, and trace dumping have been temporarily excised.  I
plan on adding them back.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-28 12:34:35 -04:00
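A sketch of the decorator usage described above, as the API looked in this commit (torch.jit.compile no longer exists in this form; debug keywords such as enabled are assumed from the surrounding commits):
```
import torch
import torch.jit

@torch.jit.compile
def f(x, y):
    return x * y + x

a, b = torch.randn(3), torch.randn(3)
out = f(a, b)  # first call with this input structure records a trace
out = f(a, b)  # later calls with the same struct reuse the compiled trace
```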
f2037970cb Cleanup for 'prob_dist' in multinomial function (fixes #1584) 2017-09-28 12:07:06 -04:00
2f381bf6a4 Joint intent-slots modeling workflow initial diff
Summary:
This is a prototype for joint intents + slots modeling workflow, it has the following:

1- New data readers and data processors to process joint labels in parallel
2 - New JointNN model
3- New Fblearner workflow (jointnn) for joint modeling experimentations

This is still work in progress, sending the diff to start the discussion about the interface and what we need to support in our joint modeling efforts.

P.S. The number of lines in this diff is multiplied by 3 since caffe2 is mirrored in both fbandroid and fbobjc.  I will highlight the most important parts so that people are not confused.

Differential Revision: D5725243

fbshipit-source-id: ecc5322f937ad0fddaf200a9e090b3573a69f994
2017-09-28 03:47:34 -07:00
b21ae92b56 Move expand_dims operators to a expand_dims_op.h/cc
Summary: As desc.

Reviewed By: dzhulgakov

Differential Revision: D5928167

fbshipit-source-id: 8fb9b21e77766e5da3037b6a82c66eb4c1f5810c
2017-09-28 01:21:32 -07:00
c27aaf67cd Improve Function docs 2017-09-27 22:41:45 -04:00
c33c9c1ba4 Fixed size_to_dim enforce
Summary: Fixed Caffe2Enforce in size_to_dim() so that it works even if k is the same as the number of dimensions in the tensor.

Reviewed By: salexspb

Differential Revision: D5893264

fbshipit-source-id: 525ea263f5e21e197c7010e1c66501355b8027c8
2017-09-27 19:05:56 -07:00
095805036c re-enable out-of-place bernoulli for cuda tensors 2017-09-27 21:32:36 -04:00
9f4accd5bb Make all dim arguments int64_t 2017-09-27 15:10:34 -07:00
e9fe0d8e6c Fix for clang 9 build issues 2017-09-27 17:53:06 -04:00
0fb9db1606 Converting dlpack tensor to aten tensor 2017-09-27 09:58:52 -07:00
b4e02e8e0f adding a simple class for converting atensor to dlTensor 2017-09-27 09:58:52 -07:00
4a58e0ca42 Test stub for dlconvertor 2017-09-27 09:58:52 -07:00
c6a2175d27 adding dlpack header 2017-09-27 09:58:52 -07:00
c8f824cd1b Improve import failure messages 2017-09-27 10:37:54 -04:00
2108d1c250 Add unit-tests for fb-specific models
Reviewed By: azzolini

Differential Revision: D5895367

fbshipit-source-id: e7a7cdb272cdcdd7495efe9a6203750d1e6d6c48
2017-09-26 21:17:51 -07:00
1a8fb81f22 define M_PI for TH 2017-09-27 00:06:01 -04:00
dcee596a8b change Variable.cuda to be consistent with Tensor.cuda 2017-09-26 23:48:40 -04:00
22ec2ca968 Add shape inference to fp16<->fp32 ops
Summary:
Added to HalfToFloat and FloatToHalf
Closes https://github.com/caffe2/caffe2/pull/1241

Differential Revision: D5902071

Pulled By: salexspb

fbshipit-source-id: 9c79b0c50990200ca5bd6e00b3e8881d1c784e36
2017-09-26 19:33:08 -07:00
fb1c7874ea Deconv translation
Summary: att

Reviewed By: bddppq

Differential Revision: D5865061

fbshipit-source-id: ba27e954771ed40b0284021dee1a766fc8678829
2017-09-26 16:48:10 -07:00
cb986bb913 Deformable convolution operator in Caffe2
Summary:
This diff implements deformable convolution operator. The idea behind it is that instead of using a fixed NxM kernel, we associate a set of learnable offsets (dx, dy) with each element of the kernel, and use bilinear interpolation to estimate weights in between the integer indices. For background see paper https://arxiv.org/abs/1703.06211 and mxnet implementation https://github.com/msracver/Deformable-ConvNets/tree/master/rfcn/operator_cxx

To simplify code review of the new files, the feature is stacked into 2 diffs. The first diff duplicates the core convolution operator into a separate set of files prefixed with deform_. It also provides documentation on the operator but nothing else. The second diff contains the actual changes that make deformable convolution possible. Therefore, I recommend focusing your code review on the changes between diffs 1 and 2.

Current limitations of the operator:
1. Only CUDA is supported. CPU version is not implemented.
2. Only NCHW layout is supported.
3. Only 2d convolution is supported.

CUDA code is ported from mxnet implementation with minimal changes.

See also inline comments in code for tricky parts.

Reviewed By: akyrola

Differential Revision: D5702983

fbshipit-source-id: 4d1bf2c6c73135e6a70dbe87037b38915f4453f9
2017-09-26 16:20:31 -07:00
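The bilinear-interpolation step mentioned above, as a minimal single-channel NumPy sketch (no padding handling); the operator applies this at every kernel tap, shifted by the tap's learned offset:
```
import numpy as np

def bilinear_sample(im, y, x):
    # Read im at fractional (y, x) by blending the 4 surrounding pixels.
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, im.shape[0] - 1), min(x0 + 1, im.shape[1] - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * im[y0, x0] + (1 - wy) * wx * im[y0, x1] +
            wy * (1 - wx) * im[y1, x0] + wy * wx * im[y1, x1])

im = np.arange(16, dtype=np.float32).reshape(4, 4)
print(bilinear_sample(im, 1.5, 2.25))  # blends rows 1-2, columns 2-3 -> 8.25
```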
08b3140827 Back out D5772847 and D5908415
Summary:
D5772847 is breaking real time style transfer on android and conv unit tests on iPhone 7 upgraded to iOS 11.

The temporary fix in D5908415 only fixes android. iPhone 7 is still crashing.

I think these two diffs should be backed out before D5772847 is fully debugged

Reviewed By: fricc33

Differential Revision: D5913834

fbshipit-source-id: b8072c59c83adfed8a0b0ab0f42c39bc4398c7a0
2017-09-26 15:47:49 -07:00
8a45b65f96 ReduceFrontMax, ReduceBackMax + gradients, CPU and CUDA
Summary: Implementation of ReduceFront/Back/Max/Gradient for CPU and CUDA.

Reviewed By: asaadaldien

Differential Revision: D5905402

fbshipit-source-id: 6967ce41aa95ee5ea7a90065430892e81a6da477
2017-09-26 15:22:25 -07:00
711d7137c7 Implement the gradient operator for element-wise Logit
Summary: Implemented logit gradient with eps as arg.  Add the unit test for it and explored the optimal parameter to run the test.

Reviewed By: asaadaldien

Differential Revision: D5910655

fbshipit-source-id: 44898b784a57c7ad45519b202b1eaf95c1c4d460
2017-09-26 14:49:22 -07:00
59be3da3bc Make GLContext unique_ptr
Reviewed By: fricc33

Differential Revision: D5908793

fbshipit-source-id: 281f9ae9baac737fb8fafd79948d0804724087bc
2017-09-26 14:33:10 -07:00
44b45a1d73 Fix real time style transfer on android
Reviewed By: fricc33

Differential Revision: D5908415

fbshipit-source-id: 27af70baf7a953566cc64dab040f669784c4224b
2017-09-26 14:33:08 -07:00
de757805fc Implement some autograd functions using ATen (#2805)
This adds some generated autograd functions implemented in C++, which
are generated from derivatives.yaml. It also generates Python bindings
for the Variable methods. The generated files are:

 Functions.cpp/h: subclasses of torch::autograd::Function
 VariableType.cpp/h: The at::Type for autograd Variables
 python_variable_methods.cpp: Python bindings to torch::autograd::Variable
 python_variable_methods_dispatch.h: wrapper which releases GIL and sets the
     CUDA device
 python_functions.cpp/h: exposes generated autograd functions as Python
     objects

The generated functions are mostly shadowed by the definitions in
variable.py. We'll remove the Python implementations in favor of the
generated C++ implementations in a subsequent commit.
2017-09-26 17:08:00 -04:00
0a5ee1e806 Implemented RowWiseSparseAdagrad operator that only keeps one moment term per embedding
Summary: Implemented version of SparseAdagrad that only keeps track of an average sum of squared gradients term for each row of the parameter tensor, rather than a sum of squared gradients term for each individual parameter.

Differential Revision: D5881918

fbshipit-source-id: bd96ccf25554b457baaaca9309fc8048adbb37f7
2017-09-26 13:34:44 -07:00
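A minimal NumPy sketch of the row-wise update described above (names and exact update rule assumed): a single running squared-gradient accumulator per embedding row rather than one per parameter.
```
import numpy as np

def rowwise_sparse_adagrad(param, moment, indices, grad, lr, eps=1e-6):
    # moment has shape (num_rows,): one scalar accumulator per row.
    for i, row in enumerate(indices):
        moment[row] += np.mean(grad[i] ** 2)
        param[row] -= lr * grad[i] / (np.sqrt(moment[row]) + eps)

param, moment = np.zeros((5, 4)), np.zeros(5)
rowwise_sparse_adagrad(param, moment, indices=[1, 3],
                       grad=np.ones((2, 4)), lr=0.1)
```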
9be8d0a9d2 Add a docstring for functional.linear.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-26 12:29:07 -04:00
753133f015 SignOp
Summary: Equivalent to numpy.sign for CPU and CUDA.

Reviewed By: dzhulgakov

Differential Revision: D5906446

fbshipit-source-id: 389f994bccbb87a62df2c4aaacc327f9a6223cbd
2017-09-26 09:17:45 -07:00
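Since the summary defines the op by analogy, the reference semantics in NumPy terms:
```
import numpy as np

print(np.sign(np.array([-2.5, 0.0, 3.0])))  # [-1.  0.  1.]
```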
f14d75c7ef Proper versioning and misc CMake improvements
Summary:
This brings proper versioning in Caffe2: instead of manual version macros, this puts the version information in CMake (replacing the TODO bwasti line) and uses macros.h.in to then generate the version in the C++ header.

A few misc updates:
- Removed the macOS rpath; verified on a local MacBook that it is no longer needed.
- Misc updates for caffe2 ready:
  - Mapped cmake/Cuda.cmake with gloo's setting.
  - upstreamed third_party/nccl so it builds with cuda 9.
- Separated the Caffe2 cpu dependencies and cuda dependencies
  - now libCaffe2_CPU.so do not depend on any cuda libs.
  - caffe2 python extensions now depend on cpu and gpu separately too.
- Reduced the number of unused functions in Utils.cmake
Closes https://github.com/caffe2/caffe2/pull/1256

Reviewed By: dzhulgakov

Differential Revision: D5899210

Pulled By: Yangqing

fbshipit-source-id: 36366e47366c3258374d646cf410b5f49f95767b
2017-09-26 08:52:21 -07:00
2d6a880952 Fix jit attributes tests 2017-09-26 10:51:58 -04:00
d9b0bcd7a4 Make all existing (except in RoIPool) "is_test" arguments required
Reviewed By: akyrola

Differential Revision: D5830168

fbshipit-source-id: 8634e9cfe308ba0ee90cd8a5c4b09a47b0b5f015
2017-09-25 23:46:12 -07:00
808c9e3e70 fix a small typo error in sparse_lookup
Summary: as title

Reviewed By: kittipatv

Differential Revision: D5908455

fbshipit-source-id: e7c66e84a27273156d66dfd043e9cfd9b0ab9a98
2017-09-25 21:46:56 -07:00
def0506d95 Fix a caffe2-gloo dependency problem
Summary:
The problem:
Building caffe2 fails because the installed directory contains "anaconda".

The cause:
Compiling Gloo will generate a new config.h file in the binary folder.
If we put the original config.h in front, the compiler will complain "Expected GLOO_USE_CUDA to be defined".

~~~Switch the positions of the include folders can solve the problem.~~~

Function caffe2_include_directories in cmake/Utils.cmake is a little bit hacky. If the directory contains "anaconda", it will append the new include directory after the existing include path. Otherwise it will insert the directory before the path. So in the first case, the directories are inserted in order, and in the latter one, they are inserted in reverse order.

The solution:
See the commit.

pietern #1121
Closes https://github.com/caffe2/caffe2/pull/1258

Reviewed By: Yangqing

Differential Revision: D5907167

Pulled By: houseroad

fbshipit-source-id: 2cb3916e7e0313ebc3be3d1666bfa14bbf479607
2017-09-25 21:37:12 -07:00
ded3a3b317 fix small bug in nccl setup helper 2017-09-25 21:21:36 -07:00
7caceea6e8 better error messages for Conv*d input shape checking 2017-09-25 23:53:59 -04:00
833bedc77d Add CUDA profiler bindings 2017-09-25 23:21:30 -04:00
b7849662b5 Always regenerate nn wrappers after rebuilding THNN and THCUNN 2017-09-25 23:21:30 -04:00
411e1469e0 Add tools for autograd profiling 2017-09-25 23:21:30 -04:00
bd5233b4f9 Fix on NEON
Reviewed By: bwasti

Differential Revision: D5907389

fbshipit-source-id: 51bce58f5a65e74f5f5b1d3ff0317f781ee8e57d
2017-09-25 16:21:27 -07:00
f4eca7c94d make CUDA_HOME take precedence over all other CUDA detection methods (#2863) 2017-09-25 18:17:40 -04:00
4e23658d47 Fix warnings in TH_ErfInv (#2861) 2017-09-25 18:05:32 -04:00
9defb8e653 fix Dockerfile for submodules 2017-09-25 18:04:34 -04:00
6a4ec4f9a8 VolumetricAdaptiveAveragePool 2017-09-25 15:12:44 -04:00
7254104cfc Spatial CUDA kernel: removed unused sizeD parameter; changed stride types to int64_t to be consistent with caller function 2017-09-25 15:12:44 -04:00
dd891c4923 reorder spatial version variables so that B (batch) before D (feature) before H (height) before W (width); change some code to be more concise 2017-09-25 15:12:44 -04:00
8ffe8eca6c rename spatial version 2017-09-25 15:12:44 -04:00
3128218397 Allow specifying unused inputs to torch.autograd.grad (#2859) 2017-09-25 14:42:33 -04:00
605beb2565 Parallelize CUDA LookupTable_renorm (#2803) 2017-09-25 14:04:07 -04:00
d6ff84de5c Add an aten_op to contrib.
Summary:
This operator allows the use of Torch's underlying TH libraries (TH, THC, THNN, and THCUNN)
through the ATen tensor library. Use of the operator is described in the README.
The operator itself is generated from ATen's Declarations.yaml file which describes its public API.
Closes https://github.com/caffe2/caffe2/pull/1235

Reviewed By: dzhulgakov

Differential Revision: D5876944

Pulled By: zdevito

fbshipit-source-id: b558e8563a5e82a0e6278705a4a359bd7df4e70a
2017-09-25 10:53:51 -07:00
c08395e290 Give a better error message when we hit a legacy function.
We now include the type name of the legacy function implementing
class.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-25 12:26:07 -04:00
2a8603c5e1 Make distributed recv return sender rank 2017-09-25 12:11:52 -04:00
5be06230f9 cleanup external NCCL detection, add NCCL_ROOT_DIR / NCCL_LIB_DIR mechanism 2017-09-25 11:28:59 -04:00
289dc2a870 fix argument passing bug in build_libs and allow external NCCL_ROOT_DIR via environment variable 2017-09-25 11:28:59 -04:00
30ceac28e4 also check LD_LIBRARY_PATH for cudnn 2017-09-25 11:28:59 -04:00
15a7bb3bff GatherByKeyOp (Inverse operation of PartitionOp)
Summary: Can be used to gather outputs of a sharded "Gather", or for the SparseLengthsSumGradient when we need the gradient on values.

Reviewed By: akyrola

Differential Revision: D5800901

fbshipit-source-id: 90835755d6d15be13fb0f538cfade980cf4a1cd2
2017-09-24 22:18:17 -07:00
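A NumPy sketch of the inverse relationship named above (shard layout assumed): Partition scatters rows into per-shard buckets by key, and GatherByKey reassembles them in the original order.
```
import numpy as np

keys = np.array([2, 0, 1, 0, 2])
values = np.arange(10.0).reshape(5, 2)
num_shards = 3

# Partition: one bucket per shard, rows kept in original relative order.
masks = [keys == s for s in range(num_shards)]
parts = [values[m] for m in masks]

# GatherByKey: put every row back where it came from.
out = np.empty_like(values)
for m, p in zip(masks, parts):
    out[m] = p
assert (out == values).all()
```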
e3609a0619 Correctly propagate remap_blob across net boundaries
Summary: If a blob is copied from device A to device B in the init_net, and is then used as an external_input in the train_net, we want the train_net to correctly use the blob already on device B instead of copying it over and over again.

Reviewed By: akyrola

Differential Revision: D5800870

fbshipit-source-id: d93f44bba80e4ed70eb03183d552496b54a966b5
2017-09-24 21:21:57 -07:00
4664808938 fix UMR UB in qtensor
Summary:
Exposed by UBSAN:
```lang=bash
caffe2/caffe2/core/qtensor.h:61:40: runtime error: load of value 190, which is not a valid value for type 'bool'
    #0 0x7fb4fc09c289 in caffe2::QTensor<caffe2::CPUContext>::Resize(std::vector<int, std::allocator<int> >) caffe2/caffe2/core/qtensor.h:61
    #1 0x7fb4fc090403 in caffe2::QuantizedFullyConnectedOp<float, caffe2::CPUContext, caffe2::DefaultEngine>::RunOnDevice() caffe2/caffe2/fb/operators/quantized_fully_connected_op.h:93
    #2 0x7fb4fc08d5ee in caffe2::Operator<caffe2::CPUContext>::Run(int) caffe2/caffe2/core/operator.h:306
    #3 0x426d8a in caffe2::QFCTest(float, float, float, int, int, int, int) caffe2/caffe2/fb/operators/quantized_fully_connected_op_test.cc:78
    #4 0x4295f6 in caffe2::QuantizedFullyConnectedTest_Test_Test::TestBody() caffe2/caffe2/fb/operators/quantized_fully_connected_op_test.cc:110
    #5 0x7fb4eee3b6a1 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2458
    #6 0x7fb4eee2cbe1 in testing::Test::Run() /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2475
    #7 0x7fb4eee2cd27 in testing::TestInfo::Run() /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2656
    #8 0x7fb4eee2ce34 in testing::TestCase::Run() /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2774
    #9 0x7fb4eee2eb8b in testing::internal::UnitTestImpl::RunAllTests() /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:4649
    #10 0x7fb4eee2ef3c in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2458
    #11 0x7fb4eee2ef3c in testing::UnitTest::Run() /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:4257
    #12 0x7fb4fbee2ed0 in RUN_ALL_TESTS() third-party-buck/gcc-5-glibc-2.23/build/googletest/include/gtest/gtest.h:2233
    #13 0x7fb4fbee2d60 in main common/gtest/LightMain.cpp:12
    #14 0x7fb4e0ef7857 in __libc_start_main /home/engshare/third-party2/glibc/2.23/src/glibc-2.23/csu/../csu/libc-start.c:289
    #15 0x424e08 in _start /home/engshare/third-party2/glibc/2.23/src/glibc-2.23/csu/../sysdeps/x86_64/start.S:118
UndefinedBehaviorSanitizer: invalid-bool-load caffe2/caffe2/core/qtensor.h:61:40
```

Reviewed By: yfeldblum

Differential Revision: D5898877

fbshipit-source-id: e32b1732a1946fdafaec67b3fbc072dc93bcd917
2017-09-24 17:21:18 -07:00
c580352aee Adding 1d upsampling (#2846) 2017-09-24 16:50:24 -04:00
ab62a92dab De-dup beam search state reshape shape blob
Summary:
T22119644 showed that there is a potential illegal memory access in beam search with attention. Upon further inspection, we can see that there are multiple ops that write to the same old shape blob:

  {"output0": "model0/attention_decoder/attention_weighted_encoder_context_reshaped", "output1": "state_old_shape_before_choosing_per_hypo", "input0": "model0/attention_decoder/attention_weighted_encoder_context" }},
  {"output0": "model0/attention_decoder/hidden_t_external_reshaped", "output1": "state_old_shape_before_choosing_per_hypo", "input0": "model0/attention_decoder/hidden_t_external" }},
  {"output0": "model0/decoder/layer0/cell_t_reshaped", "output1": "state_old_shape_before_choosing_per_hypo", "input0": "model0/decoder/layer0/cell_t" }},

This diff de-dupes these outputs

Reviewed By: akyrola

Differential Revision: D5899103

fbshipit-source-id: 8b6f3f113e764dfeb9262f6c442e1124559cd2d8
2017-09-23 23:19:44 -07:00
a5879ea9bd Resolve Windows warning C4099 issue (class/struct name mixture)
Summary:
TSIA - no functionality change introduced. See current build (e.g. https://ci.appveyor.com/project/Yangqing/caffe2/build/2148/job/mj9auhnernrgdfpe) for the warning messages produced right now.
Closes https://github.com/caffe2/caffe2/pull/1255

Differential Revision: D5899177

Pulled By: Yangqing

fbshipit-source-id: 3f41c82b0d5a1caba63d8cc7101582af63fbc99f
2017-09-23 23:04:56 -07:00
5898bd4b4d Update eigen to origin master
Summary: Closes https://github.com/caffe2/caffe2/pull/1254

Differential Revision: D5899187

Pulled By: Yangqing

fbshipit-source-id: d4b62686ca26a1dc4aab5235a36f98cf13e50cd2
2017-09-23 22:33:30 -07:00
9fe99241b2 Update gloo to master
Summary:
Gloo was incorrectly updated in #1188 to a non-master version, so this brings gloo back to master.
Closes https://github.com/caffe2/caffe2/pull/1253

Differential Revision: D5899017

Pulled By: Yangqing

fbshipit-source-id: bdf6dbbc4402814e5bcf346cb8a610a448c53cef
2017-09-23 19:45:17 -07:00
b054f369a5 minor spelling tweaks
Summary: Closes https://github.com/caffe2/caffe2/pull/1252

Differential Revision: D5898823

Pulled By: Yangqing

fbshipit-source-id: e31f636cd4de2e6fef9375bd4ba6fd1b86d98af5
2017-09-23 17:46:57 -07:00
7c45ac8e43 Officially support Python 3 in Conda build.
Summary: Closes https://github.com/caffe2/caffe2/pull/1188

Reviewed By: Yangqing

Differential Revision: D5898795

Pulled By: ezyang

fbshipit-source-id: 9d17c3239d8c76f6e0858a877242b6d2e11a4f18
2017-09-23 16:16:49 -07:00
cf769a7b6f Avoid race condition in get device properties.
Summary: TSIA

Reviewed By: salexspb

Differential Revision: D5898125

fbshipit-source-id: 1822ef2a017719442045fa446321d007b9d544b8
2017-09-23 16:01:23 -07:00
8103e185d4 Fix OSX build w/CUDA=ON
Summary:
connect to https://github.com/caffe2/caffe2/issues/1249

With this change, build, install, and smoke test pass on OSX with CUDA=ON.
```
$ cmake -DUSE_CUDA=ON ..
$ sudo make install
$ python -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
Success
```
Closes https://github.com/caffe2/caffe2/pull/1251

Differential Revision: D5898758

Pulled By: Yangqing

fbshipit-source-id: 4b2362af800dbcf2d5c441ab97f68a1c23f19f24
2017-09-23 15:48:12 -07:00
7d06898592 Add travis webhook
Summary: Closes https://github.com/caffe2/caffe2/pull/1250

Differential Revision: D5898162

Pulled By: Yangqing

fbshipit-source-id: 06093e1c110c8645b876a13940552a39d3af1c43
2017-09-23 15:30:57 -07:00
eff5b8b09c parameters to vector and vector to parameters (#2795) 2017-09-23 13:06:40 -04:00
287f434900 Add support for exporting Addmm with alpha != 1 or beta != 1 2017-09-23 11:17:27 -04:00
767f704b84 Let Gloo check if it supports GPU Direct at run-time 2017-09-23 11:07:53 -04:00
3cd0003bf6 fix layers_test: atol should almost always accompany rtol
Summary: TSIA

Reviewed By: chocjy

Differential Revision: D5898129

fbshipit-source-id: f49e8478f79d9df5b59a26287fff7fc5417aac6e
2017-09-22 23:31:01 -07:00
ec801d535c Fix typo in warning in data_parallel_model
Summary: Closes https://github.com/caffe2/caffe2/pull/1219

Differential Revision: D5898077

Pulled By: Yangqing

fbshipit-source-id: 7ee726ef3399a350a36e77093cbad0f70f8f3dce
2017-09-22 23:03:28 -07:00
b984eb35cd Fix concat_split_op for input.size() > sizeof(int32)
Summary: We were keeping the offset in an int :(

Reviewed By: kennyhorror

Differential Revision: D5811955

fbshipit-source-id: 7d00833fa0d5847beed44b73ea74fcb5a8e24090
2017-09-22 15:48:35 -07:00
bf9ab91779 Indicate if the last invocation of setup.py was debug or not.
How to use:

    import torch.version
    print(torch.version.debug)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-22 18:33:47 -04:00
c630c34d41 remove Undefined node from black list, remove const_cast, add some comments 2017-09-22 18:22:32 -04:00
f256f686b5 Remove device comparison TODO mark, change the white list to black list on node kind checking 2017-09-22 17:06:27 -04:00
0566a4c026 Fix some bugs, and assume graph is always visited in topological order. 2017-09-22 17:06:27 -04:00
18a1d272bf Add attributes comparison, fixed several issues, more interesting test case. 2017-09-22 17:06:27 -04:00
972d048cf8 Typofix [ci skip] 2017-09-22 17:06:27 -04:00
0a1ac8bfe5 create a cse pass, with very naive support. 2017-09-22 17:06:27 -04:00
999607460a Add a verbose option for gradcheck. (#2780)
When verbose is True, a more detailed message on why gradcheck failed
will be printed to stderr.
2017-09-22 15:59:14 -04:00
0d6baa0d59 Fix lack of data dependencies for beam search RecurrentNetwork op
Summary: Previously, the RecurrentNetwork op used for our beam search did not have any of the input blobs listed as data dependencies. This was fine when we were using SimpleNet, since the ops were run in the order in which we added them to the graph, and thus the RecurrentNetwork op was run after all the other ops. However, when switching to DAG, the ops that produce input data for the beam search were being run in parallel with the RecurrentNetwork beam search op, which caused non-deterministic failures based on thread scheduling. This fixes that

Reviewed By: jmp84, jhcross

Differential Revision: D5879622

fbshipit-source-id: b622de1f6a24b2636b191096db92990e0535890c
2017-09-22 12:18:20 -07:00
dee3ac3fce Use Resize instead of reshape in speed_benchmark
Summary:
When using reshape, the speed_benchmark always reports an error.
When using resize, the speed_benchmark can run without any issue.

Reviewed By: salexspb

Differential Revision: D5847999

fbshipit-source-id: 1b9899534d514c779d1710008e239124fe3d2377
2017-09-22 11:17:23 -07:00
5aac6a2e06 Make LastNWindowCollector thread-safe
Summary: Make LastNWindowCollector optionally thread-safe. The main benefit is that the mutex can then be used to lock the buffer later, avoiding the need to copy the data.

Reviewed By: chocjy

Differential Revision: D5858335

fbshipit-source-id: 209b4374544661936af597f741726510355f7d8e
2017-09-22 09:48:30 -07:00
2070467c57 Allow CheckpointManager init() and load() to use a different db type with path_prefix
Summary: CheckpointManager already accepts a path_prefix override for init() and load(), but it assumes the same db_type passed in __init__(). This change adds an optional path_type for each call.

Reviewed By: boryiingsu

Differential Revision: D5888152

fbshipit-source-id: 21cd31a62a0188fe0e0b19b43c3b232c2342d0a8
2017-09-22 09:48:29 -07:00
e4d6ee114f typo fix 2017-09-22 12:37:59 -04:00
450379256c Don't call is_available() in manual_seed, it initializes CUDA.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-22 12:37:06 -04:00
b17dfa07ba Make CUDA seeding/RNG state functions even lazier
Instead of initializing CUDA immediately and executing them,
we wait until we actually initialize CUDA before executing.

To keep things debuggable, we also keep track of the original
backtrace when these functions are called, so we can inform
users where they actually called the seeding/state functions
(as opposed to the first time they actually initialized the
RNG).

Fixes #2517

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-22 12:37:06 -04:00
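A sketch of the deferral mechanism described above (names assumed, not the actual torch internals): queue the call together with the caller's backtrace, and replay both once CUDA really initializes.
```
import traceback

_queued_calls = []  # (callable, formatted stack captured at call time)

def _lazy_call(fn):
    # Don't initialize CUDA now; remember what to run and where the
    # user called from, so a later failure points at the right line.
    _queued_calls.append((fn, traceback.format_stack()))

def _lazy_init():
    # ... real CUDA initialization would happen here ...
    for fn, stack in _queued_calls:
        try:
            fn()
        except Exception as err:
            raise RuntimeError("queued call failed; originally invoked at:\n"
                               + "".join(stack)) from err
```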
06d7a0b1bc Write docs for RNG seeding on GPU more carefully.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-22 12:37:06 -04:00
805ad16924 Support "expanding" an empty tensor to an empty tensor. (#2824)
This doesn't currently support expanding the sizes to (0,), but
we can handle that eventually at the ATen level.
2017-09-22 11:58:03 -04:00
34a1d414a5 [Distributed/Gloo] 3X performance improvement of Gloo AllReduce By Enabling CUDA Direct (#2827) 2017-09-22 09:32:56 -04:00
9b2c5501b8 Fix Windows build
Summary:
After this, windows should be all green.
Closes https://github.com/caffe2/caffe2/pull/1228

Reviewed By: bwasti

Differential Revision: D5888328

Pulled By: Yangqing

fbshipit-source-id: 98fd39a4424237f2910df69c8609455d7af3ca34
2017-09-21 20:13:15 -07:00
f841446fbb Formatting fix for verbose net logging
Summary:
This doesn't look quite right:
`I0915 21:26:03.910737    19 net_simple.cc:24] Creating operator :ConstantFill`
Closes https://github.com/caffe2/caffe2/pull/1218

Differential Revision: D5888865

Pulled By: Yangqing

fbshipit-source-id: 7db5059fd952c200a11fdcf01126e43497565116
2017-09-21 20:13:14 -07:00
a340d141de Check num_elements > num_samples in UniformSampling
Summary: When num_elements is less than num_samples, a workflow should fail during net construction time. Currently, it fails at run time.

Reviewed By: kittipatv

Differential Revision: D5858085

fbshipit-source-id: e2ab3e59848bca58806eff00adefe7c30e9ad891
2017-09-21 16:37:20 -07:00
cf7e28de8e add CUDA RNG docs 2017-09-21 19:36:41 -04:00
85b08f1b99 Trying to fix all networkx 2 issues.
Summary:
Basically:

- more generator vs list changes.
- difference in the return type of bellman_ford(); see _get_path. 2.x returns a list.
- nx 2 removed nbunch in topological_order, so we will need to manually use lexicographical_topological_sort with an explicit key derived from the source node order.
Closes https://github.com/caffe2/caffe2/pull/1243

Reviewed By: ajtulloch

Differential Revision: D5883195

Pulled By: Yangqing

fbshipit-source-id: 215d01fdd026d3af1a11ff866bf835e104370e4c
2017-09-21 16:01:47 -07:00
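The generator-vs-list issue above in miniature (networkx 2.x returns iterators/views where 1.x returned lists):
```
import networkx as nx

g = nx.DiGraph([(0, 1), (1, 2)])
order = list(nx.topological_sort(g))  # materialize: nx 2.x yields an iterator
succ = list(g.successors(1))          # likewise for successors/neighbors
```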
e86f941395 quick fix image input op
Summary: This is a quick fix for image input op

Reviewed By: bddppq

Differential Revision: D5857147

fbshipit-source-id: 4b5102616fe295c7c21d394391af8030b79de992
2017-09-21 15:21:46 -07:00
4106c650d3 fix a race in type registration
Summary:
Here's what's happening:
C++ only guarantees that static initialization is thread-safe there: https://fburl.com/40wdmf1q
So TypeNameRegisterer<bool> cannot race with another concurrent invocation of TypeNameRegisterer<bool>.

But there are no guarantees about different template specializations, as
they declare separate variables. Thus TypeNameRegisterer<int> might
race with TypeNameRegisterer<bool>. And TypeNameRegisterer accesses
the global variable here: https://fburl.com/gv2mhi08

Thanks dzhulgakov for the investigation!

Reviewed By: Yangqing

Differential Revision: D5882913

fbshipit-source-id: 4db1080b11e6351ce8136373e2dfc52980642fbb
2017-09-21 15:21:45 -07:00
b8ab3080b1 Fix InferShapesAndTypes() for convolutions
Summary:
If kernel sizes were specified via "kernel_w" and "kernel_h", tensor size
inference was incorrect in InferShapesAndTypes(): it was checking for
"helper_w" instead of "kernel_w".

Reviewed By: akyrola

Differential Revision: D5884280

fbshipit-source-id: 430cbedcedadbe3570384e706198a4ddc499504e
2017-09-21 14:50:43 -07:00
2cbb4167c1 Adding uint8 support to code generator for high-performance embedding look-up kernels, supporting
Summary:
Adding uint8 support to the code generator for high-performance embedding look-up kernels, supporting
Sum, WeightedSum, and Mean reducers. Added a number of unit tests to test these operators. (A reference sketch of the lookup follows this entry.)

Performance Results
===================

Performance results are below for old code, sparse_lengths_sum_benchmark.old.par, that uses
code in lengths_reducer_rowwise_8bit_ops.h, and our new code, optimized via code generator,
sparse_lengths_sum_benchmark.new.par.  Block size was 128 in all cases.

[root@fblearner001.01.ftw1 /home/msmelyan]# ./sparse_lengths_sum_benchmark.old.par --iteration 10000 --dtype uint8
I0912 02:49:58.773259 2640913 net_simple.cc:162] Time per operator type:
I0912 02:49:58.773264 2640913 net_simple.cc:171]         0.75769 SparseLengthsSum8BitsRowwise

[root@fblearner001.01.ftw1 /home/msmelyan]# ./sparse_lengths_sum_benchmark.new.par --iteration 10000 --dtype uint8
I0912 02:50:33.981832 2642102 net_simple.cc:162] Time per operator type:
I0912 02:50:33.981837 2642102 net_simple.cc:171]        0.233322 SparseLengthsSum8BitsRowwise

[root@fblearner001.01.ftw1 /home/msmelyan]# ./sparse_lengths_sum_benchmark.new.par --iteration 10000 --dtype float16
I0912 02:51:26.748972 2643925 net_simple.cc:162] Time per operator type:
I0912 02:51:26.748977 2643925 net_simple.cc:171]        0.106591 SparseLengthsSum

[root@fblearner001.01.ftw1 /home/msmelyan]# ./sparse_lengths_sum_benchmark.new.par --iteration 10000 --dtype float
I0913 01:39:22.372238 1076874 net_simple.cc:162] Time per operator type:
I0913 01:39:22.372244 1076874 net_simple.cc:171]        0.211041 SparseLengthsSum

Analysis
========
Our optimized generated code is ~3.5x faster than original code in lengths_reducer_rowwise_8bit_ops.h
as shown below.

However, our uint8 is about 2x slower than float16 and is on par with float32. There are several reasons for that:
1. uint8 introduces extra instructions to multiply by the scale factor and add the bias
2. In addition to embedding blocks, we are now also reading scale_bias.
   For every pair of scale and bias, we bring in an entire cache line of
   64 bytes while only using 8 bytes. A 128-wide uint8 input block occupies only 2 cache lines, and hence
   reading a nearly entire extra cache line of useless data adds to bandwidth wastage.
3. In addition, the hardware prefetcher runs past the end of the input block and the scale_bias
   cache line, trying to prefetch more useless lines. This effect was characterized in the Appendix section of
   https://fb.facebook.com/notes/jason-lu/sparse-adagrad-performance-optimization-in-model-training/10214810437360961/

To get deeper insights into what is going on,
we isolated SparseLengthsSum and SparseLengthsSum8BitsRowwise codes, for float32, float16 and uint8,
into a microbenchmark, where we varied the block size while keeping the table size constant (256MB).

block_size  time(uint8) time(float16) time(float32)
64          0.19        0.09          0.17
128         0.12        0.09          0.17
256         0.70        0.09          0.14
1024        0.50        0.06          0.10

The pattern for block size of 64 and 128 is similar to what we observed in sparse_lengths_sum_benchmark.
However, we see that as block_size increases (for a fixed table size),
the time to perform embeddings decreases quite drastically. For block_size of 256 and beyond, uint8 starts achieving
speedup over float16. Longer blocks better amortize the bandwidth wastage due to scale_bias and the hardware prefetcher
running past the end of the block.

Reviewed By: kennyhorror

Differential Revision: D5870907

fbshipit-source-id: 445321b96f1b5801ef91f296f6063c35673ee11b
2017-09-21 14:50:43 -07:00
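A reference NumPy sketch of the 8-bit rowwise lookup being optimized above (layout assumed from the summary: a per-row scale/bias pair, dequantize as code * scale + bias, then apply the Sum reducer):
```
import numpy as np

def sparse_lengths_sum_8bit(table_u8, scale_bias, indices, lengths):
    # table_u8: (rows, dim) uint8 codes; scale_bias: (rows, 2) float32.
    out = np.zeros((len(lengths), table_u8.shape[1]), dtype=np.float32)
    pos = 0
    for i, n in enumerate(lengths):
        for idx in indices[pos:pos + n]:
            scale, bias = scale_bias[idx]
            out[i] += table_u8[idx].astype(np.float32) * scale + bias
        pos += n
    return out

table = np.random.randint(0, 256, size=(10, 128), dtype=np.uint8)
sb = np.random.rand(10, 2).astype(np.float32)
print(sparse_lengths_sum_8bit(table, sb, [1, 4, 7], [2, 1]).shape)  # (2, 128)
```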
8d19319fa7 Documentation for FusionGroup and Eval requested by @houseroad (#2808)
Plus a test for Eval nodes in the IR, since we hadn't actually
covered this case now that some nodes are transparently traceable.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-21 17:14:56 -04:00
7750b8db36 Remove NNPACK MaxPool wrapper
Reviewed By: Maratyszcza

Differential Revision: D5879495

fbshipit-source-id: e2020f7e32d64ed9318ab8d09ea63ce6f12a94a3
2017-09-21 12:05:47 -07:00
84182b1853 Partially fix memonger with networkx 2.0
Summary:
This fixes the apparent discrepancy (list vs iterator). After this, there are still 3 failures regarding topological sort, but that seems a bit more involved; someone should look deeper.
Closes https://github.com/caffe2/caffe2/pull/1242

Reviewed By: akyrola

Differential Revision: D5881806

Pulled By: Yangqing

fbshipit-source-id: 5a200010724befde2fa8ce1b61a9c1ba42cad46a
2017-09-21 10:24:41 -07:00
892940b45a fix memory leak in min function 2017-09-21 12:10:28 -04:00
723214e9ac Resolve mismatch between ATen master and pytorch subtree. 2017-09-21 12:10:09 -04:00
f6d3c17fd7 Directly check if the state_dict() has changed, so we fail earlier.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-21 12:03:22 -04:00
e1add8fdff [FIXUP] Give a slightly different error if tracing state is expired.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-21 12:03:22 -04:00
6125ea7c83 Create a FuncModule for conveniently module-izing functions.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-21 12:03:22 -04:00
ea2e7a1f4e [FIXUP] Deduplicate accept_output logic.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-21 12:03:22 -04:00
a01de93fad Give better error message when symbolic() arguments fail to line up.
Now we actually tell the user what operator was being translated
when there was a failure.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-21 12:03:22 -04:00
c083d3ac2e Fix minor bug when --accept'ing commits.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-21 12:03:22 -04:00
b805f3f676 Also fix AvgPool2d to follow new convention.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-21 12:03:22 -04:00
08148a462c Print name of operator whose symbolic gave wrong number of inputs.
TODO: Robustify this to apply to everything.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-21 12:03:22 -04:00
bfed2dce25 AvgPool2d was returning too many outputs, fix it.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-21 12:03:22 -04:00
871e3b41e3 Ask for the correct number of derivatives when tracing.
- If you operate with TracingState, you MUST check if it is live.
  Otherwise you will segfault if it is expired; it is VALID for
  tracing states to become expired.

- Tracing states can expire if they request backward tracing
  (which the tracer does by default).  We don't want this to
  happen for exports, which only look at forwards.  So make
  sure we set the correct num_derivatives.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-21 12:03:22 -04:00
10ef82f13e Make assertExpected work with Unicode strings in Python 2.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-21 12:03:22 -04:00
460a03751b More expect test improvements.
- Print some diagnostic information when accepting new test output.

- If it's the first time you ran an expect test, print out
  the output you got so it's easier to decide if you want
  to accept it.

- Add infrastructure for expect-testing against exceptions
  (I'm going to use this in a later patch).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-21 12:03:22 -04:00
f3ae642162 Tighten up the ONNX export interface
- If a user accidentally attempts to export a model that is in training mode, the
  tracer may perturb the parameters (since modules like batchnorm will update
  their parameters.)  To prevent this from happening, we temporarily turn
  off training mode to make sure this doesn't happen.  Temporary is
  important, since model export should not actually affect the model

- If you have a buggy model which is changing the parameters,
  it is much better for us to export the state_dict() *prior*
  to executing the model, because that is what we actually
  used as the inputs to the trace.  The state_dict() afterwards
  could be anything.

- kwargs support never worked, so it's been excised.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-21 12:03:22 -04:00
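A sketch of the training-mode guard described above (not the verbatim implementation): switch the model to eval for the duration of the trace, then restore it, so export cannot perturb parameters.
```
def export_without_perturbing(model, run_trace):
    # Temporarily disable training so modules like batchnorm don't
    # update their parameters/running stats while the tracer runs.
    was_training = model.training
    model.train(False)
    try:
        return run_trace(model)
    finally:
        model.train(was_training)
```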
12ed8ebe5a Revert D5879947: [caffe2][PR] Enable python3 builds
Summary:
This reverts commit c452362b2ab54397723b5be3f1258c57213f6fc4

bypass-lint

Differential Revision: D5879947

fbshipit-source-id: 9dfe4e17ea84c252fa75c103c7a267e1ceddab98
2017-09-20 23:03:19 -07:00
ae10a0a3e8 Enable python3 builds
Summary: Closes https://github.com/caffe2/caffe2/pull/1240

Differential Revision: D5879947

Pulled By: Yangqing

fbshipit-source-id: c452362b2ab54397723b5be3f1258c57213f6fc4
2017-09-20 22:05:23 -07:00
d2d7a0f514 Fix build failure in MSVC 2017-09-21 00:02:04 -04:00
5d6a41b8aa MPSCNNMul(scalar only)
Summary:
Implementation of MPSCNNMul that only supports multiplying a tensor with a scalar value for now.

Benchmark runtime for CPU, OpenGL and MPSCNN:
```
I0919 21:15:17.942468 3068398464 net_simple.cc:103] Main run finished. Milliseconds per iter: 527.795. Iters per second: 1.89467
I0919 21:15:21.043023 3068398464 opengl_test.cc:2293] Main run finished. Milliseconds per iter: 249.766. Iters per second: 4.00374
I0919 21:15:23.182369 3068398464 net_simple.cc:103] Main run finished. Milliseconds per iter: 175.548. Iters per second: 5.69644
```

Reviewed By: hlu1

Differential Revision: D5870100

fbshipit-source-id: 2aadd5d134f3b8b40a41f638040cbef35a0086df
2017-09-20 19:22:01 -07:00
2b9765ad02 Erf and erfinv (#2799) 2017-09-20 21:23:45 -04:00
c3a3d6ceba Add an option to use dynamic memory optimizer.
Reviewed By: akyrola

Differential Revision: D5869664

fbshipit-source-id: ab11bc27395bf10e8381ebf97e6afb83ae9af81f
2017-09-20 12:52:55 -07:00
1b059f4c98 Add option to ignore parameter initialization
Summary: When parameter sharing is used, the model may not own the parameters. Emptying out initializer ensures that the shared model doesn't overwrite initialization.

Reviewed By: chocjy

Differential Revision: D5870362

fbshipit-source-id: f8587b84c3a13f331a3251973e8206563939606a
2017-09-20 12:03:22 -07:00
7d2b2cae19 Remove OFFLINE_TRAINING from global constant
Summary: This is not a very generic constant

Reviewed By: volkhin

Differential Revision: D5870378

fbshipit-source-id: 59509bb48cecb52ba4a3f26b290855374547fe7e
2017-09-20 12:03:21 -07:00
1a83c372ec address issue #1488 by using defaultdict in load_state_dict 2017-09-20 14:56:21 -04:00
ad414908d7 Advanced Indexing with variables for autograd (#2590) 2017-09-20 14:50:07 -04:00
0fff025973 Consistent behavior of max reduction for segment ops and fix test
Summary:
Two implementations of max pool reducers had different semantics in the case of equal indices. It matters less in real cases, but it breaks tests. Choosing the behavior of LengthMax over SortedSegmentRangeMax as the former is more widely used.

Also some minor tweaks for the test code.

Reviewed By: Yangqing

Differential Revision: D5870386

fbshipit-source-id: 6488cbd5cacaf595ffc07c44084730dd44b3f9dd
2017-09-20 10:59:43 -07:00
2996aad68c remove dead code, add insertAt helper 2017-09-20 12:24:27 -04:00
6e495f5f85 Make output_ a const field in Graph.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-20 12:24:27 -04:00
0821856ac9 Add missing is-Param assert
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-20 12:24:27 -04:00
6efd797376 Document unchecked invariant.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-20 12:24:27 -04:00
25c2b7d8b2 Some minor extra comments on python_function
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-20 12:24:27 -04:00
794e52bb1c Make cloneFrom() copy all metadata; use createClone() as much as possible.
To be honest, this was the whole point of this refactor set.

I noticed that in a lot of code, we were repeatedly copying lots of metadata
from old nodes to new nodes.  This was quite concerning because I wanted to
add some more metadata (alias information) and I didn't want to have to
get it right in all cases.  Plus, in a lot of cases we were forgetting
to set more optional properties like debug names when we "copied".

To solve this, I first made cloneFrom() copy all of this metadata.  Then,
I searched for all occurrences of setType() (a proxy for "I'm cloning this
node), looked for cases where we really were morally doing a copy, and rewrote
the code to use cloneFrom() instead, allowing us to drop explicit setType()
(and getting more metadata preservation in the process.)

Finally, I refactored tryToMoveChunk.  The code is modestly longer,
but the new version has the nice property that the initialization of
selects for input_chunk are next to the creation of the node (as opposed
to delayed for later.)  I also added a lot more comments for invariants
I noticed when I was working on the code.

One minor extra change: TensorType grew a new constructor and a withSizesStride
"immutable setter" which returns a new copy of TensorType with different info.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-20 12:24:27 -04:00
0b421e590c Move some logic into create().
Previously, there was a hidden, unchecked invariant that you were not allowed to
call create(kParam) or create(kReturn).  Now that the logic for them is embedded
in create(), the create(kParam) case is valid, and the create(kReturn) case
will raise dynamically if you try it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-20 12:24:27 -04:00
ba95ffed97 Const correctness in IR and Attribute / linked list excision
Since this code has been stable for a while, I think it's
a good opportunity to make it const correct.  There is only
a slight increase in code size, which I hope will appease @zdevito.

- consts were added to all methods which are logically const.  Most notably,
  lint() is now declared const.

- I made extra const versions of Node::iterator(), Node::reverseIterator(),
  Graph::nodes(), Attribute::find(), linked_list::begin(), linked_list::end(),
  linked_list::rbegin(), linked_list::rend(); in all cases these were one-liners
  except for find() (I spent a little time trying to make find() a one-liner
  but didn't think of a way to do it).

- graph_node_list got factored out into a new, templated type linked_list<T>
  (perhaps we should call it intrusive_list<T>).  I had to template the iterator
  to define constant and non-constant iterators without duplicating code,
  and once I was there, I decided to templatize everything else.  The code
  nicely factors out, although I wouldn't recommend using it for anything
  else without more refactoring.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-20 12:24:27 -04:00
670ec4bc59 Split Type into its own header file.
No other substantive changes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-20 12:24:27 -04:00
06903c3525 bugfix for word language model 2017-09-20 12:24:27 -04:00
5949bb27b5 move write_vis into contrib 2017-09-20 12:24:27 -04:00
a194e66186 allow Concat operators to be the final operator in a fusion group, and update the fusion compiler to support code that includes final concats 2017-09-20 12:24:27 -04:00
27bae83a3a make graph layout more readable 2017-09-20 12:24:27 -04:00
3fb39add23 debugging code to understand fuser 2017-09-20 12:24:27 -04:00
c8993a3e2c Add add_scaled and sub_scaled to TH and THC (#2789)
These functions accept a scaling parameter like THTensor_(cadd)/(csub),
which will make it easier to have the same signature for tensor and
scalar addition in PyTorch and ATen. For example:

  tensor.add(other, alpha=2)

Will work if other is a scalar or a tensor value.

See #2739
2017-09-20 11:39:40 -04:00
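In current PyTorch this surfaces as the `alpha` keyword on `add`; a quick runnable sketch of the equivalence described above (assuming a recent PyTorch build):

```python
import torch

t = torch.ones(3)
other = torch.full((3,), 2.0)
# add(other, alpha=2) computes t + 2 * other, and the same signature
# accepts a plain Python scalar for `other`.
print(t.add(other, alpha=2))  # tensor([5., 5., 5.])
print(t.add(2.0, alpha=2))    # tensor([5., 5., 5.])
```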
16a3de081a Minor rebase fixes 2017-09-20 11:22:57 -04:00
3be774ccb7 Use TH_TENSOR_APPLYx_CONTIG for contiguous tensor to increase the speed. 2017-09-20 11:22:57 -04:00
06fdce04ca Generate ATen from torch/csrc/Declarations.cwrap (#2791)
This adds a concatenated Declarations.cwrap which is the result of
running ATen/extract_cwrap.py on TensorMethods.cwrap. This will let ATen
and the Variable bindings temporarily diverge from Tensor before the new
Variable class subsumes Tensor.

See #2739 and #2633
2017-09-20 09:44:01 -04:00
f4169260f8 Fix crash when calling backwards on leaf variable which does not require grad (#2788) 2017-09-20 09:43:20 -04:00
39434ee2e4 Added LPPool1d. (#2783) 2017-09-20 09:19:29 -04:00
aff1370974 AndroidGLContext can lazily allocate static map
Reviewed By: fricc33

Differential Revision: D5867975

fbshipit-source-id: 0cc9159c27e3f667a001b4cd7768098c36d9550f
2017-09-19 19:06:48 -07:00
871530afdf Mark all (non-static) Type methods as const. 2017-09-19 18:21:42 -07:00
06b7a9e0f6 Backed out changeset 3a5c020294d8
Summary:
Broke
  CAFFE2_HYPOTHESIS_PROFILE=debug buck test //caffe2/caffe2/python:lengths_reducer_rowwise_8bit_ops_test

Reviewed By: kennyhorror

Differential Revision: D5867880

fbshipit-source-id: 80c6f23eccb59b74be4a7258b4f193d79f814c3f
2017-09-19 17:54:18 -07:00
dab5bd23ea fp16: RecurrentNetwork
Summary:
Was https://github.com/caffe2/caffe2/pull/1151
Closes https://github.com/caffe2/caffe2/pull/1192

Reviewed By: salexspb

Differential Revision: D5829775

Pulled By: akyrola

fbshipit-source-id: e0f7609317ca95faf9eb9c81b265d678a24a80e3
2017-09-19 14:47:27 -07:00
ddf6ad83aa Add tiling support to GLConcat
Reviewed By: fricc33

Differential Revision: D5864131

fbshipit-source-id: 63894f5082fbfc64cd078a8f781b4db1b00a69dc
2017-09-19 13:32:12 -07:00
b468ffe6d1 Adding uint8 support to the code generator for high-performance embedding look-up kernels
Summary:
Adding uint8 support to the code generator for high-performance embedding look-up kernels, supporting the
Sum, WeightedSum, and Mean reducers. Added a number of unit tests to cover these operators.

Performance Results
===================

Performance results are below for the old code, sparse_lengths_sum_benchmark.old.par, which uses the
code in lengths_reducer_rowwise_8bit_ops.h, and for our new code, optimized via the code generator,
sparse_lengths_sum_benchmark.new.par.  Block size was 128 in all cases.

[root@fblearner001.01.ftw1 /home/msmelyan]# ./sparse_lengths_sum_benchmark.old.par --iteration 10000 --dtype uint8
I0912 02:49:58.773259 2640913 net_simple.cc:162] Time per operator type:
I0912 02:49:58.773264 2640913 net_simple.cc:171]         0.75769 SparseLengthsSum8BitsRowwise

[root@fblearner001.01.ftw1 /home/msmelyan]# ./sparse_lengths_sum_benchmark.new.par --iteration 10000 --dtype uint8
I0912 02:50:33.981832 2642102 net_simple.cc:162] Time per operator type:
I0912 02:50:33.981837 2642102 net_simple.cc:171]        0.233322 SparseLengthsSum8BitsRowwise

[root@fblearner001.01.ftw1 /home/msmelyan]# ./sparse_lengths_sum_benchmark.new.par --iteration 10000 --dtype float16
I0912 02:51:26.748972 2643925 net_simple.cc:162] Time per operator type:
I0912 02:51:26.748977 2643925 net_simple.cc:171]        0.106591 SparseLengthsSum

[root@fblearner001.01.ftw1 /home/msmelyan]# ./sparse_lengths_sum_benchmark.new.par --iteration 10000 --dtype float
I0913 01:39:22.372238 1076874 net_simple.cc:162] Time per operator type:
I0913 01:39:22.372244 1076874 net_simple.cc:171]        0.211041 SparseLengthsSum

Analysis
========
Our optimized generated code is ~3.5x faster than the original code in lengths_reducer_rowwise_8bit_ops.h,
as shown above.

However, our uint8 is about 2x slower than float16 and is on par with float32. There are several reasons for that:
1. uint8 introduces extra instructions to multiply by the scale and add the bias.
2. In addition to the embedding blocks, we are now also reading scale_bias.
   For every pair of scale and bias, we bring in an entire cache line of
   64 bytes while only using 8 bytes. A 128-wide uint8 input block only occupies 2 cache lines, and hence
   reading a nearly entire extra cache line of useless data adds to the bandwidth wastage.
3. In addition, the hardware prefetcher runs past the end of the input block and the scale_bias
   cache line, trying to prefetch more useless lines. This effect was characterized in the Appendix section of
   https://fb.facebook.com/notes/jason-lu/sparse-adagrad-performance-optimization-in-model-training/10214810437360961/

To get deeper insights into what is going on,
we isolated the SparseLengthsSum and SparseLengthsSum8BitsRowwise code, for float32, float16 and uint8,
into a microbenchmark where we varied the block size while keeping the table size constant (256MB):

block_size  time(uint8) time(float16) time(float32)
64          0.19        0.09          0.17
128         0.12        0.09          0.17
256         0.70        0.09          0.14
1024        0.50        0.06          0.10

The pattern for block sizes of 64 and 128 is similar to what we observed in sparse_lengths_sum_benchmark.
However, we see that as block_size increases (for a fixed table size),
the time to perform the embedding lookups decreases quite drastically. For block_size of 256 and beyond, uint8 starts achieving a
speedup over float16. A longer block better amortizes the bandwidth wastage due to scale_bias and the hardware
prefetcher running past the end of the block.

Reviewed By: dzhulgakov

Differential Revision: D5824641

fbshipit-source-id: 3a5c020294d84874da78c6943e596423393473d6
2017-09-19 10:50:09 -07:00
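A minimal NumPy sketch of the rowwise 8-bit scheme this benchmark exercises: each row is stored as uint8 plus a per-row (scale, bias) pair, dequantized on the fly and summed per segment. The function name and data layout here are illustrative, not the actual Caffe2 kernel:

```python
import numpy as np

def sparse_lengths_sum_8bit(table_u8, scale_bias, indices, lengths):
    # Dequantize each looked-up row as value * scale + bias, then sum
    # the rows belonging to each segment.
    out = np.zeros((len(lengths), table_u8.shape[1]), dtype=np.float32)
    pos = 0
    for seg, n in enumerate(lengths):
        for idx in indices[pos:pos + n]:
            scale, bias = scale_bias[idx]
            out[seg] += table_u8[idx].astype(np.float32) * scale + bias
        pos += n
    return out

# 4 rows of width 8; two segments of lengths 2 and 1.
table = np.random.randint(0, 256, size=(4, 8), dtype=np.uint8)
sb = np.random.rand(4, 2).astype(np.float32)  # (scale, bias) per row
print(sparse_lengths_sum_8bit(table, sb, np.array([0, 3, 1]), [2, 1]).shape)
```

Note how the per-row scale_bias lookup is exactly the extra memory traffic the analysis above blames for the gap with float16.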
2bcad92d12 Fixes for NCCLReduce with non-zero root
Summary:
All other NCCL ops expect paired src, dst pointers for each
GPU. Reduce doesn't, and the old logic would always set dst for
rank = 0 regardless of whether that was the root or not.
This change takes into account that Reduce only has one output, and it
should assign dst only for the root rank. Also changes the schema to
allow inplace for any input and Output(0).
Closes https://github.com/caffe2/caffe2/pull/1214

Differential Revision: D5843177

Pulled By: pietern

fbshipit-source-id: 1e775e6a1ca052e29691b89c1429db03a0e6378b
2017-09-19 10:41:14 -07:00
cc3e6ade42 Fix caffe translator
Summary: att

Reviewed By: bddppq

Differential Revision: D5854100

fbshipit-source-id: bebb0fbe36367f973e93cb09c98ec75758829769
2017-09-19 09:21:14 -07:00
5deacb5bce Enhance comments
* Explain why null edge pruning interferes with SimpleEval
* Explicitly refer to notes using Note sigil
* Copyedit comment for clarity
2017-09-19 10:53:32 -04:00
c536da7064 Remove TensorMeta 2017-09-19 10:53:32 -04:00
a7c4152302 Prune null edges in Eval nodes 2017-09-19 10:53:32 -04:00
b66d90c84f Add a pass to remove all non-standard ONNX nodes before export (#225) 2017-09-19 10:53:32 -04:00
6855d24ff1 Move pybind11 type_caster to different pybind.h in the corresponding folders. (#222) 2017-09-19 10:53:32 -04:00
b7e89d7248 Add support for some ONNX nodes in JIT closure 2017-09-19 10:53:32 -04:00
fe5c644f81 Handle AddConstant in fusion compiler 2017-09-19 10:53:32 -04:00
e05cfb2064 Make sure passes don't mess up stages of nodes and graphs 2017-09-19 10:53:32 -04:00
8a605ce766 Minor refactor of fusion compiler 2017-09-19 10:53:32 -04:00
75497d624e Add JIT_EXPECT (#220)
Add JIT_EXPECT(M) and turn some JIT_ASSERT(M) to JIT_EXPECT(M)
2017-09-19 10:53:32 -04:00
d4fda0bbf8 More updates for Variable ATen
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-19 10:53:32 -04:00
ba6e652c02 Add simple mode to Eval 2017-09-19 10:53:32 -04:00
1f80dd03bd Track change of Variable from shared_ptr to ATen style tensor
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-19 10:53:32 -04:00
aa1a94058b Add AddConstant node to the JIT 2017-09-19 10:53:32 -04:00
7506a3bcb7 Add pybind converters for Symbol and AttributeKind 2017-09-19 10:53:32 -04:00
28828e033f Make certain functions traceable 2017-09-19 10:53:32 -04:00
4d1ed4ec42 Assign traces before saving Variables 2017-09-19 10:53:32 -04:00
af688905e4 Fix a bug in CppOp (missing cloneFrom) 2017-09-19 10:53:32 -04:00
214eef5e5d Record device information in TensorType and check it in the fuser 2017-09-19 10:53:32 -04:00
ab375e19aa size test 2017-09-19 10:53:32 -04:00
83e38d687b Add a comment about what is going on here 2017-09-19 10:53:32 -04:00
dd85947542 fix the fusion test WAR 2017-09-19 10:53:32 -04:00
2ae7d8e5f9 Fix Chunk heuristic in graph fuser 2017-09-19 10:53:32 -04:00
b708b6de8d Add ONNX pass (JIT trace initialization) 2017-09-19 10:53:32 -04:00
0e53fe3a41 Put ONNX files where they belong 2017-09-19 10:53:32 -04:00
8dae433de8 Move JIT passes to a separate directory 2017-09-19 10:53:32 -04:00
2a7b4f5095 Allow TensorMeta to be undefined 2017-09-19 10:53:32 -04:00
6b60f31081 Fix bugs in AutogradClosure 2017-09-19 10:53:32 -04:00
964b731af3 Try to handle NULL Variables in the tracer 2017-09-19 10:53:32 -04:00
aafa35e0b5 Fix bugs in Traceable
The previous refactor introduced a few problems, like not saving the
output proto, and it didn't use the flattened inputs when querying
the key.
2017-09-19 10:53:32 -04:00
9c39e8cecb Parity with NumPy newaxis placement in indexing (#2779) 2017-09-19 10:38:18 -04:00
4341dc7e7f avoid variable naming conflict in macro
Summary:
I hit a strange bug and found that the reason is that the macro uses a
temp variable named 'r'. This causes a conflict when the macro's own
argument is also expanded as 'r' or something related (in my case, it expands to
'r.size()', where r is a tensor).

Reviewed By: pietern

Differential Revision: D5822833

fbshipit-source-id: 64a6c6b0fc5a1f8359d459d70644bb232ef40606
2017-09-18 23:19:10 -07:00
ad68f623f2 task api, fix comments - a bit of cleanup
Summary:
The comments say experimental: don't use it. But these functions are used in the critical path from pipeline.py, so better to remove the comment?

Also changed the if-else to first check for None. Although Python does not crash with getattr(None, "x"), it is confusing.

Also fixed some lint issues.

Reviewed By: azzolini

Differential Revision: D5853639

fbshipit-source-id: 977de5ba0ea3ae26343ae5fcacac883faf892b0e
2017-09-18 21:43:20 -07:00
f8f5e79f5f Backpropagation for If operator
Summary:
Adding backward pass support for If operator:
 - Implemented necessary changes to Do operator and generation of gradient Do operator to properly forward gradient blobs in and out of subnet
 - Using WorkspaceManager to keep track of workspaces used by Do, in case we need to have access to local blobs to compute gradients (also important for loop's backprop)
 - Update to Workspace to handle blob binding from multiple parent workspaces
 - Implemented generation of gradient If operator
 - Unit test to build and train a net with If control op

Reviewed By: azzolini

Differential Revision: D5745096

fbshipit-source-id: 1023c90a2113716254424d1e50b9e560fe9083e5
2017-09-18 16:17:42 -07:00
561fc8d96a remove rotted TODOs 2017-09-18 18:17:20 -04:00
25aea46739 add missing AutoGPU guards 2017-09-18 18:03:03 -04:00
8536079142 missing include 2017-09-18 14:51:23 -07:00
30af9d793d Add broadcasting to bitwise operators. (#2776) 2017-09-18 17:30:02 -04:00
5229a79bf5 Implement THCUNN code for GridSampler (#2737) 2017-09-18 17:29:26 -04:00
888f4d4f61 Update cub to master
Summary:
For future reference - it seems that at some point cub had a force push. If any already-checked-out branch has issues, try deleting the cub submodule and re-running git submodule update --init.
Closes https://github.com/caffe2/caffe2/pull/1227

Differential Revision: D5856030

Pulled By: Yangqing

fbshipit-source-id: c192974246c27ce6bd739295c31c25fd75766a35
2017-09-18 14:17:35 -07:00
c6ea6ed8ff Add Nd Padding, Pad1d functions and ConstantPad3d (#2657) 2017-09-18 14:48:49 -04:00
ea8b09365c Specifying the value used for padding (#2751)
* Specifying the value used for padding

The "pad_packed_sequence" function fills padded elements with zeros, but sometimes it is not useful. For example, some previous papers on NLP, including my recent paper [1], use a max-pooling technique for RNN-based sentence representations. More specifically, the max-pooling technique selects the maximum value from all time steps (i.e., hidden states) for each dimension. In such a case, we do not want the padded zeros to be selected. To overcome this situation, we can simply use a very small value instead of zero.

An LSTM example is shown below:

input = embedding(Variable(batchInput))
packedInput = nn.utils.rnn.pack_padded_sequence(input, lengths, batch_first = True)
h, (hn, cn) = self.encoder(packedInput, (h0, c0))
h, _ = nn.utils.rnn.pad_packed_sequence(h, padding_value = -1024.0, batch_first = True)
sentenceRep, _ = torch.max(h, 1, keepdim = True)

[1] A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks. Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, and Richard Socher. The 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017).
https://arxiv.org/abs/1611.01587 (Equation (4))

* Modified the order of the arguments

Following the suggestion, I modified the order of the arguments.
2017-09-18 14:48:10 -04:00
2763bfc49e Norm subgradient at 0 (#2775) 2017-09-18 12:26:36 -04:00
16ddc863f4 revert more THC Atomics bits from Windows changes 2017-09-18 07:09:01 -07:00
c231ac2253 Add an argument for suppressing download progress
Summary: In some cases (e.g. CI), showing a progress bar will mess up the log.

Reviewed By: jerryzh168

Differential Revision: D5850918

fbshipit-source-id: 2da9d020832264cef977391dc2fd8d1e2677d159
2017-09-17 20:15:51 -07:00
ebeaecbfa3 workspace_gpu: Get{CUDAVersion,DeviceProperties}
Summary:
Expose some useful utilities to Python
Closes https://github.com/caffe2/caffe2/pull/1216

Differential Revision: D5843888

Pulled By: akyrola

fbshipit-source-id: fc731781aec3c7cc6a4b7132f1624423d015abff
2017-09-17 20:01:34 -07:00
fb37d35d28 Additional fix for LRN: uninitialized variable.
Summary:
It is interesting that under Facebook fbcode this was not an issue -
but it was definitely causing issues on OSS.
Closes https://github.com/caffe2/caffe2/pull/1225

Reviewed By: dzhulgakov

Differential Revision: D5851360

Pulled By: Yangqing

fbshipit-source-id: f8a8f15184092a888bdc909ba2323229d4485902
2017-09-17 19:01:23 -07:00
4e26aa4f91 Update nccl master
Summary:
This is needed for cuda 9 builds.
Closes https://github.com/caffe2/caffe2/pull/1226

Reviewed By: dzhulgakov

Differential Revision: D5851354

Pulled By: Yangqing

fbshipit-source-id: 4a23f06b97262fc603b4f0b7b84c3122888e954d
2017-09-17 16:54:07 -07:00
211cb13a7d fix local_response_normalization
Summary:
This fixed a minor bug in D5690181.

Failing test observed in https://travis-ci.org/caffe2/caffe2/jobs/275603846

Reviewed By: jerryzh168

Differential Revision: D5850985

fbshipit-source-id: 02aefb8902878d6adf7686a94153823b92c0e7b7
2017-09-17 14:19:34 -07:00
59b139dabd Fixed compilation on OSX (#2761) 2017-09-17 10:17:59 -04:00
1fc85cde1f serialization fix to preserve backward compatibility and contbuild (#2763) 2017-09-17 10:16:21 -04:00
e397439611 fix THPP CUDA after windows changes 2017-09-16 23:04:54 -07:00
ddd417faf0 Fix non-CUDA builds after Windows PRs (#2760) 2017-09-17 02:02:52 -04:00
2bc1e07b62 THC/THCUNN reverts of incorrect changes after Windows fixes 2017-09-16 22:19:56 -07:00
db7c76128f Merge commit '6643f1b9caefd466441f7c0d18ba06a2a810b7f5' 2017-09-17 00:37:45 -04:00
6643f1b9ca Win64 support for lib/ATen 2017-09-16 21:36:30 -07:00
7951c4a68d Merge commit '5b5218ea9574f93887498a81038352af47fd7fd8' 2017-09-17 00:31:21 -04:00
c7d5ddd23b Improve Windows Compatibility(for lib/THCS) (#2442)
* Win64 support for lib/THCS

* Kill some warnings for MSVC
2017-09-17 00:02:44 -04:00
4ead38f96a Improve Windows Compatibility(for lib/THS) (#2449)
* Win64 support for lib/THS

* Fix VS warnings(for lib/THS)

* Revert changes that prevent sucessful build

* use the type descriptors for int64_t

* Fix warnings in THS for MSVC
2017-09-17 00:02:09 -04:00
5befdd45bd Win64 support for lib/THD (#2444) 2017-09-17 00:01:40 -04:00
268a1f1b96 Improve Windows Compatibility(for lib/THPP) (#2447) 2017-09-17 00:00:08 -04:00
caecbffe62 Improve Windows Compatibility(for lib/THCUNN) (#2443) 2017-09-16 23:58:22 -04:00
0e691f8998 Improve Windows Compatibility(for lib/THNN) (#2446) 2017-09-16 23:55:15 -04:00
1c51c185a1 Improve Windows Compatibility(for lib/THC) (#2440) 2017-09-16 23:50:15 -04:00
61813cfd97 Improve Windows Compatibility(for lib/TH) (#2439)
* Win64 support for lib/TH

* Edit codes to clear warnings(for TH)

* fix format string

* revert modulo changes

* change formats for snprintf
2017-09-16 23:40:58 -04:00
eccfa1041c fix cuda GatherOp for empty batch
Summary: as title

Differential Revision: D5840432

fbshipit-source-id: 5d9021f152c21d24e91dc0cc3d95443782afc228
2017-09-15 17:40:43 -07:00
c3fd31b1a2 weights for labels in image_input_op
Summary: Introduced weights for labels in the multi-label setting. An extra weight blob is introduced and read by the operator when the label setting is weighted sparse.

Reviewed By: kevinwilfong

Differential Revision: D5812467

fbshipit-source-id: efb209092e1e9effc915b0a753fa0c67b47a4fb6
2017-09-15 17:40:42 -07:00
9639ddd22f Cleanup omnibus-blacklist-hack rules
Summary:
Now that Buck supports a way to opt-out external C/C++ libs from omnibus linking,
this diff removes the hack we previously relied on (and which got copy-pasta-d everywhere).

Reviewed By: pixelb

Differential Revision: D5832450

fbshipit-source-id: cc3d12488f8498be6fb12bce1fedb3ad1accb518
2017-09-15 16:49:35 -07:00
9ec981b866 for CPU-data parallel, allow sharing model
Summary: On CPU, no need to replicate parameters. So try using only one copy (cpu_0) for parameters. Made resnet50_trainer use shared model in cpu mode.

Reviewed By: wesolwsk

Differential Revision: D5812181

fbshipit-source-id: 93254733edbc4a62bd74a629a68f5fa23f7e96ea
2017-09-15 16:19:37 -07:00
132e35bf51 faster sparse lengths weighted sum
Summary: Following the optimization in sparse lengths sum, translate it to the weighted-sum variant.

Reviewed By: azzolini

Differential Revision: D5732859

fbshipit-source-id: 430ee077a1063f3c55806f6dbb5ea46f0fd5c486
2017-09-15 15:46:15 -07:00
4c6d177b4f faster SparseLengthsSum kernel
Summary:
Following wickedfoo's previous diff, I made the SparseLengthsSum kernel a little
faster. I did:
- `__restrict__` annotations for pointers
- `ExactBlock` optimization for kernels where post < Maxthreads. This is a general case.

===Check Test Area Please, Are we looking at another 57% speed up here???===

Reviewed By: azzolini

Differential Revision: D5676351

fbshipit-source-id: 963f4712106b324fda488ec5c63b7e010b915814
2017-09-15 15:46:14 -07:00
3cc309a2e3 Add Net observer for mobile apps
Reviewed By: salexspb

Differential Revision: D5593850

fbshipit-source-id: 96f7ea6e8a8ad3f92adf4e82239022d9ea2bd50a
2017-09-15 15:38:54 -07:00
6b44a00c71 remove in-place Dropout from rnn_cell (bug in PR-1185)
Summary: This caused gradient generation problems. Output was made in-place in PR-1185, by mistake, I believe.

Differential Revision: D5844825

fbshipit-source-id: 4ad84d0fb468aafde9f78463b9acf89316e633ca
2017-09-15 14:03:33 -07:00
af8f6c1bca adding unit tests to compphoto caffe2 projects
Summary: Ported existing ad-hoc test code to use Python unittests. Small tweak to caffe2.python.hypothesis_test_util.

Reviewed By: kmatzen

Differential Revision: D5837295

fbshipit-source-id: daa2360db3c18c7d4bda7785e7a0b9175f5858af
2017-09-15 12:49:37 -07:00
dd27997aeb DOC: adding note about distributed MPI backend (#2750) 2017-09-15 13:47:35 -04:00
27dde63358 Allow run of example resnet50_trainer without training data
Summary:
This is useful for pure throughput tests where
we don't care about training a real model.

Reviewed By: akyrola

Differential Revision: D5834293

fbshipit-source-id: dab528c9269fb713e6f6b42457966219c06e0a35
2017-09-15 09:45:11 -07:00
3a3d27130d Fix symbolic for max pool in all dimensions (#2742) 2017-09-15 10:10:38 -04:00
1a89c6e1ec Decayed adagrad
Summary: When training on billions of examples, the Adagrad gradient square sum can become very big and create an issue of adding small numbers to big numbers. This diff allows decaying the Adagrad gradient square sum.

Reviewed By: queqichao

Differential Revision: D5825932

fbshipit-source-id: 570224483b77d42ae53410fa2f767af86de167eb
2017-09-15 00:35:21 -07:00
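A minimal NumPy sketch of the decayed update described above (the function name and the exact placement of the decay factor are assumptions, not the Caffe2 implementation):

```python
import numpy as np

def decayed_adagrad_step(w, g, sq_sum, lr=0.01, decay=0.999, eps=1e-8):
    # Decay the running gradient-square sum before accumulating, so the
    # history stops growing without bound and new gradients still register.
    sq_sum = decay * sq_sum + g * g
    w = w - lr * g / (np.sqrt(sq_sum) + eps)
    return w, sq_sum

w, s = np.zeros(3), np.zeros(3)
w, s = decayed_adagrad_step(w, np.array([0.1, -0.2, 0.3]), s)
print(w, s)
```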
f21de86209 Add per Op execution counts to prof_dag
Summary: Added a new counter to prof_dag which counts the number of times a particular op_type executed during an iteration, and prints the per-iteration count in the output.

Reviewed By: akyrola

Differential Revision: D5837444

fbshipit-source-id: 0f2571c6f85410dac21d4b627fe455ef7c1ab908
2017-09-14 23:04:33 -07:00
fb45383ed6 resubmission of PR1175: fp16 BatchMatMul
Summary: PR 1175 caused a build error because gemmBatched was only under a specific #ifdef. Now put it outside the #ifdef, and things work.

Reviewed By: asaadaldien

Differential Revision: D5834868

fbshipit-source-id: 072a64c8f4b259ff7504104121766115b46b8aa0
2017-09-14 21:46:05 -07:00
0bbf8a7a4c Fix squareFactors in opengl_test.cc
Summary: Remove the caffe2 namespace {} because all the code inside opengl_test.cc is wrapped inside the caffe2 namespace

Reviewed By: Maratyszcza

Differential Revision: D5829458

fbshipit-source-id: e68dde08a1c3dc4c41260f5f028ca7efe8d34fbd
2017-09-14 20:16:55 -07:00
7752fe5d4e remove zero padding in orthogonal initialization 2017-09-14 23:13:43 -04:00
b42a125ee4 Fix NCCL ops + Add NCCLReduceScatter
Summary:
- All NCCL ops that were triggering a reallocation were deadlocking because, I think, cudaMalloc or something wants the lock that is being held by ncclRun, so I split the parts where potential allocation happens into a separate lambda. Thanks a lot akyrola and asaadaldien for the after-hours help on debugging this.
- Added support for NCCLReduceScatter.
- NCCLReduce is still deadlocking, but it happens somewhere else. We can debug it separately.

Reviewed By: akyrola

Differential Revision: D5800861

fbshipit-source-id: c963f93942a3ee3bb706fac52047b18c3f37831a
2017-09-14 18:47:11 -07:00
3821fca0c6 DOC: i{send, recv} message order with MPI backend 2017-09-14 20:38:11 -04:00
b14c5bf016 Save output_nr in SavedVariable 2017-09-14 20:31:30 -04:00
1e37145872 Resnet50 should run param init net before creating test net
Summary: Otherwise weights and biases are not created and test-net creation fails

Reviewed By: gsethi523

Differential Revision: D5836438

fbshipit-source-id: 32a75313b6b9ebecbfaa43ebd39f19c8eaba8cd1
2017-09-14 16:06:01 -07:00
86a9a06878 HTTPMessage in Python 3 does not have getheader
Summary: get and getheader are the same in Python 2

Reviewed By: akyrola

Differential Revision: D5836486

fbshipit-source-id: 3bacfccc872c44741d7f26c68ba967093fce45c2
2017-09-14 13:59:06 -07:00
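A small sketch of the portable pattern (the import shuffle and example.com URL are illustrative):

```python
try:                      # Python 3
    from urllib.request import urlopen
except ImportError:       # Python 2
    from urllib2 import urlopen

resp = urlopen("https://example.com")
# headers.get() exists on both Python 2's and Python 3's HTTPMessage;
# getheader() exists only on Python 2.
print(resp.headers.get("Content-Length"))
```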
6340fde3b9 Made some arguments in momentum_sgd_update const
Summary: Concerns N, momentum and nesterov arguments.

Reviewed By: asaadaldien

Differential Revision: D5787218

fbshipit-source-id: 6a068b49db4bb06674c2aef3efd366ce4d9ac60d
2017-09-14 13:32:16 -07:00
7eb5ad2e26 Fix profdag infinite loop
Summary: RunAsync() called DagNetBase::Run(), which called ProfDag::RunAsync().

Reviewed By: Yangqing

Differential Revision: D5835852

fbshipit-source-id: 30618d517c7ee235143de6efaa2f40df3f1d372f
2017-09-14 13:20:57 -07:00
632da0b6be LRN Op input "scale"
Summary:
* For forward: allow either 1 or 2 output.
* For gradient generator: always return a gradient operator that does not use scale.
* For cudnn gradient op: nothing to do, already like this
* For default CPU and CUDA gradient ops: put scale as a member variable, and always recompute scale.

Reviewed By: bddppq

Differential Revision: D5690181

fbshipit-source-id: a6353202dcaf7359298bc8f032ac0c651352e2bc
2017-09-14 12:22:07 -07:00
08b4770adf minor spelling, intialize->initialize 2017-09-14 15:13:01 -04:00
06c44e2283 Replace Variable(new VariableImpl(...), false) with make_variable.
Also squash a warning about an implicit conversion that will never
occur (because the type being converted to is a superclass).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-14 14:33:08 -04:00
bcad604ea6 Move imap to six.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-14 14:33:08 -04:00
5b5218ea95 Micro optimizations in ATen
* Compare typeid instead of using dynamic_cast
* Mark derived TensorImpl classes as final
* Use tensor->nDimension instead of THTensor_(nDimension)
2017-09-14 11:14:45 -07:00
0e7bd68536 Allow one output for droput at inference time
Summary: att

Reviewed By: bddppq

Differential Revision: D5680214

fbshipit-source-id: 19e731901cb5c9491100c61baefc4b75e6e8b262
2017-09-14 10:46:41 -07:00
63a2b75027 Add option to remove legacy_pad in caffe_translator
Summary:
To speed up deprecating legacy_pad, we added the option
to remove legacy pad in the caffe_translator

Reviewed By: bddppq

Differential Revision: D5724079

fbshipit-source-id: 25465d26f35bd009aa71667c7c523047de42e802
2017-09-14 10:32:48 -07:00
253d48c815 add in-place random sampling ops 2017-09-14 10:03:17 -04:00
ce4932f8a4 add softmax2d docs 2017-09-14 09:41:04 -04:00
0f0829d88e Strict bound check for SequenceFunctor
Summary:
This exposes a problem in NMT training where some out-of-bounds data seems to
have been silently written past the bound, causing random segfaults elsewhere in the
code. This itself does not solve the problem, but it will prompt us to fix the out-of-bounds
issues.

Differential Revision: D5832646

fbshipit-source-id: 5eb259e4584e5341ef3f19362f98f0a9554e9aec
2017-09-14 01:30:58 -07:00
efda016108 fix dynamic-type-mismatch (ubsan) in caffe2/caffe2/core/tensor.h
Summary:
UBSan report:

```
UndefinedBehaviorSanitizer: dynamic-type-mismatch caffe2/caffe2/core/tensor.h:786:22 in
caffe2/caffe2/core/tensor.h:787:19: runtime error: member call on address 0x60c01f610440 which does not point to an object of type 'caffe2::Tensor<caffe2::Tensor<caffe2::CPUContext> >'
*** Aborted at 1505298367 (Unix time, try 'date -d 1505298367') ***
*** Signal 6 (SIGABRT) (0xf2) received by PID 242 (pthread TID 0x7fb376f06700) (linux TID 33215) (maybe from PID 242, UID 0), stack trace: ***
0x60c01f610440: note: object is of type 'N6caffe26TensorINS_10CPUContextEEE'
 07 5e 81 60  c8 47 13 35 00 00 00 00  90 f3 73 80 20 60 00 00  98 f3 73 80 20 60 00 00  a0 f3 73 80
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'N6caffe26TensorINS_10CPUContextEEE'
    #0 0x1f0d1c22 in std::vector<long, std::allocator<long> > caffe2::GetTensorInfo<caffe2::Tensor<caffe2::CPUContext> >(void const*, bool*, unsigned long*, caffe2::DeviceOption*) caffe2/caffe2/core/tensor.h:787:19
    #1 0x9a5e0a1 in caffe2::FacebookOperatorObserver::log() caffe2/caffe2/fb/init/net_observer.cpp:300:15
    #2 0x9a5b49d in caffe2::FacebookOperatorObserver::Stop() caffe2/caffe2/fb/init/net_observer.cpp:229:11
    #3 0x447d046 in caffe2::Operator<caffe2::CPUContext>::Run(int) caffe2/caffe2/core/operator.h:308:20
    #4 0x1ecedb2f in caffe2::SimpleNet::Run() caffe2/caffe2/core/net_simple.cc:51:14
    #5 0x1f1ba169 in caffe2::Workspace::RunNet(std::basic_fbstring<char, std::char_traits<char>, std::allocator<char>, std::fbstring_core<char> > const&) caffe2/caffe2/core/workspace.cc:211:26
...
```

The bug is that `GetTensorType` and `GetTensorInfo` take the context as the template argument, not the tensor itself.

Reviewed By: bddppq

Differential Revision: D5826781

fbshipit-source-id: 9cfd2ca1aaef6f8ee8a556ce7b553c0a4f43a100
2017-09-13 23:31:35 -07:00
e9581e47a2 fix comment on core.Net.RunAllOnMKL
Summary: Fix comment on core.Net.RunAllOnMKL (the comment was actually for core.Net.RunAllOnGPU)

Reviewed By: zem7

Differential Revision: D5734309

fbshipit-source-id: 2cc40a99a2c0083c73ec1e4c8279f55f296a003c
2017-09-13 19:32:18 -07:00
77ea40c01a Added USDT sample points to simple net
Summary:
This enables opsnoop to work with simple net as opposed
to just dag net

Reviewed By: pietern

Differential Revision: D5721732

fbshipit-source-id: c38d0b51d3b0469ecb2883e7075eeee7acf81d75
2017-09-13 19:10:16 -07:00
f0d0361609 Revert D5794634: [caffe2][PR] fp16: BatchMatMul
Summary:
This reverts commit 911c462824edec3de529a5a4385a4c437e24bf59

bypass-lint

Differential Revision: D5794634

fbshipit-source-id: 1863b02282329cbee6b10e5870f03051b4bb6c58
2017-09-13 18:46:47 -07:00
6436881e2d Re-issue random resize
Summary: Closes https://github.com/caffe2/caffe2/pull/1110

Reviewed By: akyrola

Differential Revision: D5661604

Pulled By: harouwu

fbshipit-source-id: de8b7916ffd9b9970db20ad79da77c135e759a4f
2017-09-13 17:47:43 -07:00
80d229b0e7 Refactor THPUtils_invalidArguments into separate file 2017-09-13 19:18:02 -04:00
68f358452b Add node_name to DeviceOption
Summary: Allow for generalizing net transforms.

Reviewed By: Yangqing

Differential Revision: D5812140

fbshipit-source-id: e3f30acad362ae1f0614ee218d331b525710b88e
2017-09-13 16:04:04 -07:00
37af6566e1 fp16: LSTMUnit
Summary:
Was https://github.com/caffe2/caffe2/pull/1151
Closes https://github.com/caffe2/caffe2/pull/1191

Differential Revision: D5825387

Pulled By: akyrola

fbshipit-source-id: edb47c8bd7ffb72e1e587a9c5bfee9347e3d587e
2017-09-13 15:47:03 -07:00
23f4f78c22 Functional C2
Summary:
Supporting calling C2 operators as functions, e.g.
```
from caffe2.python.functional import Functional
Y = Functional.Relu(X)[0]
```
Supporting numpy arrays as input for now.

Reviewed By: bddppq

Differential Revision: D5791821

fbshipit-source-id: 7e936ad52b8b304c5e210248bd6649fd066cd909
2017-09-13 15:37:28 -07:00
e4c0af8b56 revert #2708 modify orthogonal init for rows<cols case 2017-09-13 18:23:43 -04:00
2b5835ba5c fix lint 2017-09-13 18:18:34 -04:00
0a9f93e43c add env var for python executable 2017-09-13 17:49:08 -04:00
7eafd6cd6f Merge commit '23e5a8be8ea42118c4d93632affb00a0802a7770' 2017-09-13 17:38:11 -04:00
ec2ee181c1 allow sharing tensor of simple types
Summary: If blob type switches between fp32, fp16 - for example - we should share the tensor buffer. This kind of switching can happen with memonger and in-place conversions.

Reviewed By: bddppq

Differential Revision: D5812333

fbshipit-source-id: 44d54bfe52cbda734db8c7f20d6970e4b51ee1e1
2017-09-13 14:35:29 -07:00
bd17684252 Run thread pool only on fast cores
Summary:
Choose the number of cores for the thread pool as the number of fast cores.

I didn't do any benchmarks, so it's mostly an FYI diff.

Reviewed By: ajtulloch

Differential Revision: D5579797

fbshipit-source-id: 5ada001116c731780f38a62e9c0b500bd64a4bfe
2017-09-13 14:35:28 -07:00
90ca470d70 Standardize operator argument "is_test"
Summary:
Also add the ability to mark an argument as required.

Added a string constant `OpSchema::Arg_IsTest` for `is_test` arg.
If users define the `is_test` argument with `ArgIsTest(...)`, then it automatically becomes a required argument; meanwhile, users can still use `Arg("is_test", ...)` to define an optional `is_test` argument.

Reviewed By: akyrola

Differential Revision: D5812391

fbshipit-source-id: eaaba50d027813a8012389edc6c459de23c3c728
2017-09-13 14:35:27 -07:00
3cfc6f26e7 fp16: BatchMatMul
Summary:
Was https://github.com/caffe2/caffe2/pull/1151
Closes https://github.com/caffe2/caffe2/pull/1175

Reviewed By: Yangqing

Differential Revision: D5794634

Pulled By: akyrola

fbshipit-source-id: 911c462824edec3de529a5a4385a4c437e24bf59
2017-09-13 14:35:25 -07:00
97e733615c Use simple function pointers for memory allocation and deallocation.
Reviewed By: Yangqing

Differential Revision: D5822238

fbshipit-source-id: 9624e6494ea6be10221aa75c7f22aa8721946af2
2017-09-13 14:26:04 -07:00
23e5a8be8e add support for custom python 2017-09-13 14:06:56 -07:00
d01adcbe0e modify orthogonal init 2017-09-13 16:54:37 -04:00
4d3a0f7a20 spell fix seet to set
Summary: spell fix seet to set

Reviewed By: zem7

Differential Revision: D5825897

fbshipit-source-id: 17c85450b17b0d857cc69739bfc33c8a0d55b981
2017-09-13 13:21:01 -07:00
462f95ed6d fix bug in autograd type() for non-default GPU input 2017-09-13 15:33:37 -04:00
c07ebd2396 TrimDataset to ensure size is multiple of number or replicas
Summary: For data parallelism we need the batch size to be a multiple of the number of replicas. With this diff we achieve that by doing Dataset(rec).trim(multiple_of=num_replicas).

Reviewed By: dzhulgakov, harouwu

Differential Revision: D5753861

fbshipit-source-id: c5d728b925707dbd3d1f500a93e67e185c223569
2017-09-13 12:17:21 -07:00
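The trim itself is simple arithmetic; a sketch of the size computation (the helper name is illustrative):

```python
def trim_size(n, multiple_of):
    # Keep the largest prefix whose length is an exact multiple of the
    # replica count; the remainder rows are dropped.
    return n - (n % multiple_of)

print(trim_size(1000, 8))  # 1000 (already a multiple)
print(trim_size(1003, 8))  # 1000 (3 rows trimmed)
```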
c313855523 Use brew in rnn_cell.py
Summary:
Was https://github.com/caffe2/caffe2/pull/1151.
Closes https://github.com/caffe2/caffe2/pull/1185

Differential Revision: D5794716

Pulled By: akyrola

fbshipit-source-id: c27d30d5d6dd7dacc47610150dcfef03343a7120
2017-09-13 12:02:57 -07:00
6e322a4191 refactor states-handling of CuDNNDropout
Summary: CuDNNDropout used to append the cuDNN states structure on top of the mask blob. This is a bit controversial, and it also caused problems when the mask blob was released by dynamic memory management. This diff makes the states blob a separate blob managed outside the inputs/outputs (so that we don't need a different signature for the cuDNN and non-cuDNN ops). Since the gradient op needs to access the same states, it grabs the states blob based on the mask blob's name. Perhaps not the cleanest way to pass information, but at least better than the previous model. This also allowed removing a fair amount of code.

Reviewed By: bddppq

Differential Revision: D5787039

fbshipit-source-id: d95f0ffafb5fb2a6a7ce46f4a855e9c1b9a47f52
2017-09-13 12:02:57 -07:00
2356ee41b7 Fix segfault in backward 2017-09-13 14:47:26 -04:00
361bbb8b43 fp16: SumReduceLike
Summary:
Was https://github.com/caffe2/caffe2/pull/1151
Closes https://github.com/caffe2/caffe2/pull/1183

Differential Revision: D5794704

Pulled By: akyrola

fbshipit-source-id: e4dee46f753e9a8663057c81f23028f6246fba02
2017-09-13 11:46:23 -07:00
f775149205 tests: use assertRaises, not expectedFail
Summary:
I would expect that tests marked "expected failure" mean that there is a known issue in the code which will be fixed later. Both of these tests are simply verifying proper error-checking - nothing needs fixing.

Before (looks like something is wrong):
```
======================================= 2 xfailed in 0.27 seconds =======================================
```
After:
```
======================================= 2 passed in 0.28 seconds ========================================
```
/cc akyrola gsethi523
Closes https://github.com/caffe2/caffe2/pull/1209

Differential Revision: D5825373

Pulled By: akyrola

fbshipit-source-id: 1b98f503e4e406f69567d02425532f43bd16a465
2017-09-13 11:39:35 -07:00
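The distinction the PR draws, as a minimal unittest sketch (the test and exception are illustrative):

```python
import unittest

class ErrorCheckingTest(unittest.TestCase):
    def test_negative_size_rejected(self):
        # Raising here is the expected, correct behavior, so the test
        # should pass via assertRaises rather than be marked xfail.
        with self.assertRaises(ValueError):
            raise ValueError("negative sizes are not allowed")

if __name__ == "__main__":
    unittest.main()
```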
d910a94b2b Support AdaptiveMaxPool1d/2d double backwards. 2017-09-13 12:28:43 -04:00
2cad108269 Make AdaptiveMaxPool1d/2d indices format the same as MaxPool1d/2d format. 2017-09-13 12:28:43 -04:00
4b5a6c07ac Make 's_' functions on Type public 2017-09-13 00:19:40 -07:00
95c954abc0 redesigning NetBase's Run() and RunAsync() functionalities
Summary:
Right now, each net implements 2 functions: Run() and RunAsync(). The (loose) abstraction is:

* Run(): run the network in a synchronous way. The call is synchronous.
* RunAsync(): run the network *still synchronously*, but potentially use asynchronous scheduling of the underlying operators.

As one can see, this is highly confusing: RunAsync() is actually a sync call, and the semantics it tries to implement should actually be done by a different net type. For example, DAGNet and AsyncDAGNet both implement the Run() function, and under the hood one uses sync scheduling and one uses async scheduling. Currently, the only user of the RunAsync() function is in SimpleNet::RunAsync(). The only call site is in recurrent_net_op.

Instead, the operator implements the two Run() and RunAsync() functions as follows:

* Run(): run the operator in a synchronous way, i.e., doing FinishDeviceComputation().
* RunAsync(): run the operator in an asynchronous way if possible (i.e. still sync on CPU, but async on CUDA), record the action in event_, and return immediately.

Semantically, Run() is equal to RunAsync() followed by event().Finish().

As a result, we propose in diff D5812854 to change the network interface to be similar to the operator interface, and to explicitly promote RunAsync() to a first-class citizen of the net interface. Specifically, whether a net can run asynchronously is now determined by the following:

* Adding a SupportsAsync() function that determines if a net supports async execution or not.
* Run(): run the net in a synchronous way.
* RunAsync(): if SupportsAsync() is false, same as Run(). If SupportsAsync() is true, run the net in an asynchronous way, with the scheduling algorithm determined by the implementation itself. Then record all outstanding events in the events_ field and return immediately.

Semantically, Run() is equal to RunAsync() followed by calling event.Finish() for all the events. This is in fact the implementation, and Run() is no longer a virtual function; RunAsync() is: all subclasses of NetBase shall implement SupportsAsync() and RunAsync() now.

**Why SupportsAsync()?**

This is a design idea that probably needs iterating. Basically, the idea is that RunAsync() is the main entry for the net execution, and it's actually like RunAsyncIfTheNetSupportsIt().

In theory, Run() is basically a wrapper on top of RunAsync() to reduce code duplication: if a net type does not support RunAsync(), its RunAsync() implementation is simply sync (see e.g. SimpleNet) and the Run()-to-RunAsync() lowering is a no-op (with the only overhead being a nested function call).

I exposed the SupportsAsync() function just in case some caller wants to explicitly check whether an instantiated net supports async calls or not - for example, a caller may want to make sure that it is actually running a net asynchronously, in which case SupportsAsync() is the place to query.

Reviewed By: dzhulgakov

Differential Revision: D5812854

fbshipit-source-id: 916b38fded0eb14439f340ab254a034ac5a9a465
2017-09-13 00:02:20 -07:00
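A Python-flavored sketch of the proposed contract (the real interface is C++; these names are illustrative):

```python
class Event(object):
    def finish(self):
        pass  # block until the recorded work has completed

class NetBase(object):
    def __init__(self):
        self.events = []

    def supports_async(self):
        raise NotImplementedError  # subclasses declare their capability

    def run_async(self):
        # Async nets schedule work, record outstanding events in
        # self.events, and return immediately; sync nets just run here.
        raise NotImplementedError

    def run(self):
        # Run() is no longer virtual: it is RunAsync() plus a wait on
        # every outstanding event.
        self.run_async()
        for event in self.events:
            event.finish()
```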
a198da5583 Added LengthMax Operator to Caffe2
Summary: Added LengthMax operator to Caffe2.

Reviewed By: dzhulgakov

Differential Revision: D5720124

fbshipit-source-id: 1995fea8e480c9a9f3e054d02801b03c1ce6c51b
2017-09-12 20:01:48 -07:00
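Assuming semantics analogous to the other Lengths* reducers, a NumPy sketch of what the operator computes:

```python
import numpy as np

def lengths_max(data, lengths):
    # Reduce each consecutive segment of `data` with max; segment sizes
    # are given by `lengths`.
    out, pos = [], 0
    for n in lengths:
        out.append(data[pos:pos + n].max(axis=0))
        pos += n
    return np.stack(out)

print(lengths_max(np.array([1.0, 5.0, 2.0, 7.0, 3.0]), [2, 3]))  # [5. 7.]
```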
0b89eb7592 Make seg ios run with OpenGL
Summary: Trying to reland D5803411

Reviewed By: fricc33

Differential Revision: D5819829

fbshipit-source-id: 96cb29c7699df625d30853f91844153ed76505d5
2017-09-12 18:16:23 -07:00
63829695c6 Make android segmentation net run with MPSCNN
Summary: Trying to reland D5803245

Reviewed By: fricc33

Differential Revision: D5818735

fbshipit-source-id: 252fd3c68ce8731b5c96e2f0678128ba9b668581
2017-09-12 18:16:22 -07:00
cd9b27231b Add comment about scope-defining trick.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-12 21:08:23 -04:00
713756d115 Remove function test code, cleanup. 2017-09-12 21:07:48 -04:00
36b13f4776 Implement Concat Function tests as individual test methods since there
is no cat method on Tensors/Variables.
2017-09-12 21:07:48 -04:00
3da453f25a Unify function and method tests. 2017-09-12 21:07:48 -04:00
08eb88f3de Duplicate what is tested in function tests in the method tests.
Also make some function-vs-method tests uniform and change method
tests so they will pass gradchecks (i.e. avoid nans)
2017-09-12 21:07:48 -04:00
19cfda761c write THD link libraries to text file and read it in setup.py to link dependencies correctly (#2711) 2017-09-12 20:56:36 -04:00
8860fb7fe0 Implemented uniform buffer batching
Summary: Kernel data and other shader parameters are now cached directly into uniform buffer blocks, and the blocks are dynamically attached at run time.

Reviewed By: hlu1

Differential Revision: D5772847

fbshipit-source-id: 746448c2d5db12e38fb883874ede3acfccb9f6ef
2017-09-12 17:51:39 -07:00
68e7a0f2ed Enable target dialect token in inference.
Differential Revision: D5665714

fbshipit-source-id: 56ba88e72f71cae23d992e3ad7ea134c3d2c6d1d
2017-09-12 17:22:18 -07:00
47fd6cc255 Revert D5801013: [caffe2] Use simple function pointers for memory allocation and deallocation.
Summary:
This reverts commit 7068207a43400fa3902bbb3689b3c729e839456c

bypass-lint

Differential Revision: D5801013

fbshipit-source-id: ca2bd9aaf61c20ce1935a007ab7b34f5d37f5033
2017-09-12 16:36:36 -07:00
c2169c717f Remove references to cnmem
Summary: TSIA

Reviewed By: Yangqing

Differential Revision: D5815624

fbshipit-source-id: 1a6c0e471eac778aeac80001eac947178fc105ed
2017-09-12 14:37:12 -07:00
ce36a972b0 fix timeouts in CloneOrCreateCommonWorld
Summary: The default value for timeout in CreateOrCloneCommonWorld does not work properly: if the value of dpm._DEFAULT_TIMEOUT is changed, the default still stays at the old 30s, because Python evaluates default argument values once, at function definition time. Changed to use None as the default instead.

Reviewed By: pietern

Differential Revision: D5813228

fbshipit-source-id: f617ceec40a03893c27d3e13c426e1ca6b2114e2
2017-09-12 13:09:05 -07:00
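A minimal illustration of the gotcha behind this fix (the names are stand-ins for the dpm module's):

```python
_DEFAULT_TIMEOUT = 30

def create_common_world(timeout=_DEFAULT_TIMEOUT):
    # The default was captured when `def` ran, not at call time.
    return timeout

_DEFAULT_TIMEOUT = 60
print(create_common_world())        # 30 -- still the old default

def create_common_world_fixed(timeout=None):
    if timeout is None:             # None sentinel: resolve at call time
        timeout = _DEFAULT_TIMEOUT
    return timeout

print(create_common_world_fixed())  # 60
```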
583d031754 Operator to compute RoI region coordinates for RMAC
Summary:
Computes a fixed grid of RMAC region coordinates for a given 4D feature tensor
(NCHW) as described in https://arxiv.org/abs/1511.05879. The output is the
`roi` format expected by RoIPoolOp. To compute the actual RMAC itself, the
output of this op should be passed to RoIPoolOp.

Reviewed By: wickedfoo

Differential Revision: D5594994

fbshipit-source-id: 5edac98a18137b53555f9a16354419b424679c99
2017-09-12 12:47:17 -07:00
be406b1e5f Revert D5639080: Caffe2: Cuda implementation for BatchOneHot operator
Summary:
This reverts commit 8ee280c4bab64c1fdfb7429ee2c9ac8c02933931

bypass-lint

Differential Revision: D5639080

fbshipit-source-id: cf522822b7cb5ba9a238ba7837f0f522e1f49b73
2017-09-12 11:51:14 -07:00
93bd3c77f8 AddBlobsSync()
Summary: An explicit function to sync blobs. Notice that this must be called before CreateNet(), and it syncs the blobs on every run.

Reviewed By: asaadaldien, jay-mahadeokar

Differential Revision: D5805891

fbshipit-source-id: 58a1bb47805d75d5cbead136e2e0e9fe663ea954
2017-09-12 10:33:22 -07:00
1290e586fb Use at::Tensor based autograd Variable (#2676)
Variable is now a subclass of at::Tensor backed by a VariableImpl* pImpl. The implementation of the ATen functions is defined in the auto-generated VariableType.h/cpp file.

Currently, only functions which fall through to the base type, such as sizes() and isCuda() are implemented. Differentiable ops like add() and mul() will be added in a subsequent PR.
2017-09-12 11:36:01 -04:00
820143f4af Drop L specifier; reimplement tuple printing in C++
When you call repr() on a long in Python 2, it prints a trailing 'L' suffix.
This is annoying for tests which assert on the exact output.  Use str()
instead.

But then there is a problem with Python 2's default tuple str() implementation,
where it calls repr() on its elements rather than str().  This means that
if you have a tuple of longs, it will render as "(1L, 2L)" in Python 2.

To solve this problem, we just reimplement tuple printing in C++.
This is not a very robust fix (nested tuples, dictionaries, all these situations
will fail) but in practice it hits the cases that matter.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-12 11:03:03 -04:00
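For reference, the Python 2 behavior being worked around (it does not reproduce on Python 3, which has no separate long type):

```python
# A Python 2 session:
x = 2 ** 40
repr(x)      # '1099511627776L'  -- the trailing L the tests tripped over
str(x)       # '1099511627776'   -- no suffix
str((x, x))  # '(1099511627776L, 1099511627776L)'
             # tuple's str() calls repr() on its elements, hence the
             # reimplementation of tuple printing in C++.
```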
d1346c75ec Always use generator version of map for Variable iteration.
In Python 2, the non-generator map will always perform the indexing
even when the result is not used in the end.  Using the generator lets
us avoid indexing when it is not needed.

As an added bonus, it makes the ordering of operations deterministic
between Python 2 and Python 3 in LSTM.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-12 11:03:03 -04:00
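A small runnable illustration of the laziness being relied on (the helper names are illustrative):

```python
calls = []

def index(v):
    calls.append(v)   # record that the indexing work actually ran
    return v * 2

# On Python 2, list-returning map(index, ...) would invoke index() on
# every element up front; the generator form defers each call until the
# value is consumed, on both Python 2 and 3.
lazy = (index(v) for v in [1, 2, 3])
first = next(lazy)
print(first, calls)   # 2 [1] -- only one call so far
```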
39d495b267 Generate expect files in same directory as top-level test script.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-12 11:03:03 -04:00
a782858285 Move go_token_id out of beam search constructor.
Summary: This will allow the same decoder to handle different go tokens.

Differential Revision: D5801811

fbshipit-source-id: ddd309963c97e32c728b15d2ccd4ba0c4ad5ebbe
2017-09-11 18:52:08 -07:00
d52404779f Revert D5803245: [caffe2][MPSCNN][segmentation] Make android segmentation net run with MPSCNN
Summary:
This reverts commit 6808e9c3504389c113c7a16504d6554e83bdcc3e

bypass-lint

Differential Revision: D5803245

fbshipit-source-id: e6e2e90dd196ae958d729af2e19942e922207a2a
2017-09-11 18:33:53 -07:00
f09fb7735e Revert D5803411: [caffe2][segmentation]Make iOS segmentation net run with OpenGL
Summary:
This reverts commit d208771d59f99b4f95ce67849baf369c14e66b37

bypass-lint

Differential Revision: D5803411

fbshipit-source-id: b120583dca6b885e91c92993ab3cc18f7e2c8a48
2017-09-11 18:33:52 -07:00
4fec5f658b add Bilinear to docs, fix reference 2017-09-11 20:12:27 -04:00
1794e76800 add missing bilinear docs entry 2017-09-11 20:06:44 -04:00
670cbf0350 Remove the files added by PR 1203
Reviewed By: pietern

Differential Revision: D5809970

fbshipit-source-id: 011b635ca9d1c285543b88cb021df5ba8f4b2a5a
2017-09-11 17:02:00 -07:00
98173850b2 Make iOS segmentation net run with OpenGL
Reviewed By: fricc33

Differential Revision: D5803411

fbshipit-source-id: d208771d59f99b4f95ce67849baf369c14e66b37
2017-09-11 16:32:41 -07:00
ebf7784840 Make android segmentation net run with MPSCNN
Summary: The android segmentation net was failing with MPSCNN because the some fused MPSCNNConvRelu ops become in-place after fusion.

Reviewed By: fricc33

Differential Revision: D5803245

fbshipit-source-id: 6808e9c3504389c113c7a16504d6554e83bdcc3e
2017-09-11 16:32:40 -07:00
103977cc8c fix warnings (#2693) 2017-09-11 18:34:27 -04:00
ebe6f8b631 Merge commit '0161ea2ca911acce1cfebab3e9238992dc5ce963' 2017-09-11 17:58:23 -04:00
3ebf4b6173 Merge commit 'bc66c9da86c5652dd271c9711659ccd689253786' 2017-09-11 17:55:45 -04:00
bc66c9da86 fix alignment warning 2017-09-11 17:54:47 -04:00
944115c915 Bugfix for concat frontend
Summary:
When breaking out pooyadavoodi's change to `brew.concat` from https://github.com/caffe2/caffe2/pull/1151 to https://github.com/caffe2/caffe2/pull/1184, I made it throw an error instead of silently removing `order`. But `order` is always present because of [this](https://github.com/caffe2/caffe2/blob/v0.8.1/caffe2/python/model_helper.py#L118), so the frontend can never be used to set `axis`. That's bad. This PR changes the behavior back to Pooya's original implementation.
Closes https://github.com/caffe2/caffe2/pull/1202

Reviewed By: akyrola

Differential Revision: D5806488

Pulled By: pietern

fbshipit-source-id: ceaea77469688a66b269b8ed2944f0d3fe873940
2017-09-11 13:02:59 -07:00
84167faf0f Enable use of GPUDirect through argument to Gloo AllreduceOp
Summary:
If the Gloo InfiniBand transport is used, the Gloo algorithms can use
GPUDirect to DMA directly from/to GPU memory. This is done through the
CudaDeviceWorkspace. This change adds a "gpu_direct" option to the
Allreduce operator that makes it use GPUDirect if the transport
supports it.
Closes https://github.com/caffe2/caffe2/pull/1203

Reviewed By: wesolwsk

Differential Revision: D5806366

Pulled By: pietern

fbshipit-source-id: 9e9a78f059f2b5c6e4fbf6574b7db4776a94696c
2017-09-11 13:02:58 -07:00
0161ea2ca9 Mark unsafeGetTH as const 2017-09-11 11:17:16 -07:00
ace1426d50 Move wrap_dim code to Utils function to minimize generated code. 2017-09-11 11:16:52 -07:00
183c2071f9 Generate wrap_dim code on derived type rather than base type.
Either should work, but code feels more natural this way.
2017-09-11 11:16:52 -07:00
39b5031517 Support wrap_dim specifications from cwrap. 2017-09-11 11:16:52 -07:00
4a71ca6c60 Use cast instead of literal as a temporary fix 2017-09-11 10:44:36 -07:00
1cf58bddd6 Fix default constructor argument 2017-09-11 10:44:36 -07:00
d7f79e7d98 Merge commit '92abd54dfdf03c7ad6f9426c91ad55dc49d95d02' 2017-09-11 13:32:05 -04:00
92abd54dfd simplify the code 2017-09-11 13:31:04 -04:00
bc4f233b56 Make use of zeus kv store.
Summary:
Implement atomic add operation for zeus kv store.
All nodes now use zeus as the KVStore instead of relying on the master hosting a KVServer.
Code cleanup.

Reviewed By: andrewwdye

Differential Revision: D5581697

fbshipit-source-id: ba7d99215fb478a30942ff593f13dad65aa48d36
2017-09-11 09:05:00 -07:00
1c414426df Caffe2: Cuda implementation for BatchOneHot operator
Summary: Cuda implementation for BatchOneHot operator.

Reviewed By: lvdmaaten

Differential Revision: D5639080

fbshipit-source-id: 8ee280c4bab64c1fdfb7429ee2c9ac8c02933931
2017-09-11 08:24:44 -07:00
cf2c7ca998 add THPP linkage when building THD (#2687) 2017-09-11 08:53:38 -04:00
01a1cf1e07 small fix for pointer initialization.
Summary: A bit safer, and also suppresses compiler warning.

Reviewed By: bddppq

Differential Revision: D5803080

fbshipit-source-id: d8c782c936a8fdaded4ae209b212378e78606ffb
2017-09-11 01:41:35 -07:00
10a032de67 Use simple function pointers for memory allocation and deallocation.
Summary:
During the team meeting today, Dima and Alex mentioned that the current lambda
function causes a performance slowdown when a large number of allocs and
deallocs happen. My observation is that most of the Deletes are actually direct
Delete() function pointers, so I gave it a shot to see if we can reduce
the overhead.

RawAllocDealloc is already quite fast, and we observe another 5ns reduction
(12.5%). For TensorAllocDealloc of 32x32 tensors, we observe a 57ns saving
(26%). This was measured on a Xeon(R) CPU E5-2660.

Also cleaned up the function interfaces of ShareExternalPointer so that we have
only 2 functions.

Reviewed By: salexspb, dzhulgakov

Differential Revision: D5801013

fbshipit-source-id: 7068207a43400fa3902bbb3689b3c729e839456c
2017-09-10 22:47:26 -07:00
47d1b6846a Add a memory allocation / deallocation overhead benchmark.
Summary: TSIA

Reviewed By: dzhulgakov, salexspb

Differential Revision: D5801003

fbshipit-source-id: 8be1133ae2f75a735072a82ac33b922da75de8d2
2017-09-10 21:39:26 -07:00
4998a14144 Merge commit 'e8dec6e395faf6c4726df145e85ff7f77618668a' 2017-09-10 13:52:36 -04:00
a77aa12759 Merge commit '0df2f1cbd62ab2a7d507bc68d8d43509ca268a0e' 2017-09-10 13:51:53 -04:00
1da87118cc Optimize pow for different exponents and add tests 2017-09-10 13:51:05 -04:00
e8dec6e395 Optimize pow for different exponents and add tests 2017-09-10 13:50:57 -04:00
0df2f1cbd6 Optimize pow for different exponents and add tests 2017-09-10 13:50:50 -04:00
141f8921ac MultiLabelMarginLoss doc fix (#2683) 2017-09-10 13:48:33 -04:00
b31cf0ebd4 Added support for nInputDim parameter in legacy Padding class (#2645)
* Added support for nInputDim parameter in Padding class

* moved nInputDim to the end so as to not break backwards compatibility

* hasattr to check if nInputDim is actually set

* check if nInputDim is positive before checking against input dim
2017-09-10 13:47:34 -04:00
96cc52cde7 image_input_op_support_int64
Summary:
Support the int64 data type in the protobuf tensor in the image input op.
This is useful when fbid, which is usually of data type BIGINT, is stored in the tensor proto.

Reviewed By: panshen1

Differential Revision: D5792697

fbshipit-source-id: 0bc3da4fd31120b0582fb32dd7c2d09fe591a6de
2017-09-09 22:50:37 -07:00
5d9c505e41 elu gradient cuda fix
Summary: CPU gradient is correct.  CUDA gradient was wrong.

Reviewed By: asaadaldien

Differential Revision: D5801595

fbshipit-source-id: 7e529ed751b92137e49a0517120ddfae7a30ec28
2017-09-09 21:46:45 -07:00
72ea242280 fix race condition with finished_timesteps
Summary: Stress tests for recurrent_net_executor_test failed sporadically when the executor got stuck in forward-only mode. In forward-only mode we apply a limit to the number of parallel timesteps (because we recycle workspaces cyclically). There was a race condition where the finished_timesteps_ variable was set to 0 after jobs had already been executed by threads. So the variable is now set to 0 before putting any jobs into the queue.

Reviewed By: azzolini, Yangqing

Differential Revision: D5801599

fbshipit-source-id: 8443c67f4ae8af3ae08c6f0cd4575ef729ffa3af
2017-09-09 16:46:15 -07:00
45f07238f4 make rnn executor figure out recurrent mappings from links
Summary: The RNN executor previously relied on getting the mapping from x to x_prev (and gradients) from recurrent.py, but we can just infer them from the links. This makes all models compatible with the RNN executor, given the enable_rnn_executor=1 argument.

Reviewed By: jamesr66a

Differential Revision: D5801436

fbshipit-source-id: 14d0e26dfbad6347f645d907da493187c98e9b17
2017-09-09 16:19:26 -07:00
1cf94854a4 fp16: SequenceMask
Summary:
Was https://github.com/caffe2/caffe2/pull/1151
Closes https://github.com/caffe2/caffe2/pull/1178

Reviewed By: bddppq

Differential Revision: D5794641

Pulled By: akyrola

fbshipit-source-id: c3bd99dde74317280a65af7cc7a36a6a734822f6
2017-09-09 13:02:38 -07:00
977b1f988c Fix EmbeddingBag doc (#2679) 2017-09-09 00:05:12 -04:00
d81d71f24c fix docs for variable.backward (#2678) 2017-09-08 20:23:34 -04:00
d43ab4bec5 Create Gloo common world through MPI rendezvous
Summary:
Before this change there were two ways for machines to rendezvous for a
distributed run: shared file system or Redis. If you're using an MPI
cluster it is much more convenient to simply execute mpirun and expect
the "right thing (tm)" to happen. This change adds the "mpi_rendezvous"
option to the CreateCommonWorld operator. If this is set, the common
world size and rank will be pulled from the MPI context and Gloo
rendezvous takes place using MPI. Note that this does NOT mean the MPI
BTL is used; MPI is only used for rendezvous.
Closes https://github.com/caffe2/caffe2/pull/1190

Reviewed By: akyrola

Differential Revision: D5796060

Pulled By: pietern

fbshipit-source-id: f8276908d3f3afef2ac88594ad377e38c17d0226
2017-09-08 17:18:47 -07:00
6cf172c60d fp16: SumSqrElements
Summary:
Was https://github.com/caffe2/caffe2/pull/1151
Closes https://github.com/caffe2/caffe2/pull/1179

Differential Revision: D5794650

Pulled By: akyrola

fbshipit-source-id: 63e7973a88193a3b74ac4ba677df737889cbf0b6
2017-09-08 16:36:51 -07:00
cef2068eee enable setting rnn executor threads and max streams
Summary: As title. Made the configurations op-specific since many models run multiple RNNs.

Reviewed By: jamesr66a

Differential Revision: D5796208

fbshipit-source-id: 88173879dfff9f3f7bf583ccc4f4c6385cca5aca
2017-09-08 16:36:51 -07:00
27433e978c Make piper of PipedReaderBuilder take arguments
Summary: Allow context to be passed into piper function

Reviewed By: volkhin

Differential Revision: D5684716

fbshipit-source-id: 693f0464fe28f8692d75901705a85a0a413a7bed
2017-09-08 13:46:29 -07:00
c11755e559 Add checks for input texture slice for tiling
Summary: The convolution should not run with input texture slices > 1 with tiling

Differential Revision: D5774187

fbshipit-source-id: 5e94f82cd65e0d4425a7a0090a61a33bef2a14fc
2017-09-08 12:52:22 -07:00
cd7d96e3b6 Fix travis build system by adding sudo
Summary:
This should fix the pip installation errors.

Here's the build on my branch: https://travis-ci.org/bwasti/caffe2/builds/273382203
Closes https://github.com/caffe2/caffe2/pull/1189

Differential Revision: D5795633

Pulled By: bwasti

fbshipit-source-id: a4c341140d19b1885772f79bf321e9febf7986bc
2017-09-08 12:21:22 -07:00
c9f11bc317 fp16: Scale
Summary:
Was https://github.com/caffe2/caffe2/pull/1151
Closes https://github.com/caffe2/caffe2/pull/1180

Differential Revision: D5794682

Pulled By: akyrola

fbshipit-source-id: 29bfa8ebcd3e6d65086abd50c472bd87d2ed0550
2017-09-08 12:21:21 -07:00
6763c14e84 add base class ModifierContext, rewrite OptimizerContext, add RegularizerContext
Summary:
`ModifierContext` is the base class for `OptimizerContext` and `RegularizationContext`.
`UseModifierBase` is the base class for `UseRegularizer` and `UseOptimizer`.

Most of the code in `OptimizerContext`, `RegularizationContext`, and other potential context classes in the future can be shared. We therefore implemented a new base class, `ModifierContext`, to support this.

The same holds for `UseRegularizer` and `UseOptimizer`, for which we implemented a new base class called `UseModifierBase`.

In this way, users only need to provide the API for the **get** and **has** operations, and to specify the **context class**.

**Note**
Mirrored code in fbandroid and fbobj will be added at final check-in.
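
A rough Python sketch of the shared-base-class idea (illustrative names, not the actual diff):

  class ModifierContext(object):
      # Shared storage and lookup; concrete contexts add thin typed
      # wrappers, which is all a user has to write.
      def __init__(self):
          self._modifiers = {}

      def _has_modifier(self, name):
          return name in self._modifiers

      def _get_modifier(self, name):
          return self._modifiers.get(name)

  class OptimizerContext(ModifierContext):
      def has_optimizer(self, name):
          return self._has_modifier(name)

      def get_optimizer(self, name):
          return self._get_modifier(name)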

Reviewed By: kittipatv, xianjiec

Differential Revision: D5724613

fbshipit-source-id: de19bb822dcd41ec5c459d65065603a0abe2fd20
2017-09-08 11:39:23 -07:00
e76015040a add regularization in caffe2 and dper
Summary:
Regularization added for caffe2 and dper.

This regularization is intended for `dense features` only. Sparse features are handled by individual optimizers; see `D5618405` and `D5534579` for details.

The implementation of dense regularization is similar to the one in the optimizer. We now support `l1 norm` and `l2 norm` in the regularizer. In dper, we call the appropriate regularization based on the regularization type defined in model_definition.thrift.
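
For intuition, the two dense penalties in NumPy (a sketch of the math, not the Caffe2 operators):

  import numpy as np

  def l1_penalty(w, lam):
      return lam * np.abs(w).sum()        # subgradient: lam * sign(w)

  def l2_penalty(w, lam):
      return 0.5 * lam * (w ** 2).sum()   # gradient: lam * w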

Reviewed By: xianjiec

Differential Revision: D5724851

fbshipit-source-id: 0fbee698cfeff1ac477fc9d07785406069f8d9c8
2017-09-08 11:39:22 -07:00
b8eb8ced7d Add transport/interface arguments to CreateCommonWorld operator
Summary:
These arguments control which Gloo transport (TCP or IB) and which
network interface is used for the common world. If not specified, it
defaults to using TCP and the network interface for the IP that the
machine's hostname resolves to.

The valid values for the transport argument are "tcp" and "ibverbs".
For ibverbs to work, Gloo must have been compiled with ibverbs
support. If Gloo is built as part of Caffe2 (sourced from the
third_party directory), then you can pass -DUSE_IBVERBS=ON to CMake to
enable ibverbs support in Gloo.
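
A hedged usage sketch (the transport values are documented above; the other names and required arguments are assumptions):

  from caffe2.python import core

  num_machines, rank = 2, 0   # example values
  cw = core.CreateOperator(
      "CreateCommonWorld",
      ["store_handler"],      # rendezvous store, as before
      ["comm_world"],
      size=num_machines,
      rank=rank,
      transport="ibverbs",    # or "tcp" (the default)
      interface="ib0",        # NIC to use; defaults to the hostname's IP
  )
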
Closes https://github.com/caffe2/caffe2/pull/1177

Reviewed By: akyrola

Differential Revision: D5789729

Pulled By: pietern

fbshipit-source-id: 0dea1a115c729e54c5c1f9fdd5fb29c14a834a82
2017-09-08 10:57:41 -07:00
3f899a15ce force NO_CUDA to be specified to disable cuda. add pytorch's FindCUDA so that it is possible to get ccache to work for nvcc. make excluded notification more concise. 2017-09-08 10:39:08 -07:00
fdbfcfc431 fp16: CuDNNSoftmax
Summary:
Was https://github.com/caffe2/caffe2/pull/1151
Closes https://github.com/caffe2/caffe2/pull/1181

Differential Revision: D5794693

Pulled By: akyrola

fbshipit-source-id: c83a98968bb363f612e53a04e8236582be6edd5d
2017-09-08 10:34:35 -07:00
03de05229e brew.concat: don't set both order and axis
Summary:
Was https://github.com/caffe2/caffe2/pull/1151.

pooyadavoodi says this was causing problems for him. I don't remember the details.
Closes https://github.com/caffe2/caffe2/pull/1184

Differential Revision: D5794711

Pulled By: akyrola

fbshipit-source-id: 4d75f2a9b30881ba662141c352ac556cb5d3cce6
2017-09-08 10:34:34 -07:00
1a2b229d47 fp16: add test for FC
Summary:
fp16 and TensorCore support was already added to the op in https://github.com/caffe2/caffe2/pull/1056. This adds a test.
Closes https://github.com/caffe2/caffe2/pull/1182

Differential Revision: D5794698

Pulled By: akyrola

fbshipit-source-id: b0d7ef317dfbb9d712b0b4646b38dc600b8434f1
2017-09-08 10:34:34 -07:00
d43185612a Specify CWRAP_FILES_BASE for ATen 2017-09-08 09:43:49 -07:00
046e9ae5c8 Use arg['default'] as constant value 2017-09-08 09:41:26 -07:00
e6fdbd5807 Merge commit '591e3efb6b51ed38e81b3f24bd4a529e21d60f0a' 2017-09-08 09:18:55 -07:00
591e3efb6b Merge pull request #54 from colesbury/default_args
Handle default arguments in base Type class
2017-09-08 11:15:29 -04:00
5d85a6753a Caffe2 BlackBoxPredictor
Reviewed By: dzhulgakov

Differential Revision: D5720775

fbshipit-source-id: e3b37ce5a4aa63807825937ec5f9c0aea76c2aba
2017-09-07 23:16:02 -07:00
9aed89ac88 Allow specification of num_workers in PredictorExportMeta and enable for NMT beam search model
Summary:
The predictor export functions allowed a way to specify a net type, but no way to specify num_workers when you use net type 'dag'. This adds that option to the PredictorExportMeta named tuple and populates the field in the exported protobuf. Also added parameters to callsites in the NMT ensemble model class and the model repackager to populate net_type and num_workers.

Using DAGNet for our base predictor net (not the recurrent stepnets) speeds up our inference by 1.15x, since we can now run the encoder forward and backward RecurrentNets for each model in the ensemble in parallel.
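
A hedged sketch of the new fields in use (the surrounding export arguments are elided here):

  from caffe2.python.predictor.predictor_exporter import PredictorExportMeta

  # predict_net, params, inputs, outputs come from model building (elided).
  meta = PredictorExportMeta(
      predict_net=predict_net,
      parameters=params,
      inputs=inputs,
      outputs=outputs,
      net_type="dag",     # run the predictor net with the DAG executor
      num_workers=4,      # new field: worker threads for the DAG executor
  )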

Reviewed By: salexspb

Differential Revision: D5792203

fbshipit-source-id: cb9a8237a0cbe1a09645d4de051dfbb23f06dcfa
2017-09-07 22:48:45 -07:00
519d5acd4d fix bug in dependency inference for RNNExecutor
Summary: The RNN executor did not account for the write-after-read type of dependency, where an op A reads blob X and a following op writes blob X. This happened in beam search with an in-place Reshape following an FC op.
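
The missing rule, as a small standalone sketch (toy data structures, not the executor's actual code):

  def add_write_after_read_edges(ops, reads, writes, deps):
      # ops: op names in program order; reads/writes: op -> set of blob
      # names; deps: op -> set of ops it must wait for.  If a later op
      # writes a blob an earlier op reads, the writer must wait for the
      # reader, or it may clobber the blob mid-read.
      for i, a in enumerate(ops):
          for b in ops[i + 1:]:
              if reads[a] & writes[b]:
                  deps[b].add(a)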

Reviewed By: jamesr66a

Differential Revision: D5792018

fbshipit-source-id: a5590d80e1b7b127abcdf2b1c2854ea56018e12f
2017-09-07 21:42:14 -07:00
6a883d1bc0 Remove dot_product layer
Summary: The dot_product layer was added before the functional layer existed. Now that we have the functional layer, the dot_product layer is no longer needed. This diff removes it.

Reviewed By: kittipatv

Differential Revision: D5783303

fbshipit-source-id: 5d13f729918148ee57836fb47c48e6f24773654b
2017-09-07 18:48:30 -07:00
c087a60026 The CMakeLists.txt name is wrong
Summary: Fix the CMakeLists.txt file name

Reviewed By: Yangqing

Differential Revision: D5790555

fbshipit-source-id: 7c5cc36e6154a2708dc290a336da2204a387c416
2017-09-07 18:16:57 -07:00
ec713d437d make sure the output of sparse lookup layer is float
Summary: Currently, if reducer=None, the output is fp16.

Differential Revision: D5773560

fbshipit-source-id: 24d7e5fae366d70352582e9a1ee14c7613753b7a
2017-09-07 17:47:39 -07:00
b6c9ecac7c Fix shape inference of distance_op
Summary: The shape inference of distance_op has issues (it only works when the inputs are 1D tensors). This diff fixes the shape inference and the unit test.

Reviewed By: kittipatv

Differential Revision: D5788744

fbshipit-source-id: cb1b7facf7b9ccd64b54edca156325eceef50f33
2017-09-07 17:16:46 -07:00
176f8f9a19 Make ConvTranspose allow optional bias term
Reviewed By: jerryzh168

Differential Revision: D5755702

fbshipit-source-id: a00487ca376d09b68132162c53797f5af052d114
2017-09-07 17:16:43 -07:00
eec2a0d905 add documentation on top_k option in accuracy op
Summary: The top_k argument was missing from the accuracy op documentation.

Reviewed By: zem7

Differential Revision: D5758807

fbshipit-source-id: ca8a0c172d0a5eafb825a0b134529294edc0b8b4
2017-09-07 15:17:17 -07:00
aeec8ae2ae label_type option was duplicated in image_input_op
Summary: Duplicated label_type removed

Reviewed By: zem7

Differential Revision: D5753003

fbshipit-source-id: 7f2917ec201ecd859e9462622ddce637b84a3da7
2017-09-07 15:06:43 -07:00
0f1a61cf80 @allow-large-files [Caffe2] [Folded diff] Move mobile files to mobile directory
Reviewed By: Yangqing

Differential Revision: D5752229

fbshipit-source-id: bc6e3ec3e4b06ae4b09f94b141a106420664d9ea
2017-09-07 15:06:43 -07:00
381a45a541 Fix BAD_PARAM errors
Summary: Closes https://github.com/caffe2/caffe2/pull/1117

Reviewed By: Yangqing

Differential Revision: D5727512

Pulled By: pietern

fbshipit-source-id: 540faafecb50e5815793991f2a443e9be7e5d353
2017-09-07 14:05:58 -07:00
b2f0ee5d46 Handle scalars that are not backed by tensors 2017-09-07 12:40:31 -07:00
f75cf375da Add accessor to underlying Tensor 2017-09-07 12:40:31 -07:00
32635f1292 zero_dim_to_one and empty_to_null can't both be specified 2017-09-07 12:29:44 -07:00
70f7cfedea Rename 'canonical' to 'has_full_argument_list' 2017-09-07 12:11:42 -07:00
98b7c882c0 add float-divide-by-zero suppressions
Reviewed By: meyering

Differential Revision: D5769768

fbshipit-source-id: 8835e1a605f64a02aaeac07cf2bb3c1dcf6aba00
2017-09-07 11:35:56 -07:00
81066a5e30 Include non-canonical functions in Declarations.yaml 2017-09-07 11:34:17 -07:00
cd59b56440 tell which argument name is duplicate
Summary: We could be a bit more helpful.

Reviewed By: jamesr66a

Differential Revision: D5778789

fbshipit-source-id: 570095196b07d593cfed8318477b296e47c5d43d
2017-09-07 11:19:20 -07:00
7bbfa1dd76 Make Scalar default constructible 2017-09-07 11:06:31 -07:00
e341bc3bea Merge pull request #55 from colesbury/cwrap_files_base
Use CWRAP_FILES_BASE if defined
2017-09-07 13:34:58 -04:00
459cc5a346 Check for nanopb and pybind11 submodules as well. (#2660)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-07 13:24:31 -04:00
b2534a4f60 Merge commit '8176a558277aeec831f2f1a846cb4856a58fb941' 2017-09-07 13:23:22 -04:00
8176a55827 Adjust error message for View
When the size given is incorrect for the number of elements, the current error message is:
`size '[1 x 1 x 5]' is invalid for input of with 1 elements at /pytorch/torch/lib/TH/THStorage.c:41`

This replaces it by
`size '[1 x 1 x 5]' is invalid for input with 1 elements at /pytorch/torch/lib/TH/THStorage.c:41`
which is grammatically better
2017-09-07 13:22:20 -04:00
8e4a889c8f Add onnx to the documentation index. 2017-09-07 09:43:37 -07:00
e8e1c61409 Merge pull request #51 from colesbury/const
Add missing const qualifiers
2017-09-07 12:10:00 -04:00
84095f9512 add linux guard 2017-09-07 11:57:49 -04:00
ab3e95315d Merge commit '3024ff5705faccc2908660582c895371fd133603' 2017-09-07 11:56:38 -04:00
608327b156 Merge commit '5ef96aadd9287ef1f0c10d0469097fd9439efcd7' 2017-09-07 11:55:26 -04:00
eea54cc065 Merge commit 'b6648fe311889cef29f34734d92caee7f5d54db2' 2017-09-07 11:55:00 -04:00
894c05fd22 fix static linkage and make THD statically linked 2017-09-07 11:54:18 -04:00
a3ae136c25 Temporarily suppress buggy test case with relaxed test. (#2663)
Proper broadcasting in ATen uncovered a bug in our fusion
compiler where it outputs the wrong shaped tensor.  We're
tracking the issue in https://github.com/ezyang/pytorch/issues/206
but for now, rewrite the code so it does an "old style" comparison,
which works fine.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-07 11:50:17 -04:00
9cdef6c33b Update for latest ToffeeIR changes. (#2662)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-07 11:47:54 -04:00
4a952e7112 Python 3 fix: OrderedDict values is not a list. (#2661)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-07 11:47:39 -04:00
7838840084 Detailed install instructions for ONNX. (#2654)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-07 08:48:37 -04:00
b7997a0f41 support device ids>10
Summary: The data parallel model failed with device numbers 10, 11, ... because it sorted blob names as strings. Changed the sorting to go by device number first and then by blob name. Also added reduction for 16 devices.
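
The bug in miniature, plus a sketch of the fix:

  blobs = ["gpu_10/w", "gpu_2/w", "gpu_1/w"]
  print(sorted(blobs))
  # ['gpu_1/w', 'gpu_10/w', 'gpu_2/w']   <- string order misplaces gpu_10

  def device_key(name):
      prefix, blob = name.split("/", 1)
      return (int(prefix.split("_")[1]), blob)  # (device number, blob name)

  print(sorted(blobs, key=device_key))
  # ['gpu_1/w', 'gpu_2/w', 'gpu_10/w']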

Reviewed By: wesolwsk

Differential Revision: D5781521

fbshipit-source-id: 16be0984ecb55340604c82893be366c0528e822c
2017-09-07 00:01:33 -07:00
3024ff5705 fix static linkage and make THD statically linked 2017-09-06 23:42:03 -07:00
5ef96aadd9 fix static linkage and make THD statically linked 2017-09-06 23:41:45 -07:00
b6648fe311 fix static linkage and make THD statically linked 2017-09-06 23:41:16 -07:00
8190096fec Handle default arguments in base Type class 2017-09-06 20:22:57 -07:00
4e7f171ed5 Use CWRAP_FILES_BASE if defined 2017-09-06 20:18:18 -07:00
fbb8f13499 Docs now finally run with ToffeeIR master.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-06 21:35:50 -04:00
a2e5224847 Fix autograd tests 2017-09-06 21:35:50 -04:00
5e144a8938 Volatile input keys should also consider non-Variable arguments
Additionally, check Variable argument sizes
2017-09-06 21:35:50 -04:00
a897f5a6ee Expose requires_grad for cpp functions 2017-09-06 21:35:50 -04:00
d90cd88fb7 Improve next_functions handling in tracer and JIT closure
Added extra logic that records edges of previous stages and allows
JIT closures to copy next_functions for next stages.
2017-09-06 21:35:50 -04:00
3b1dfcb51c Add trace flag checking in backward passes too 2017-09-06 21:35:50 -04:00
ea888c1905 Check input flags in Traceable 2017-09-06 21:35:50 -04:00
230721e198 Support calling traced functions multiple times in forward
* Variables now hold a list of ValueTracingStates and can participate
in multiple traces.

* Refactored Traceable to maintain a list of traces, and only stop
tracing once it records all stages
2017-09-06 21:35:50 -04:00
fdbef1cfb0 Traces can now expire 2017-09-06 21:35:50 -04:00
eb11cab272 Misc doc improvements.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-06 21:35:50 -04:00
7ea9de051e Code review comments.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-06 21:35:50 -04:00
360cd9ca58 speed benchmark fix
Summary: Closes https://github.com/caffe2/caffe2/pull/1171

Reviewed By: prigoyal

Differential Revision: D5780191

Pulled By: Yangqing

fbshipit-source-id: 5e8f983f2bcd308be247c60d20674c8fed101561
2017-09-06 16:46:52 -07:00
3251c60804 TensorInferenceFunction for Unique
Summary: Filling in the gap in tensor inference

Reviewed By: sunnieshang, akyrola

Differential Revision: D5779550

fbshipit-source-id: 9ec68c9dad566183d7d0fc2819829c2b91430dda
2017-09-06 15:37:11 -07:00
30da84fbe1 Make Gloo depend on Caffe2 NCCL build
Summary:
If Caffe2 uses the packaged NCCL version, then the Gloo build will try
to use it as well. To make sure the NCCL build has completed we need
to add an explicit dependency between the two.

Another subtle change here is that we add the PROJECT_BINARY_DIR to
the include path, since that is where the generated <gloo/config.h>
resides. Without this path Caffe2 includes the empty config.h from the
source tree.
Closes https://github.com/caffe2/caffe2/pull/1170

Differential Revision: D5779002

Pulled By: pietern

fbshipit-source-id: 9bc0d41f01a9b0f023d71bc4dee128a77eec1712
2017-09-06 15:37:10 -07:00
a4a44a7cf3 Add missing const qualifiers 2017-09-06 14:53:02 -07:00
ceb13bf3fb Fix cell/hidden init issue, add copy states to test
Summary: As title. I wonder why this had not been encountered before. It only affects cases where the states are copied over, though.

Reviewed By: Yangqing

Differential Revision: D5777314

fbshipit-source-id: 8aef435c832e4ead5bb3d3e35bb065c734a2af5f
2017-09-06 14:16:17 -07:00
d4336edb05 Disabled test for equivalency between Caffe2's and Numpy's YellowFin
Summary: According to GitHub issue #1168, the agreement between the Caffe2 and NumPy YellowFin models in the tests is not good enough in some environments. Results were very close on my machine, but GitHub's Travis failed on some tests, which I later disabled. The difference therefore comes not from logical differences but from loss of precision on some machines. It is safe to disable the equivalency test since equivalency was already verified once.

Reviewed By: akyrola

Differential Revision: D5777049

fbshipit-source-id: c249a205d94b52c3928c37481f15227d500aafd0
2017-09-06 13:47:45 -07:00
6d5c3eaeb7 Add CloneCommonWorld op
Summary:
Cloning was previously done by overloading CreateCommonWorld op.
Closes https://github.com/caffe2/caffe2/pull/1159

Reviewed By: andrewwdye

Differential Revision: D5757580

Pulled By: pietern

fbshipit-source-id: 9e80b295e390bf92623bafb72be21cbafdcf2ff4
2017-09-06 13:32:30 -07:00
91b24b19de Add type inference for EnsureDense and Normalize operators
Summary:
Add type inference for the EnsureDense operator so that the output tensor
has the same data_type and shape as the input tensor

Reviewed By: kittipatv

Differential Revision: D5763117

fbshipit-source-id: e507e8d928c1515bd01063e2af595eb0daf1e768
2017-09-06 12:52:36 -07:00
631971e459 threaded RNN executor for CPU, multi-stream executor CUDA
Summary:
Special executor for RNNs which can exploit parallelism over timesteps. For CPU we use multi-threading, achieving a ~3x improvement on 4-layer LSTMs.
With CUDA, the perf improvements are more modest, but the structure allows for further optimization: we use multiple streams and events when there is parallelism
over timesteps. In my experiments, it was not beneficial to use more than 2 streams, though.

Flag --caffe2_rnn_executor can be used to switch the executor off.
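
Since this is a global Caffe2 flag, it can be set at init time; a hedged usage sketch:

  from caffe2.python import workspace

  # Fall back to the plain executor if the RNN executor misbehaves.
  workspace.GlobalInit(["caffe2", "--caffe2_rnn_executor=0"])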

Reviewed By: salexspb

Differential Revision: D5749304

fbshipit-source-id: d6f76b3e16598be5b4e8188aff031671ebafaa4c
2017-09-06 12:26:30 -07:00
ff38bbfe2c Enable mpscnn only for 10.2 and above
Summary: att

Reviewed By: ajtulloch

Differential Revision: D5773504

fbshipit-source-id: 452971ed295380193321b05458799dbd93f7ee52
2017-09-06 11:02:25 -07:00
9da95d9b07 bump to renamed onnx repo 2017-09-06 13:45:39 -04:00
3c61b59fd4 codemod primspec -> symbol, PrimSpec -> Symbolic 2017-09-06 13:45:39 -04:00
af649c19a2 ONNXIR -> to ONNX 2017-09-06 13:45:39 -04:00
bafe55bce4 use toffee import until ToffeeIR repo is renamed 2017-09-06 13:45:39 -04:00
6d8d5bab4c Codemod Toffee -> ONNX, toffee -> onnx. Change file names to match 2017-09-06 13:45:39 -04:00
c42ca96714 Stop returning tensors from torch.onnx.export()
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-06 13:45:39 -04:00
fc5137cf73 Merge pull request #49 from gchanan/broadcast
Support broadcast specifications from cwrap.
2017-09-06 13:42:22 -04:00
c7684e3b27 Rowwise quantization
Reviewed By: kennyhorror

Differential Revision: D5753626

fbshipit-source-id: 680c627a81658bcd653feab68e7040db0cb7a185
2017-09-06 10:19:38 -07:00
e4718430e8 Fix typo. 2017-09-06 09:12:45 -07:00
e3d6c2a942 Add proper error message for specifying dimension on a tensor with no dimensions. 2017-09-06 12:09:16 -04:00
22ea8d44e2 Remove unnecessary early conversion to IntList and make expand functions inline. 2017-09-06 08:33:38 -07:00
5419c5ffbc Set default values for concat_split_op
Summary: att

Reviewed By: bddppq

Differential Revision: D5768251

fbshipit-source-id: 7f74b5c2826012619047b61d7a7d1588f1b8d0a6
2017-09-05 17:02:22 -07:00
6ad54d55c9 QConv impl (re-up)
Summary:
Re-revert of D5607549, won't
break build (+ won't increase binary size).

Reviewed By: Yangqing

Differential Revision: D5769757

fbshipit-source-id: 725295f05355350774c5c5a10c8d2c90dd8b7994
2017-09-05 15:18:31 -07:00
4fc54af010 Code review comments.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
f1e4de9a63 Add primspec for Sub, Index, Chunk, and Embedding 2017-09-05 17:48:55 -04:00
29b4ebbf47 test_toffee updates.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
c2f19f5d72 ToffeeIR update.
- kernels -> kernel_shape
- Use the new hybrid dict/tuple result object from Toffee
- Write g and t as singulars, not plural
- nanopb generated files update
- Bugfix for msg() micropb helper
- Start recording producer_version/producer_tag
- Use ir_version from proto description
- Value -> value (Constant)
- Remove special-casing for transposed convolution; we now rely
  on the Caffe2 Toffee backend to do something reasonable
- Batchnorm order is no more

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
a63d88c95b print more detailed error message when trying to exported an unsupported operator 2017-09-05 17:48:55 -04:00
331521cdfd Step 1: Trace and proto collected for SRResNet model (#183) 2017-09-05 17:48:55 -04:00
1b792d3e57 Doc updates.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
cb5fbe1944 Expunge %2.0 syntax.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
394ff072eb Update to latest ToffeeIR operator schema.
- Conv no longer supports bias, so we create an explicit broadcasted
  addition afterwards.  There is one minor problem, however, which is that
  ConvTranspose in Caffe2 has mandatory bias.  So there's a hack.
  See Note [Caffe2ConvTranspose] for the details.
- Squeeze: dims -> axes
- Transpose: axes -> perm
- Reshape lost its extra output (yay!)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
9a05b8dd62 Update to latest ToffeeIR protobuf.
This was a doozy!

- 'namespace' is a C++ reserved keyword, so if you have a field named
  this, nanopb will blithely export some malformed C++.  I submitted
  a PR for this: https://github.com/ProjectToffee/ToffeeIR/pull/88

- Zach added support for singular tensor and graph.  While attempting
  to add support for these, I realized that it was actually impossible
  to support them under the default protobuf translation.  The gory
  details are in Note [Callback for nested messages].  The singular
  callbacks needed a new helper which I dubbed msg; it's just
  the singular version of list.

- While I was working on the API, I braino'd with the tensor()
  method.  It turns out this is totally not the right way to think
  about it; it's more string_from_tensor().  So I renamed it.
  I also renamed add_tensor to set_raw_data; add_tensor is a misnomer
  since it implies you can add multiple tensors, which is not true.

- version turned into producer_version.  Actually, this is a bit
  questionable and might change soon.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
99d6b9b923 make API debuggable 2017-09-05 17:48:55 -04:00
52e693022a helper methods appendNewNode and NewNode for python Graph API
uses suffixes to disambiguate attribute types
2017-09-05 17:48:55 -04:00
5c82aefa24 Fix bug in Transpose export.
This is a case of two wrongs make a right.  There were a pair of
related bugs;

- We incorrectly translated Transpose as if it were a Permute;
  but Torch transpose actually is a *swap* between dimensions.

- Why didn't we ever notice it?  In all of our tests, a transpose
  was *solely* done to get a weight matrix into the correct form.
  But Caffe2's FC operator *implicitly* does a transpose on
  the weight matrix.

This commit fixes both of these problems.
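
The distinction, concretely (a runnable illustration in today's torch API, not code from this commit):

  import torch

  x = torch.randn(2, 3, 4)
  # transpose(0, 2) swaps exactly two dimensions; as a full permutation
  # that is perm=[2, 1, 0], which is what a Permute-style export must
  # emit rather than treating (0, 2) itself as the permutation.
  assert x.transpose(0, 2).equal(x.permute(2, 1, 0))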

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
b5833551f3 Documentation, and inplace support.
This adds the PyTorch API user documentation for Toffee.
To make the example work, I also converted all "inplace"
ops to export out-of-place in Toffee.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
e1b3321f92 remove singular kernel, stride, pad. they are being removed from ToffeeIR 2017-09-05 17:48:55 -04:00
434317b155 PR comments.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
57eb8bd288 Frontend refactor, and some documentation.
- BC BREAKING: export now also takes a mandatory file-ish argument, specifying
  the file to export the protobuf to.  I rewrote the tests to use BytesIO to
  get out the string so they could parse it again.

- BC BREAKING: export no longer returns the tensors that were computed.  To
  get these, use the internal _export function.

- Multiple inputs to models are now supported by passing a tuple to input.
  (Old API of a single Variable still works.)

- Keyword arguments to models are now supported via kwargs keyword arg.

- Renamed embed_params to export_params, and it now defaults to True.

- Toffee tests now live in their own test_toffee.py file.  I had to
  rename a pile of expect files for this.

- Removed defunct torch.toffee imports from autograd to solve module import
  cycle.

- Helper function _with_file_like to abstract over opening file-ish arguments,
  taken from torch.save()

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
6ae77b32b9 Delete dead torch.toffee.op
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
61a922e183 data_other_types now has correct type.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
1f77d482d5 Don't insert Transpose if it is no-op.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
e29655f46d Run JIT tests earlier
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
215b980f06 More torch.jit docs.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
4174112b49 Add lint pass for handle invariant.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
cd8d41c0f9 regen toffee.proto for nanopb, enum of types has dropped double 2017-09-05 17:48:55 -04:00
3ef2ec6153 Actually correctly handle non-float exports.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
72843d5186 ATen hotfix: elementSizeInBytes for types
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
805c35a519 Model updates.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
daa3f7324c Track ToffeeIR inplace changes.
Rather than reuse input as output names in ToffeeIR, mark places where
inputs are consumed. In C2 conversion these annotations will be used
to create the corresponding graph.

Toffee submodule update.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
c537aebf5a Always run DCE in Traceable 2017-09-05 17:48:55 -04:00
9f0c4c9f9a Make autograd engine reentrant without creating new threads 2017-09-05 17:48:55 -04:00
e05979c4ea adding dummy bias for the conv transpose 2017-09-05 17:48:55 -04:00
ff77906e44 Refactor the user facing e2e test API - hide trace 2017-09-05 17:48:55 -04:00
4f6a7f4e2e support more types in export 2017-09-05 17:48:55 -04:00
31eda1230c support exporting constants 2017-09-05 17:48:55 -04:00
161e21f68d Missing batchnorm fix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
292ec9d75b Remove NDEBUG macro.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
b2e7438ead Move disallow_copy into utils.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
d59714e3b1 Code review comment changes.
- Reduce setup.py diff.
- Expunge WITH_TOFFEE from codebase.
- Elaborate on a comment.
- Move gen_toffee.sh to tools
- Delete densenet test.
- Use 'using' to inherit a constructor.
- Delete outdated comment.
- Comment about why primspecs can return fewer outputs.
- Remove dead, commented out includes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
7ac6d67a4e Add nanopb to list of dep_libs.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
77ede8fc1c .travis.yml cleanup
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
1e0171f436 Super resolution network (#148) 2017-09-05 17:48:55 -04:00
2e266837f5 Port TracingState to pybind11, new export() method.
Along the way I added converters for Variable and TracingInput.  Variable should
probably be moved to a more widely known spot.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
8f1168d355 Test updates for new version
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
2bc3881fe2 Put version in protobuf we produce.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
50b5f4d219 Minor comment.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
3b478c17a0 JIT backward closure comments / Render stage changes in inputs.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
f83c4fad7b Fix exception propagation from recursive Engine calls 2017-09-05 17:48:55 -04:00
d8e2ab632e Add support for Constant nodes in AutogradClosureFactory 2017-09-05 17:48:55 -04:00
594f98ce16 Support multi-stage AutogradClosures 2017-09-05 17:48:55 -04:00
43be0a679c fmap now doesn't require template arguments 2017-09-05 17:48:55 -04:00
b33f64b2e7 Fix nanopb build 2017-09-05 17:48:55 -04:00
25287a129b Test updates for plural attributes #145
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
3b5a5a6f9c Use plural attributes in MaxPool1d, MaxPool2d and AvgPool2d 2017-09-05 17:48:55 -04:00
225e8c8acf switch to using raw_data in PB 2017-09-05 17:48:55 -04:00
6264996169 ToffeeIR CI hotfix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
55ac596ea9 Faster tensor serialization.
Instead of dynamically allocating a float for each element of the tensor
(lol!), save the tensor itself, and directly read out the data.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
2d4da3657f Maintain invariant in env that all nodes are mapped.
"Unused" nodes are mapped to nullptr, and we distinguish
on lookup nodes which were never mapped versus nodes that
were mapped but supposed to be unused.  This case
should never happen, but a little extra safety never hurt.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
e2a84e1e65 PR comments.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
8c5eba3f3c Add an Undefined node for null arguments to tensors.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
b2e305e390 Lint after ToffeeIR, and subsequent fallout.
I realized we weren't running the linter after ToToffeeIR, so
I added a lint call.  It thus emerged that the current implementation
was using "Unused" nodes that were not added to the graph,
which was tripping the lint.  I fixed this a few ways:

- BatchNorm and Conv primspecs were returning dead "unused" nodes
  for their (implicit) handle parameters.  I removed them because
  setOutputs handles this already, and a dead unused node which
  is not attached to the graph violates the "no dead nodes"
  invariant.

- OK, but MaxPool actually needs to return a unused node for
  the output which supported by PyTorch but not Toffee; we need
  to error if subsequently in the trace this output is used.
  The new strategy is to have MaxPool's primspec return a None
  at the unused position, and then immediately *check* if there
  are any uses of that output.  If there are, that's an error!

- I needed to adjust the Select invariant in the exporter loop:
  only if a Select node has *uses* is it mandatory for it to be
  defined in env.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
685d7b83ba Batchnorm's bias is mandatory.
Unlike convolution, bias in SpatialBn is mandatory; see
https://github.com/caffe2/caffe2/blob/master/caffe2/operators/spatial_batch_norm_op.cc

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
84f8c88c24 Batchnorm fixup
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
82efbe349b Handle batchnorm properly.
Basic idea:
- Pass buffers (marked as non-Variable tensors) as input variables to
  the trace.   Every buffer gets represented as an input variable
  to the trace, and we remember a correspondence of the underlying
  TH pointer and an input variable in the trace.
- When we initially trace a function, we DO NOT record the buffers
  as edges.  This is so autograd doesn't have to know anything about buffers.
  If we ever turn buffers into requires_grad=False parameters, then
  this problem goes away.
- When we primspec the buffer, NOW we reach into the cached buffers
  (now appropriately named) and gin up the buffer information we need.

Other things:
- CppOp execution is now supported (but lightly tested) using
  SimpleEval (thanks @apaszke!)

Todo:
- E2E tests need to have their hacks removed.
- Figure out what is going on with backwards

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
218058b94a Make CppOp autograd execution work (temporary)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
63c835bbe7 Add keep_vars parameter to state_dict.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
96ae6a5e48 Don't DCE Params.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
2db7c5621f merging dcgan changes that needed to be refactored from older primspec approach 2017-09-05 17:48:55 -04:00
dc6378d891 merge fixes for Squeeze and ConvTranspose 2017-09-05 17:48:55 -04:00
a1bb403326 Ignore nanopb for lint.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
605ef38831 Explicitly override CMAKE_DEBUG_POSTFIX for nanopb build.
If it's not set, CMAKE_DEBUG_POSTFIX sets it to 'd' which means the
static library gets named something different when built in debug mode.
This is annoying because it means if you build in debug mode, the
library is in a different place.  Rather than teach the build system
to find the correct name, just set this POSTFIX so names don't change.

Also, update setup.py to look for the non-debug archive.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
0bc498ee94 Apparently, lib64 isn't created all the time.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
de6ef65be5 Port to nanopb.
General strategy:
- nanopb is statically linked into PyTorch.  It must be built
  with -fPIC.
- Generated nanopb files for toffee.proto are checked into
  our repo.
- Because nanopb generated protobufs are C only, we wrote a
  wrapper around it to give a Google C++ style interface.
  More on this shortly.

How does the wrapper work?
- It's called "micropb" because it is less small than nanopb :)
- nanopb requires all variable-length fields to be written out
  using a "callbacks" mechanism.
- We wrote pre-canned callbacks for all of the types ToffeeIR
  writes out and lists; these are micropb_callback and
  micropb_callback_list.  These operate simply by dynamically
  allocating and storing the data to be written out in
  data (this defeats the purpose of the callback mechanism,
  but it's easy to implement)
- Finally some boilerplate to actually implement the wrapper
  classes and have owning pointers to the actual data.

Testing strategy:
- Take the serialized protobuf from nanopb, parse it again
  with ToffeeIR and print it.  Worked with all of test_jit.py!
  These tests don't run without 'toffee' being installed.

TODO:
- Update CI to install ToffeeIR, so we can run the Toffee tests
  in CI
- Update E2E with Caffe2 tests so that they work with new stuff.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
ac8d3372b0 Add nanopb submodule.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
35bddb6b7e pr feedback 2017-09-05 17:48:55 -04:00
c9f7f2eff4 Change pipeline for exporting to toffeeIR
previously:
  PythonOp/CppOp Graph -> ToffeeIR, primspecs worked with protobufs
now:
  PythonOp/CppOp --ToToffeeIR--> jit::Graph of in-memory ToffeeIR -> protobufs of ToffeeIR

This commit lets primspec functions work directly with JIT IR nodes,
which makes it possible to do a lot more stuff in those functions.
2017-09-05 17:48:55 -04:00
3afb4d8728 giant expect commit 2017-09-05 17:48:55 -04:00
bad5717e15 add ability to specify initial values for inputs 2017-09-05 17:48:55 -04:00
81342910d7 fix the op Tanh spelling: tests
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
d12cf7dd45 fix the op Tanh spelling 2017-09-05 17:48:55 -04:00
8c2663a685 Put every input on a new line: TestJit test updates
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
d5d65080e3 Put every input on a new line.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
384efe482a Use Toffee IR schema to disambiguate types of attributes.
Let's say I write alpha=2 in my PyTorch code.  Is alpha a float
or an int?  This problem is resolved when we actually pass
it to the underlying kernel, which knows what type it expects
it as.

When serializing to Toffee IR, the Toffee NodeProto also needs
to dictate the correct type; otherwise, we may guess wrong.
We get this information from the OpSchema in the ToffeeIR library.
With this, we can avoid explicitly casting in dropout.py and
auto_primspec.py
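
The idea in miniature (a hypothetical lookup table standing in for ToffeeIR's attribute schema):

  ATTR_SCHEMA = {("Elu", "alpha"): "float"}   # toy stand-in for OpSchema

  def encode_attr(op_type, name, value):
      # Serialize with the type the schema declares instead of guessing
      # from the Python value, so alpha=2 round-trips as a float.
      if ATTR_SCHEMA.get((op_type, name)) == "float":
          return float(value)
      return value

  encode_attr("Elu", "alpha", 2)   # -> 2.0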

WARNING: You will need to update torch/lib/ToffeeIR when you pull
this patch, as attribute schemas were added recently to ToffeeIR.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
f062e06c91 Make null Variables on convolution and batchnorm work.
This addresses when bias is disabled, which occurs in torchvision's
alexnet and densenet.

The general strategy is this:

- When we encounter a null variable, we turn this into a Constant
  node with an undefined at::Tensor

- Toffee exports for BatchNorm and Conv have special cases for bias,
  checking if they are provided by a Constant node with undefined
  value, and just omit the input if so.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
6039f007c4 Make assertExpected Python 2 friendly.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
6701cc7c8e flake8 excludes update 2017-09-05 17:48:55 -04:00
a60d9bd022 Bind Attributes in python ir, and add test for python ir binding 2017-09-05 17:48:55 -04:00
a3fdb281d1 Python wrapper for Node IR using pybind11
Supports almost all of the IR API.
2017-09-05 17:48:55 -04:00
6d0364f13d Add pybind11 as a submodule. 2017-09-05 17:48:55 -04:00
0a83f86348 Add Eval Handles: JIT test update
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
5823cc419a Ignore Handle when exporting to ToffeeIR.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
e14b766a81 Add a comment about Handle 2017-09-05 17:48:55 -04:00
965a349bbd Record context edges in the JIT 2017-09-05 17:48:55 -04:00
9f97291408 Make tracer thread-safe 2017-09-05 17:48:55 -04:00
8dab0237e2 Maintain Select-node invariant in DCE 2017-09-05 17:48:55 -04:00
ec9761789a Enforce deterministic ordering on Eval inputs/placeholders 2017-09-05 17:48:55 -04:00
fa308b3183 Improve backward tracing 2017-09-05 17:48:55 -04:00
91dcf2938a Miscellaneous fixes needed to make caffe2 E2E 2017-09-05 17:48:55 -04:00
6297144e51 Build hotfix.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
1517ef687e use constants for directions 2017-09-05 17:48:55 -04:00
b0ba9a81d2 remove std::list, restore custom node list implementation. 2017-09-05 17:48:55 -04:00
222e8c0591 PR fixes 2017-09-05 17:48:55 -04:00
b606106c4d thread safe interned_strings 2017-09-05 17:48:55 -04:00
14f9316d2b renaming IR_IF family 2017-09-05 17:48:55 -04:00
55cd9f37d1 remove Select, and NodeWithKind 2017-09-05 17:48:55 -04:00
4a4739e048 remove most node subtypes 2017-09-05 17:48:55 -04:00
c369a44bf1 remove chunk subclass 2017-09-05 17:48:55 -04:00
9f8a35c0b9 remove Primitive nodes. 2017-09-05 17:48:55 -04:00
24cdb897d6 starting removing nodes by removing Return 2017-09-05 17:48:55 -04:00
b037efa92c prep for removing node subtypes 2017-09-05 17:48:55 -04:00
57b7370aab switch NodeKind over to Symbol type. 2017-09-05 17:48:55 -04:00
1fa5b19ba4 Attributes object that mirrors Toffee, and interned string table, used by attributes for keys. 2017-09-05 17:48:55 -04:00
3c5dced6ce Make batch-norm work end-to-end with caffe2
2017-09-05 17:48:55 -04:00
3c6fbcabea encode size in name... 2017-09-05 17:48:55 -04:00
d596bad1b9 remove attribute in expect 2017-09-05 17:48:55 -04:00
d7d74428a3 batchnorm hacking 2017-09-05 17:48:55 -04:00
150fd2848d batch norm primspec stub 2017-09-05 17:48:55 -04:00
af90a780d1 primspec for avgpool + squeeze (#80) 2017-09-05 17:48:55 -04:00
0ca3ca302e test for primspec for concat (#77) 2017-09-05 17:48:55 -04:00
52e0816bed primspec for concat 2017-09-05 17:48:55 -04:00
72a7530023 primspec for leaky_relu (#70) 2017-09-05 17:48:55 -04:00
0e5320e073 Lint
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
6405391065 Small comment.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
db79be82ab Move Toffee for C++ functions back to autograd.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
f265ff1dca Bugfix where it was always input 0
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
bee0e45355 Don't create empty attributes.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
c0d0a99977 Alexnet back online.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
ee2ba279f2 Working Reshape op
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
35f1cb462d Invert negation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
e1b345d81b More alexnet things as primspec.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
1f4bebe27a Build fixes when Toffee is enabled.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
6f6fe177f1 Make Toffee optional. Unbreaks CI.
The general strategy:

- We put all the toffee files in torch/csrc/toffee; they will only be
  added when toffee is enabled

- Toffee is enabled if torch/lib/ToffeeIR is present (since we
  don't have a submodule/subtree thing going on)

- The most prevalent place you will need to use WITH_TOFFEE is for
  primspec definitions on C++ autograd functions.  There is a
  macro HAS_PRIMSPEC to ameliorate optionally defining primspec()
  virtual overrides on Function classes.  HasPrimspec is always
  available but will be a zero field class when Toffee is disabled.

NB: We might revert this commit in the future if we figure out a way
to unconditionally enable Toffee that everyone likes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
05a6d4c137 Create a C++ primspec virtual method.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
4b1f182199 Disable C++ Python conversion code.
We want all the conversion code to live in one place. Away it goes!

This means that alexnet protobuf no longer works.  It will start working
again when we port changes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
dd58b145c3 Toffee graph exporting for PyTorch.
This commit adds a new exporter pass which takes a graph and returns
a string of the human-readable protobuf representation of a model.

We have two strategies for how conversions are implemented:

- If a Python autograd function has a primspec static method, we invoke
  it to get the Toffee conversion.  Use torch.toffee.op to generate the
  format expected to be returned.  The particular data representation is opaque
  and subject to change in the future.

- Otherwise, there's a giant if statement in the exporter, which manually
  uses the JIT IR C++ API and Toffee IR C++ protobuf API to convert.

You must check out a copy of the ToffeeIR repo
https://github.com/ProjectToffee/ToffeeIR at torch/lib; at the moment
we don't have a subtree/submodule set up.

Technical debt in this commit:

- To get protobuf headers in scope, we unconditionally add $CONDA_PREFIX/include
  to the include path.  This needs to be replaced with a more robust mechanism.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
890c2071f0 PR comments 2017-09-05 17:48:55 -04:00
35f1ca1293 Make autograd engine reentrant (#37) 2017-09-05 17:48:55 -04:00
c8b303e853 guard dump, guard cuda 2017-09-05 17:48:55 -04:00
f4b7178b59 track scalar type 2017-09-05 17:48:55 -04:00
b6175eb54d enable fusion group execution in autograd closure. implement chunk. propagate type information through fusion optimization. 2017-09-05 17:48:55 -04:00
62efac4ba5 make Type into a immutable object and share them rather than clone.
allow nodes to have undefined types, which reflects reality right now
where some TensorType nodes are just not filled in.
2017-09-05 17:48:55 -04:00
bcf5c11e10 cuda guards 2017-09-05 17:48:55 -04:00
e91966a0b4 Unify our tracing API into a single interface for functions/models.
The API works on either functions or models, taking an extra parameter argument
so that functions can pass in additional variables to trace.

Other behavior is folded into boolean options:

time - collect stats for our own perf debugging
verify - run the original code, and check it is within threshold
optimize - run optimization (currently off until fusiongroups pr is accepted).
enabled - flag to turn off tracing so you can check timing of stuff that cannot be traced.
2017-09-05 17:48:55 -04:00
510529ecd0 missing expect 2017-09-05 17:48:55 -04:00
9431742d5a Build error fix 2017-09-05 17:48:55 -04:00
7f60a18293 Add initial support for backward tracing 2017-09-05 17:48:55 -04:00
5b6bcf1ce4 Warning squishing.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
29ddcbfe17 Rename TypeKinds to suffix Type, matching class names.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
accd52feef Print types, and improvements to type APIs.
Fixes #48.

I had to shave some yaks:

- I needed switch on Type, so I wrote a new macro set TYPE_IF,
  and abstracted the IR_IF into a GENERIC_IF.  The parametrization
  is on const-ness and the type kind; also there is a minor annoyance
  where type kinds (ugh, hate the name; it means the wrong thing
  in Haskell land) don't match the class names, so there needs some
  suffix munging.  There's still some extra funny business, see
  https://github.com/ezyang/pytorch/issues/51

- A lot of functions on types weren't declared const when they could
  have been.  I added const qualifiers as necessary.

- setType now takes an honest to goodness Type* rather than TypeKind.

- init_pass now preserves types when it does transformations.

There are still some places we're losing types, most notably fusion.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
eb730f8321 Inplace test.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
4a1bbc01ac Fix #41.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
765b0bf137 Make in-place work again.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
32c5be4c31 Lint record_trace.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
b6a8eaa6ed Give ConvForward an explicit name.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
21c0ad9702 Test case that we fail legacy traces
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
453b0fac03 Always print diffs, no matter how large.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
624e451d6b Add comments 2017-09-05 17:48:55 -04:00
1c4538e017 Trace C functions 2017-09-05 17:48:55 -04:00
bdcbbeaf68 Remove GlobalTracingState 2017-09-05 17:48:55 -04:00
ba2b2bcdc1 Change calling convention for C++ autograd functions 2017-09-05 17:48:55 -04:00
82ed7c0232 POC: add Handles to represent opaque state passed between Nodes 2017-09-05 17:48:55 -04:00
09b35506f4 rename init_pass.cpp 2017-09-05 17:48:55 -04:00
a136c30309 add comments, rename function 2017-09-05 17:48:55 -04:00
9fd06b2051 add a rule to distribute chunk operators when they block fusion. 2017-09-05 17:48:55 -04:00
a096959ab8 make multi-output uses/defs easier to read in pretty print. 2017-09-05 17:48:55 -04:00
0d3421ac01 Handle Constant lint.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
b158aaf6b4 Make linter an optimization pass.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
cf46ef05db Finish the rest of the lint pass.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
3016f459d2 Partial lint pass.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
76c7788e81 Remove THPP imports 2017-09-05 17:48:55 -04:00
dad625b54a Comment for WrapConstant/ConstantFactory, remove thpp import 2017-09-05 17:48:55 -04:00
f0902027ce Typofix 2017-09-05 17:48:55 -04:00
2dc3ef73ae Lint 2017-09-05 17:48:55 -04:00
c931feaad0 Elaborate on NB a little 2017-09-05 17:48:55 -04:00
3e0f1608fe Capture Variables that are not inputs as constants 2017-09-05 17:48:55 -04:00
af21c6b018 Add Node type to JIT IR
Rewrite Type as a class hierarchy

PR comments + rebase fixes
2017-09-05 17:48:55 -04:00
348950dc74 cleanup jit_test 2017-09-05 17:48:55 -04:00
1f900861b6 remove _NOCAST, use fully-qualified name in macros 2017-09-05 17:48:55 -04:00
233a66dcbe Remove SimpleMap from JIT IR 2017-09-05 17:48:55 -04:00
f5e414862a cuda guards for fusion compiler 2017-09-05 17:48:55 -04:00
ea4aaa6b0b Document TemplateEnv & PR fixes 2017-09-05 17:48:55 -04:00
50e51eaa7f Fusion of simple map operations using nvrtc.
The approach is based on THC's pointwiseApply{1,2,3} family of kernels,
but doesn't have any dependencies on that code.

Adjacent contiguous dimensions of input tensors are compressed to reduce the complexity of indexing math.
For the completely contiguous case, the indexing logic simplifies to just the linear index.

In simple tests, this code matched or beat the equivalent from THC.
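
A standalone Python sketch of the dimension-compression step (mirroring the idea, not the CUDA code):

  def collapse_dims(sizes, strides):
      # Merge adjacent dimensions that are contiguous in memory so the
      # generated kernel's indexing math has fewer terms; a fully
      # contiguous tensor collapses to a single linear dimension.
      dims = [(sizes[-1], strides[-1])]
      for size, stride in zip(reversed(sizes[:-1]), reversed(strides[:-1])):
          merged_size, merged_stride = dims[-1]
          if stride == merged_size * merged_stride:
              dims[-1] = (size * merged_size, merged_stride)
          else:
              dims.append((size, stride))
      return dims[::-1]

  print(collapse_dims([2, 3, 4], [12, 4, 1]))   # -> [(24, 1)]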
2017-09-05 17:48:55 -04:00
51a1618683 Remove Return node from nodes() 2017-09-05 17:48:55 -04:00
a4086508c6 Enable tests 2017-09-05 17:48:55 -04:00
f270973937 Add JIT IR -> Autograd IR converter 2017-09-05 17:48:55 -04:00
e186d16e6b Apply JIT optimizations form Python 2017-09-05 17:48:55 -04:00
72659bcdef Minor code cleanup 2017-09-05 17:48:55 -04:00
57d65a99bb Add LSTM fusion test.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
8f12bc5a4c Temporarily print Return nodes, pending printer fix.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
8f3a01932b Swap order of assertMultiLineEqual.
This makes the diff look more intuitive.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
a5b87de139 Squash warnings.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
9662cffd26 Use std::list in JIT IR 2017-09-05 17:48:55 -04:00
e238f3cada Very simple accept/golden test framework for JIT trees.
- To test whether or not a multiline string matches some expected
  value, you can use assertExpected.  This tests that the string
  matches the content stored at a file based on the name of the
  test (and an optional subname parameter you can pass if you
  want to assertExpected multiple times.)

- Suppose you make a change that modifies the output in a big way.
  Instead of manually going through and updating each test, you instead
  run python test/test_jit.py --accept.  This updates all of the expected
  outputs.  You can now review them one-by-one and make sure your
  changes make sense.

We can add more features later (e.g., munging the output to make it
more stable, more sanity checking) but this is just to get us started
testing.  One thing to watch out for is that accept tests on intermediate
representation can be a bit wobbly: it is *extremely* important that
people be able to read the IR.  It may be worth introducing niceties
to the printer in order to ensure this is the case.
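
The mechanism in a dozen lines (a simplified sketch, not the actual harness):

  import os, sys

  ACCEPT = "--accept" in sys.argv

  def assertExpected(actual, name):
      path = os.path.join("test", "expect", name + ".expect")
      if ACCEPT:
          with open(path, "w") as f:
              f.write(actual)          # re-record the golden output
      else:
          with open(path) as f:
              expected = f.read()
          assert actual == expected, "run with --accept to update " + path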

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
cb53882c5e Make warnings clean.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
ac5dd887dc python clone, more asserts, better names. 2017-09-05 17:48:55 -04:00
da6122bd35 Document all public graph manipulation functions 2017-09-05 17:48:55 -04:00
e477e56519 Add some preconditions to the comments I added. 2017-09-05 17:48:55 -04:00
3182d732ee Some documentation for mutator methods. 2017-09-05 17:48:55 -04:00
a89c49d723 Minor fixes to comments 2017-09-05 17:48:55 -04:00
d959bf43c3 add comments explaining IR and fuser 2017-09-05 17:48:55 -04:00
fde064088f Add logic for fusion. Add clone mechanism to IR, with init() methods to setup nodes. 2017-09-05 17:48:55 -04:00
538cc89dbc print uses in output 2017-09-05 17:48:55 -04:00
48945a435d IR modifications to make mutation possible. Nodes are in an intrusive doubly-linked list. Methods added to manipulate inputs etc. 2017-09-05 17:48:55 -04:00
a2c140f985 Refactor owning Graph pointer initialization.
Now it gets initialized during the constructor.  This results
in more boilerplate but is conceptually more correct, and solves
an assert failure.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
49bb223786 Break when an assert fails.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
8215860d2f Add an assert wrapper for easy porting.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
3dcbba1f35 Keep Variable mapping as part of TracingState 2017-09-05 17:48:55 -04:00
55c9e0258e Make the linter happy 2017-09-05 17:48:55 -04:00
6be47ec907 Minor fixes and improvements 2017-09-05 17:48:55 -04:00
2ced918063 Add a very simple visual (non-automated) test.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
ea05ac8f41 Move JIT-related files to jit dir. Remove IR interpreter 2017-09-05 17:48:55 -04:00
1325fa511c JIT IR including use-def chains and updated comments. 2017-09-05 17:48:55 -04:00
7c083b00f8 refcounting for Node/Value 2017-09-05 17:48:55 -04:00
f369f8e80d simplify IR 2017-09-05 17:48:55 -04:00
4979359800 Add graphs, trace them.
It is not an /expression/ we trace, but it is a /graph/: that is,
a closed expression which knows its parameters.  Knowing the list
of parameters helps remove a hack when interpreting.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
a2cc7a00e6 Fix Python 3 build problem
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
c1dec0663f New stratification: add Operator/Instruction
This prevents nested lets, which are not allowed in ANF.  We
basically have SSA now.

There's some niftiness with the visitor returning a lambda which
then gets fed the actual argument. I like it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
60751cd889 Add verify_model to torch.jit, for sanity checking.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
7bd4c5a27c Minor sanity check.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
3055b69f63 Refactor Arg class away.
Although ANF style developments traditionally stratifies syntactic
classes into atomic (Arg) and complex (Expr) expressions, where
atomic expressions could be variables, constants or lambdas, Zach has
successfully convinced me that we should do away with the variant here and
always require arguments to be variables.  There are a few reasons for
this:

1) Tensor constants, not currently supported, could be modeled using a
"Constant" instruction, removing the need for them to be representable
directly inline.  An inline constant is marginally more convenient
for peephole optimizations, but since we have gone full ANF, we are going
to need to be able to see across def-uses in any case, and it is not
too much worse to need to handle constants this way.  By the way,
Swift Intermediate Language also made a similar choice, see
the slide on "Literal Instructions" in
http://llvm.org/devmtg/2015-10/slides/GroffLattner-SILHighLevelIR.pdf

2) Scalar constants, which are quite important for passing non-tensor
arguments to Python operators, are now stored out-of-band as NON
first-class values.  This more closely matches the ToffeeIR design,
and makes it clear what parameters are "first class" (tensors only)
and which ones are not.  However, we need to be able to unswizzle
the separate scalar/tensor lists into a unified list in the correct
format; this is what PyFunctionCConv is for.

Also, Locals got renamed into Tuple.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
13663c1ee7 Fix clang build error, struct/class agreement.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
25bf7639f4 Remove incorrect clear from THPExpr/Arg_dealloc
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
8ab905b769 Remove unused output_list.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
c466b2c1f6 Make an error message better
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
c4ccae6a89 Document move semantics on PyObject with THPObjectPtr&& constructor.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
0fc17adf71 Add simple JIT frontend.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
f9458a3720 Add comments from discussion with Zach.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
d35ae86f26 Don't use misleading Ret nomenclature.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
a797ab9343 Rewrite AST to a new, more functional representation.
Previously, our AST was a DAG, where shared Nodes indicated a computation
should be reused.  This commit rewrites the IR into a new functional
representation which represents sharing explicitly using variable
bindings.

We offer a few justifications for this new style:

1. The new representation is not all that different from the
old one; it is about as easy to construct, and the lack of an
explicit graph doesn't negatively impact our ability to interpret
the graph, since we've chosen, as a matter of design, to NOT have
the IR participate in the actual execution of a graph.

2. The new let-binding representation has an implicit ordering,
which we can use to conveniently keep track of the original order
the trace showed up as.  This automatically gives us a topsort,
and gives us an easier to read textual representation of our
IR:

  %14 = Embedding %11, %0, -1, None, 2, False, False
  %15 = Dropout %14, 0.2, True, False
  %16 = Index %12, 0
  %17 = Index %12, 1
  %18 = Index %13, 0
  %19 = Index %13, 1
  %20 = Index %15, 0
  %21 = Linear %20, %1, %3
  %22 = Linear %16, %2, %4

3. It moves us closer to a Futhark style language
(http://futhark-lang.org/publications/pldi17.pdf).

Major aspects of the diff

- Node is replaced with Expr and Arg, a pair of mutually recursive
  structures which represent our new language.  In BNF, the language
  looks like this:

    a ::= c | %i
    e ::= %i, ... = e
        | PyOp e, ...
        | Ret %i, ...

  Technically, Ret is not actually a return (no control flow is involved),
  it just tuples up a series of tensors (identified by variables).

  One important invariant is that locals are always tensors; they
  are never constants (this is asymmetric with Args.)

- Arguments support Python constants.  This is an important piece because
  many operators take extra Python literals like integers and tuples in
  order to specify extra parameters about how an operator operates.  Adding
  this was essential to getting word_language_model to work.

- As both Expr and Arg have multiple variants, there is new infrastructure
  for doing case on the variants using ExprVisitor and ArgVisitor.  The
  strategy here is adapted from WebAssembly's visitors, although we have
  generalized to permit arbitrary argument forwarding, which is necessary
  to support tail-recursive visitor calls.  TCO is important because our
  interpreter may recurse arbitrarily deep into a stack of nested lets.
  If users wish, they can also manually case on the type tag.

- Tracing is now turned on and off using _tracer_enter/_tracer_exit in
  torch._C.  _tracer_enter accepts a list of variables which are to be
  treated as arguments; _tracer_exit accepts the list of traced variables
  which should be returned when you reexecute the trace, and returns
the trace expression which can be reexecuted.  GlobalTracingState
is a global variable which tracks whether or not we are tracing.

- You use run_forward to execute a trace on some set of parameters.

- When under tracing, variables keep track, via trace_local, what the
  name of their variables in the IR are.

Here is a simple runner which leaks memory but can be used to JIT models:

  import torch.autograd.function as F
  import torch._C
  from torch.autograd import Variable

  def jit(model):
      import types
      real_forward = model.forward
      def forward(self, *args):
          def flatten(x):
              return tuple(F._iter_variables(x))
          if not hasattr(self, "saved_trace"):
              torch._C._tracer_enter(tuple(self.parameters()) + flatten(args))
              out = real_forward(*args)
              self.saved_trace = torch._C._tracer_exit(flatten(out))
              self.saved_outs = out
              return out
          else:
              flat_out = Variable._execution_engine.run_forward(self.saved_trace, tuple(self.parameters()) + flatten(args))
              return F._unflatten(flat_out, self.saved_outs)
      # bind the new forward so subsequent calls hit the trace cache
      model.forward = types.MethodType(forward, model)
      return model

Major problems:

- Sanity checking is spotty at best, especially when users pass in variables.

- The interpreter leaks tensor memory from the store.  When we add back def-use
  we should be able to deallocate tensors as soon as we know they are no longer
  necessary.

- The interpreter needs to reach feature parity with the old execution engine.
  From there, we need to see if backwards can be subsumed as well.

- I still have no confidence in having memory managed everything correctly.
  This requires a close look.

- Rather than return an *open* expression as a trace, we should return a
  *lambda* instead, which knows about how many formal parameters it
  requires.

- The IR is not introspectable from Python at the moment, but this is simply a
  matter of implementing all the binding code.

- The tracer is NOT reentrant (you can't trace while you're inside a trace.)
  Furthermore, no sanity checking is done if you try to incorrectly reuse
  things from one trace in another.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
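To make the let-binding semantics described above concrete, here is a toy interpreter sketch (names and data layout are assumptions for illustration, not the commit's actual code): an environment maps each local %i to its value, and walking the lets in binding order is exactly the topsort property mentioned in point 2.

  def run_trace(lets, ret_vars, env):
      # lets: (output_ids, op_callable, input_ids) triples in let-binding
      # order; env: dict mapping %i locals (ints) to tensor values.
      for out_ids, op, in_ids in lets:
          results = op(*(env[i] for i in in_ids))
          if not isinstance(results, tuple):
              results = (results,)
          for i, v in zip(out_ids, results):
              env[i] = v
      return tuple(env[i] for i in ret_vars)

  # run_trace([([2], lambda a, b: a * b, [0, 1])], [2], {0: 4.0, 1: 7.0})
  # returns (28.0,)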
6f9774d7db Minor bugfix.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
11107190ca Handle legacy correctly.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
1e8bf12b3a Add an inefficient but working evaluator for forward traces.
Simple test:

  import torch
  from torch.autograd import Variable
  import torch._C as _C

  x = Variable(torch.Tensor([4]), requires_grad=True)
  y = Variable(torch.Tensor([7]), requires_grad=True)
  z = x * y
  z.sum().backward()

  print(x.grad)
  print(y.grad)

  x.data[0] = 2
  y.data[0] = 3

  (z,) = z._execution_engine.run_forward((x, y), (z,))
  z.sum().backward()

  print(x.grad)
  print(y.grad)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
50b375d9bf Add input nodes to the IR representation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
e1b7872fc2 Make it possible to access IR from Python.
Also, add a new trace_fn field to attach forward IR to Variables.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
c5faaf69d8 Initial IR representation for forward trace.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
b1b20e4097 Remove dead field from UnpackedInput
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
3d7459ff6c fix indices for data_parallel and add parameter gradient tests (#2632) 2017-09-05 17:29:27 -04:00
77a02eaa7f Enable reader checkpoint
Summary:
Reader checkpointing was disabled due to a bug captured in T21143272.
Now that we have resolved that issue, we are re-enabling reader checkpointing.

Reviewed By: boryiingsu, rayleichen

Differential Revision: D5730545

fbshipit-source-id: 7fae48b03e07eaf530bfc9e8e8b6683d8ed4e206
2017-09-05 14:21:25 -07:00
81ddd5e869 Use std::{thread,mutex,condition_variable} instead of raw pthreads in WorkersPool
Reviewed By: Yangqing

Differential Revision: D5753072

fbshipit-source-id: 436915e6253eb517306c577e31854f8e018a36dc
2017-09-05 12:33:13 -07:00
3ff351fc89 insert Free ops when blob used last time + memory allocation estimator
Summary:
release_blobs_when_used() will analyze when a blob is output for the last time, and insert a Free op after that, unless the blob was aliased.
memonger.estimate_memory_usage() does a static memory analysis based on shape inference. See experimental/akyrola/test.py for example use.

Reviewed By: asaadaldien

Differential Revision: D5729199

fbshipit-source-id: 527a5152dbd4ef3bbe28b776c29163fff25f700a
2017-09-05 12:03:04 -07:00
f2c9aea75f Merge commit 'a3ddf9e18003f13a1094c7c5d62905f4db102da3' 2017-09-05 14:46:23 -04:00
a3ddf9e180 fix pointer arithmetic for large input/output sizes 2017-09-05 11:44:16 -07:00
1104bab796 add axis argument to NormalizeOp and NormalizeGradientOp
Summary:
As described in task T21337239, NormalizeOp currently normalizes over only the last dimension.
In this commit, the following changes have been made:
(1) Added an axis-parameter to NormalizeOp in both the CPU and CUDA context.
(2) Added the same axis parameter to NormalizeGradient in both the CPU and CUDA context
(3) Removed the limit that the original NormalizeOp operator requires the input dimension to be 2

Reviewed By: akyrola

Differential Revision: D5745162

fbshipit-source-id: 69e04f59ac4d954b0062c3b2a53c8ca465a1027b
2017-09-05 11:17:32 -07:00
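For reference, a NumPy sketch of the axis-aware normalization the new argument enables (illustrative only, not the operator's CPU/CUDA code; the eps guard is an assumption):

  import numpy as np

  def normalize(x, axis=-1, eps=1e-12):
      norm = np.sqrt((x * x).sum(axis=axis, keepdims=True))
      return x / np.maximum(norm, eps)

  x = np.random.randn(4, 3, 5)
  y = normalize(x, axis=1)   # unit L2 norm along axis 1, for any input rank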
90a4d91469 Remove scalar expansion tests. 2017-09-05 10:59:40 -07:00
25dd9ba799 Address review comments. 2017-09-05 10:34:04 -07:00
7ec4485858 move cmake_uninstall.cmake.in into cmake/ subfolder
Summary:
TSIA
Closes https://github.com/caffe2/caffe2/pull/1167

Differential Revision: D5767229

Pulled By: Yangqing

fbshipit-source-id: 0798981e505ffe11f532065680f794cba16d140c
2017-09-05 10:02:18 -07:00
42448cf07f Fix to make the sample code executable as-is in "Extending PyTorch" (#2621) 2017-09-05 10:19:49 -04:00
1b013c0b52 fixed issue #2613 in torch/legacy/nn (#2624) 2017-09-05 10:13:56 -04:00
bfbd1bbb50 Update torch.triu/torch.tril doc (#2619) 2017-09-05 00:05:44 -04:00
dd5400e452 Make android segmentation network run on both iOS and android with tiling
Summary: Add tiling support to GLAdd, GLPool, and GLResizeNearest

Differential Revision: D5733208

fbshipit-source-id: b73113326b96d421787d4695ccf7d2d919ee2ed8
2017-09-04 17:33:31 -07:00
8430cf6e86 Merge commit 'db78f3cf468549b206c3c8bdc9fb42df86ded2a7' 2017-09-04 11:13:59 -04:00
db78f3cf46 fix bug for THTensor data access 2017-09-04 11:12:52 -04:00
40ca356d36 make logsoftmax documentation readable (#2606) 2017-09-04 00:23:26 -04:00
7fa7a101af Fix embedding doc formatting (#2605) 2017-09-03 11:27:11 -04:00
bf013f4c99 fix Python 2 gloo install (#2597) 2017-09-02 20:05:37 -04:00
f0f7b39650 fix example in docs for nn.init.calculate_gain (#2600) 2017-09-02 19:23:25 -04:00
2d9728d594 Add more enforces to SparseToDenseMask operator.
Summary:
It looks like this operator is missing some enforces that it should have (since
it's working on the user inputs). This diff is added enforces to ids to be in a
valid range.

Reviewed By: dzhulgakov

Differential Revision: D5488336

fbshipit-source-id: e045c3b71b92e443edd23c95aa75d144877f1334
2017-09-02 02:16:24 -07:00
c858c68537 cmake: stop including files from the install directory
Summary:
Here is the buggy behavior which this change fixes:

* On the first configure with CMake, a system-wide benchmark installation is not found, so we use the version in `third_party/` ([see here](https://github.com/caffe2/caffe2/blob/v0.8.1/cmake/Dependencies.cmake#L98-L100))
* On installation, the benchmark sub-project installs its headers to `CMAKE_INSTALL_PREFIX` ([see here](https://github.com/google/benchmark/blob/4bf28e611b/src/CMakeLists.txt#L41-L44))
* On a rebuild, CMake searches the system again for a benchmark installation (see https://github.com/caffe2/caffe2/issues/916 for details on why the first search is not cached)
* CMake includes `CMAKE_INSTALL_PREFIX` when searching the system ([docs](https://cmake.org/cmake/help/v3.0/variable/CMAKE_SYSTEM_PREFIX_PATH.html))
* Voila, a "system" installation of benchmark is found at `CMAKE_INSTALL_PREFIX`
* On a rebuild, `-isystem $CMAKE_INSTALL_PREFIX/include` is added to every build target ([see here](https://github.com/caffe2/caffe2/blob/v0.8.1/cmake/Dependencies.cmake#L97)). e.g:

      cd /caffe2/build/caffe2/binaries && ccache /usr/bin/c++    -I/caffe2/build -isystem /caffe2/third_party/googletest/googletest/include -isystem /caffe2/install/include -isystem /usr/include/opencv -isystem /caffe2/third_party/eigen -isystem /usr/include/python2.7 -isystem /usr/lib/python2.7/dist-packages/numpy/core/include -isystem /caffe2/third_party/pybind11/include -isystem /usr/local/cuda/include -isystem /caffe2/third_party/cub -I/caffe2 -I/caffe2/build_host_protoc/include  -fopenmp -std=c++11 -O2 -fPIC -Wno-narrowing -O3 -DNDEBUG   -o CMakeFiles/split_db.dir/split_db.cc.o -c /caffe2/caffe2/binaries/split_db.cc

This causes two issues:
1. Since the headers and libraries at `CMAKE_INSTALL_PREFIX` have a later timestamp than the built files, an unnecessary rebuild is triggered
2. Outdated headers from the install directory are used during compilation, which can lead to strange build errors (which can usually be fixed by `rm -rf`'ing the install directory)

Possible solutions:
* Stop searching the system for an install of benchmark, and always use the version in `third_party/`
* Cache the initial result of the system-wide search for benchmark, so we don't accidentally pick up the installed version later
* Hack CMake to stop looking for headers and libraries in the installation directory

This PR is an implementation of the first solution. Feel free to close this and fix the issue in another way if you like.
Closes https://github.com/caffe2/caffe2/pull/1112

Differential Revision: D5761750

Pulled By: Yangqing

fbshipit-source-id: 2240088994ffafdb6eedb3626d898b505a4ba564
2017-09-01 23:33:14 -07:00
e368740612 Update the speed benchmark code
Summary:
(for TIR demo cases)
Closes https://github.com/caffe2/caffe2/pull/1160

Differential Revision: D5761679

Pulled By: Yangqing

fbshipit-source-id: 53b6c7fd098a394eba51baeac1e70371bcddf360
2017-09-01 23:16:39 -07:00
c9238671ee Use char-ngram embedding for out-of-vocabulary words
Summary:
**Description**

Provide DeepText model with the functionality to load a secondary index (pre-trained char-ngram embedding, e.g. FastText) during training/test.  Embeddings of out-of-vocabulary words will be computed on-the-fly during training/test by averaging the char-ngram embeddings.

**Approach**

This diff provides two custom operators to accomplish this task – ConditionalOp and IndexCharNgramGetOp.  We first use IndexCharNgramGetOp to perform char-ngram index lookup and return a sparse tensor segmented by lengths for each token.  The sparse tensor is then used to compute the average embedding provided by the char-ngram index.  Finally, we use a ConditionalOp to replace tokens whose embeddings were not found in the original index during the feature apply stage.  Please refer to the documentation in the code for more details.

Reviewed By: jamesr66a

Differential Revision: D5666924

fbshipit-source-id: f76605d093154a014d5b9ebf9510de9d79874eee
2017-09-01 19:16:49 -07:00
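For intuition, a NumPy sketch of the out-of-vocabulary averaging idea described above (the function names, n-gram range, and boundary markers are assumptions for illustration):

  import numpy as np

  def char_ngrams(word, n_min=3, n_max=5):
      w = "<" + word + ">"
      return [w[i:i + n] for n in range(n_min, n_max + 1)
              for i in range(len(w) - n + 1)]

  def oov_embedding(word, ngram_index, ngram_table):
      # average the embeddings of the word's char-ngrams found in the index
      ids = [ngram_index[g] for g in char_ngrams(word) if g in ngram_index]
      if not ids:
          return np.zeros(ngram_table.shape[1], dtype=ngram_table.dtype)
      return ngram_table[ids].mean(axis=0)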
0c324ba417 set stream for cudnn handle correctly in cudnn wrapper
Summary:
CuDNNWrapper's inline_cudnn_handle() should set the stream every time, since it can change. This caused problems in RNN scenarios. Also, this bug rendered singlethread_async_net incorrect / slow!

I found out the problem by using nvprof --print-gpu-trace and noticing that some kernels were run in a different stream than I expected.

Reviewed By: ajtulloch, Yangqing

Differential Revision: D5758426

fbshipit-source-id: 651c62fe28eaf09e1675d4adf3f1fac8b4c8e75b
2017-09-01 18:07:07 -07:00
ca3f2f9e6a Small fix to exporter to accept net/NetDef both
Reviewed By: bwasti

Differential Revision: D5753261

fbshipit-source-id: 55b9252606023648ee3b2acdcbbe89bcc8b54748
2017-09-01 13:32:12 -07:00
1f1aca6e09 Support broadcast specifications from cwrap.
This respects all the broadcast cwrap specifications except for 'fallback';
i.e. pointwise functions operating on tensors where the number of elements
match but the sizes are different and not broadcastable.  This behavior is
currently deprecated in PyTorch.  Note that this is a breaking change in ATen,
because ATen just passes through to TH/THC, where the fallback behavior is
actually implemented.

This also changes expand semantics wrt Scalars (as tensors).  Previously,
one could 'expand' a 1-dimensional tensor with size 1 to a 'scalar' (i.e.
empty size initializer list).
2017-09-01 12:11:04 -07:00
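Broadcasting here follows the familiar NumPy-style rules, so a quick NumPy illustration separates the two cases the commit distinguishes (broadcastable sizes versus the deprecated same-element-count fallback):

  import numpy as np

  a = np.ones((2, 3))
  b = np.ones((3,))
  print((a + b).shape)   # (2, 3): sizes are broadcastable

  c = np.ones((3, 2))    # same number of elements as `a`, but the sizes
  try:                   # do not broadcast; this is the deprecated
      a + c              # "fallback" case, which broadcasting rejects
  except ValueError as e:
      print(e)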
a1992e81b3 Replaced std::copysign(x) with (x > 0 ? 1 : -1)
Summary:
Replaced std::copysign(x) with (x > 0 ? 1 : -1).
std::copysign is not available on some Android platforms which was detected in GitHub's Travis tests:
"/home/travis/build/caffe2/caffe2/caffe2/sgd/yellowfin_op.cc:57:23: error: 'copysign' is not a member of 'std'"

Reviewed By: akyrola

Differential Revision: D5756384

fbshipit-source-id: 56bc220d2c6216ff45b9cc47ed02aebf6ad439a5
2017-09-01 11:52:44 -07:00
579fc7e959 unify bernoulli yaml declarations across backends (#2578) 2017-09-01 14:28:42 -04:00
8820d467d6 handle useless ellipsis in advanced indexing (#2589) 2017-09-01 14:27:47 -04:00
c5a8a59116 raise KeyError if registering buffer/param when attr exists (#2108) 2017-09-01 14:08:49 -04:00
925cfc0d90 Disabling test for YellowFin
Summary: Disabling a YellowFin test that does not pass in Travis. The difference comes from numerical reasons; the test passes on my CPU / math libraries. Decide whether to merge it.

Reviewed By: Yangqing

Differential Revision: D5754144

fbshipit-source-id: b6ed6628f962d6904a8d522f0cf4080d7878acad
2017-09-01 10:35:48 -07:00
bb08f261f1 EnsureDense/SparseToDense for CUDA
Summary: Make a CUDA version of SparseToDense and register EnsureDense (which is trivial) on CUDA. Need to use atomics because indices can be duplicated. We can later add an option to indicate whether the indices are unique, and use a faster path then.

Reviewed By: jhcross

Differential Revision: D5750893

fbshipit-source-id: 005d1675b127a571aac8474fca62d9633f0c7bff
2017-09-01 09:33:05 -07:00
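For reference, a NumPy sketch of SparseToDense semantics with duplicated indices (np.add.at accumulates duplicates, which is the role the atomic adds play on the GPU; the function shape is an assumption for illustration):

  import numpy as np

  def sparse_to_dense(indices, values, first_dim):
      dense = np.zeros((first_dim,) + values.shape[1:], dtype=values.dtype)
      np.add.at(dense, indices, values)   # duplicates accumulate
      return dense

  out = sparse_to_dense(np.array([0, 2, 0]), np.ones((3, 4)), first_dim=5)
  # row 0 receives two contributions; rows 1, 3 and 4 stay zero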
b2bd9ef15a protoc: only disable in watch os mode
Summary:
(see comment)
Closes https://github.com/caffe2/caffe2/pull/1157

Reviewed By: bddppq

Differential Revision: D5753813

Pulled By: Yangqing

fbshipit-source-id: 7a60f03bb37314161e42ac0405a4b168d2541f3f
2017-09-01 00:46:51 -07:00
3ae810753e fix travis build
Summary: Closes https://github.com/caffe2/caffe2/pull/1150

Reviewed By: Yangqing

Differential Revision: D5753901

Pulled By: jerryzh168

fbshipit-source-id: f4fd9259207bba4e602abee0b194a5557f57fa77
2017-08-31 23:35:20 -07:00
9f685e4aa3 Ensure GIL is held in ObjectPtrAllocators (#2581) 2017-09-01 00:30:09 -04:00
26cdfcd9cf allow single non-tuple sequence to trigger advanced indexing (#2323) 2017-09-01 00:28:45 -04:00
53ccbd9a6e soft-coverage attention
Summary:
Implementation of a new variant of the attention module, which contains a recurrent decoder state with vectors corresponding to each source-side word and strictly increasing values, thus enabling it to model the degree to which source words have been translated.

The approach is a variant of the approaches described in https://arxiv.org/pdf/1601.04811.pdf. We simply include the sum of all previous attention weights for encoder words as a new recurrent state (coverage_t). A new linear transform on encoder_outputs is used to produce coverage_weights, which has the same dimensionality as encoder_outputs and implicitly models the fertility of source-side words (putting this extra informational strain on the encoder network).

Thus the encoder output, the decoder state, and the coverage weights have the same dimensionality for a given source word, and attention logits are calculated as v *  tanh(coverage * coverage_weights + encoder_output + decoder_state).

Note: the entire coverage state for each translation instance is of shape (encoder_length, coverage_units), but the states for the RecurrentNetwork operator, used to train the decoder, must be flat in the data dimension. This state is therefore initialized with shape (encoder_length * coverage_units) [not shown in the open-source library] and reshaped appropriately within the apply_soft_coverage_attention() function.

Differential Revision: D5593617

fbshipit-source-id: 7d0522b5eb0b26f22e8429e4461a459f2f16ed46
2017-08-31 21:21:54 -07:00
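To make the logit formula quoted above concrete, a NumPy sketch (shapes and names are assumptions; for simplicity the coverage units are taken equal to the hidden size d):

  import numpy as np

  def coverage_attention_logits(encoder_outputs,   # (src_len, d)
                                decoder_state,     # (d,)
                                coverage,          # (src_len, d) running sums
                                coverage_weights,  # (src_len, d)
                                v):                # (d,)
      hidden = np.tanh(coverage * coverage_weights
                       + encoder_outputs + decoder_state)
      return hidden @ v                            # (src_len,) logits

After softmax, the resulting attention weights would be added into `coverage` for the next decoder step, which is what makes its values strictly increasing.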
0e99e7bd99 Accidental addition of a file
Summary: hotfix

Reviewed By: Yangqing

Differential Revision: D5753187

fbshipit-source-id: 472c36a6b69cfb4ffb279e525a5eb43133828f1b
2017-08-31 20:17:12 -07:00
0b643fce09 Use std::atomic instead of volatile and custom barriers in WorkerPool
Reviewed By: Maratyszcza

Differential Revision: D5745673

fbshipit-source-id: fcaabe941847e58624c8e87d27ccc607dc74e27f
2017-08-31 18:47:42 -07:00
50b5c76ea9 A benchmark generator for individual ops
Summary: basic little op benchmark generator -- outputs init_net.pb and predict_net.pb for use with speed_benchmark or mobile_speed_benchmark

Reviewed By: Maratyszcza

Differential Revision: D5728534

fbshipit-source-id: 3e912fa63548497ca65ab34c8bb967694c46815b
2017-08-31 17:33:21 -07:00
a1a4924fc0 protect cudnnSetDropoutDescriptor with mutex
Summary: Turns out NCCL can deadlock with cudnnSetDropoutDescriptor, so we need a lock.

Reviewed By: pietern

Differential Revision: D5748325

fbshipit-source-id: b3828c50f6acfc4b5323008ec04f571f6d0d5586
2017-08-31 14:56:07 -07:00
a0bd836afd add conv flops inference
Summary: Added super rough conv cost inference that takes into account very few params

Reviewed By: Maratyszcza

Differential Revision: D5412611

fbshipit-source-id: f662822fd5a532eacb525fbc361e8a62f32430a8
2017-08-31 14:18:21 -07:00
c609f22638 added gflop annotation to TEST_benchmark
Summary: TEST_benchmark will print out gflops if it can infer them

Reviewed By: Maratyszcza

Differential Revision: D5412644

fbshipit-source-id: 3af7bb42cda4684e30db6d8ae5484d441898479c
2017-08-31 14:18:20 -07:00
fc1f117502 Return TensorInferenceFunction for SliceOp
Summary:
One of the rebases that I have been doing on this op has completely
messed up my code, and I accidentally removed the
TensorInferenceFunction for SliceOp. This diff restores it.

Reviewed By: akyrola

Differential Revision: D5745305

fbshipit-source-id: 5266c9e14c7d55be5a9cc96688e128db79547b1a
2017-08-31 14:03:47 -07:00
03711e9ab8 Handle bool's correctly in net.Const
Summary: As desc.

Reviewed By: volkhin

Differential Revision: D5745310

fbshipit-source-id: 66c3da37a42cf98bae05cead58f3f694eae19e0d
2017-08-31 12:02:58 -07:00
debceaff02 Support new arguments in ConvTranspose
Summary: Adding support to use kernels, strides, pads etc. as arguments.

Reviewed By: houseroad

Differential Revision: D5710699

fbshipit-source-id: 8b63af4c4a76cd06b637a376aeb29a34c659be2e
2017-08-31 11:17:32 -07:00
b4b89e1bd5 Ability to dequeue and concat multiple records in a single QueueDequeue op
Summary: This will allow reading data in small batches and concatenating the batches later on.

Reviewed By: kennyhorror

Differential Revision: D5739129

fbshipit-source-id: 66a8087e5f9d10d654e367c6111ac90cbf54224e
2017-08-31 10:48:59 -07:00
eed2292123 check for null commonworld in DestroyCommonWorld
Summary: Check for nullptr before closing a common world.

Reviewed By: pietern

Differential Revision: D5746256

fbshipit-source-id: d395bf60d3b7f2c2629761d2b6fd46085683390c
2017-08-31 10:48:57 -07:00
4ec26d23a7 TensorInference function for LengthsSum and such
Summary: Adding missing tensor inference function

Reviewed By: kennyhorror

Differential Revision: D5735119

fbshipit-source-id: 1602b5aeec95f13a3c3c6d3e5417af2712a4dfbb
2017-08-31 09:32:48 -07:00
571b651ef2 Remove redundant tensor inference function
Summary: Both D5695197 & D5691262 implement the tensor inference function for Gather. Keeping only one.

Reviewed By: akyrola

Differential Revision: D5742331

fbshipit-source-id: 1c31427fbfbc87bfec84b8c04851275f45154fcf
2017-08-31 09:17:43 -07:00
d84dbcfb9e add a "clone the source" section 2017-08-31 11:55:23 -04:00
a7e11bddab Added readme for SNPE build and usage
Summary: ..

Reviewed By: asaadaldien

Differential Revision: D5744360

fbshipit-source-id: cae3b4cb86aa75cc5a22225d09e9bfe288920a91
2017-08-30 21:47:01 -07:00
fefd5479a3 Initial implementation of YellowFin algorithm
Summary:
Added YellowFin optimizer to Caffe2.
This implementation is different from the original: it has separate alpha and mu for each parameter, and it uses a different version of Momentum SGD.
Tests / benchmarks for the optimizer are still to be done, and some refactoring of the code is needed before pushing. This is still a working version.

Reviewed By: akyrola

Differential Revision: D5652689

fbshipit-source-id: c10dc0424f47c3051b454aede1d121902cb759a8
2017-08-30 18:53:46 -07:00
5ed5be71b1 YellowFin GPU class and Python optimizer
Summary: YellowFin GPU in .cu file, Python operator in optimizer.py

Reviewed By: asaadaldien, akyrola

Differential Revision: D5727450

fbshipit-source-id: 42a878e5fd35e288e0e6eeaa0bf980a9db96e5a7
2017-08-30 18:32:24 -07:00
0f3a5d3180 Tuning number of parameter servers based on performance estimation job
Summary:
1) Adds monitoring of CPU utilization in trainers and PS's, and reports the utilization to global statistics
2) Adds the plan execution time to global stats
3) Uses CPU utilization and network utilization observed from the performance estimation job to calculate the optimal number of parameter servers needed for the actual job. The optimal number of parameter servers is the minimum number of servers needed such that the parameter servers are not the bottleneck in execution.

//Note: The calculation assumes that parameter shards are assigned to PS's in a uniform way and accesses to the shards follow a uniform access pattern. In reality, shards' access pattern may be skewed. As a next step, we should monitor shard access pattern in performance estimation job and distribute the shards in the optimal way.//

Reviewed By: sf-wind

Differential Revision: D5674398

fbshipit-source-id: 67a07cb9ed4e4d61ff5e81a0ecfe519b8feb2352
2017-08-30 18:03:59 -07:00
00366ca2d1 Move SliceOp outside of utility_ops.h
Summary: As desc.

Reviewed By: ajtulloch

Differential Revision: D5713178

fbshipit-source-id: 4c733bfd4ca2e8e2f6650e2ae76ef1e7d09046d4
2017-08-30 18:03:58 -07:00
55ec2cb08c YellowFin CPU class
Summary: .h and .c files with YellowFinOp. .cu and test files will be included in next commits.

Reviewed By: akyrola

Differential Revision: D5724198

fbshipit-source-id: b05b9c047af25f9081641a0fe0cdba2ee74cb04b
2017-08-30 17:02:24 -07:00
33ef5f38a0 Fixed cuda loss op
Summary:
Currently the loss ops are still not on GPU even though ALL strategy is selected.
This diff is to enable it.

Reviewed By: xianjiec

Differential Revision: D5671255

fbshipit-source-id: 033863f171e1f89c8d75430d3af6a1e6d0d2eff2
2017-08-30 17:02:23 -07:00
94edc073ed Added subtraction operator, tested useTextureInput set to True
Summary:
-Added subtraction operator and tested the subtraction precision
-CPU running at 692 ms/iter
-GPU Accelerated running at 160 ms/iter

Differential Revision: D5732310

fbshipit-source-id: 763e6eb62aee2ee2ad0f58cc0655a718ffa07ce1
2017-08-30 17:02:23 -07:00
f293a22f39 Comparing CPU & GPU results for Denoiser Network
Summary: Layer by layer comparison between CPU and GPU verified within 1% scale precision

Differential Revision: D5714594

fbshipit-source-id: f4ddee60c317aeeae4c7f3f9ac299fddf9057761
2017-08-30 17:02:22 -07:00
f103b4d93f Relax dimension constraint in CUDA to 6 for Transpose
Summary: att

Reviewed By: bddppq

Differential Revision: D5739634

fbshipit-source-id: 967d2b8811a619dc57a65943cd3ba1063c998aa3
2017-08-30 17:02:21 -07:00
080fab8f6c Code generator for high-performance embedding look-up kernels, supporting
Summary:
Code generator for high-performance embedding look-up kernels, supporting
Sum, WeightedSum, and Mean reducers.
Achieves at least 1.5x speedup on float and over 2x speedup for float16, compared to the existing code.
These are results on Broadwell, using the sparse_lengths_sum_benchmark.par benchmark

Old
==============
[root@fblearner001.01.ftw1 /home/msmelyan]# numactl -m 0 -C 0 ./sparse_lengths_sum_benchmark.par  --iteration 10000
Preparing lookup table. 2017-08-08 00:10:23.101848
Preparation finished. 2017-08-08 00:10:27.955680
I0808 00:10:27.955732 30700 net.cc:177] Starting benchmark.
I0808 00:10:27.955759 30700 net.cc:178] Running warmup runs.
I0808 00:10:27.956367 30700 net.cc:188] Main runs.
I0808 00:10:31.839035 30700 net.cc:199] Main run finished. Milliseconds per iter: 0.388264. Iters per second: 2575.56
I0808 00:10:35.704169 30700 net.cc:233] Operator #0 (indices, Python) 0.0583264 ms/iter
I0808 00:10:35.704210 30700 net.cc:233] Operator #1 (Y, SparseLengthsSum) 0.327694 ms/iter
I0808 00:10:35.704213 30700 net.cc:237] Time per operator type:
I0808 00:10:35.704217 30700 net.cc:246]        0.327694 SparseLengthsSum
I0808 00:10:35.704221 30700 net.cc:246]       0.0583264 Python
[root@fblearner001.01.ftw1 /home/msmelyan]# numactl -m 0 -C 0 ./sparse_lengths_sum_benchmark.par  --iteration 10000 --dtype float16
Preparing lookup table. 2017-08-08 00:10:59.047159
Preparation finished. 2017-08-08 00:11:05.140565
I0808 00:11:05.140612 31725 net.cc:177] Starting benchmark.
I0808 00:11:05.140635 31725 net.cc:178] Running warmup runs.
I0808 00:11:05.141104 31725 net.cc:188] Main runs.
I0808 00:11:08.371510 31725 net.cc:199] Main run finished. Milliseconds per iter: 0.323039. Iters per second: 3095.6
I0808 00:11:11.671450 31725 net.cc:233] Operator #0 (indices, Python) 0.0609876 ms/iter
I0808 00:11:11.671489 31725 net.cc:233] Operator #1 (Y, SparseLengthsSum) 0.26856 ms/iter
I0808 00:11:11.671494 31725 net.cc:237] Time per operator type:
I0808 00:11:11.671497 31725 net.cc:246]         0.26856 SparseLengthsSum
I0808 00:11:11.671500 31725 net.cc:246]       0.0609876 Python

New (Misha's)
==============
[root@fblearner001.01.ftw1 /home/msmelyan]# numactl -m 0 -C 0 ./sparse_lengths_sum_benchmark.par  --iteration 10000
Preparing lookup table. 2017-08-07 23:44:55.897748
Preparation finished. 2017-08-07 23:45:00.708896
I0807 23:45:00.708945 4178361 net.cc:177] Starting benchmark.
I0807 23:45:00.708971 4178361 net.cc:178] Running warmup runs.
I0807 23:45:00.709444 4178361 net.cc:188] Main runs.
I0807 23:45:03.608551 4178361 net.cc:199] Main run finished. Milliseconds per iter: 0.289909. Iters per second: 3449.36
I0807 23:45:06.536182 4178361 net.cc:233] Operator #0 (indices, Python) 0.0572399 ms/iter
I0807 23:45:06.536224 4178361 net.cc:233] Operator #1 (Y, SparseLengthsSum) 0.23512 ms/iter
I0807 23:45:06.536228 4178361 net.cc:237] Time per operator type:
I0807 23:45:06.536232 4178361 net.cc:246]         0.23512 SparseLengthsSum
I0807 23:45:06.536236 4178361 net.cc:246]       0.0572399 Python
[root@fblearner001.01.ftw1 /home/msmelyan]# numactl -m 0 -C 0 ./sparse_lengths_sum_benchmark.par  --iteration 10000 --dtype float16
Preparing lookup table. 2017-08-07 23:45:17.191579
Preparation finished. 2017-08-07 23:45:23.173668
I0807 23:45:23.173715 4179316 net.cc:177] Starting benchmark.
I0807 23:45:23.173743 4179316 net.cc:178] Running warmup runs.
I0807 23:45:23.174090 4179316 net.cc:188] Main runs.
I0807 23:45:24.939749 4179316 net.cc:199] Main run finished. Milliseconds per iter: 0.176564. Iters per second: 5663.67
I0807 23:45:26.698885 4179316 net.cc:233] Operator #0 (indices, Python) 0.0557303 ms/iter
I0807 23:45:26.698923 4179316 net.cc:233] Operator #1 (Y, SparseLengthsSum) 0.119794 ms/iter
I0807 23:45:26.698927 4179316 net.cc:237] Time per operator type:
I0807 23:45:26.698931 4179316 net.cc:246]        0.119794 SparseLengthsSum
I0807 23:45:26.698935 4179316 net.cc:246]       0.0557303 Python

Reviewed By: salexspb

Differential Revision: D5582172

fbshipit-source-id: d71f5a55580b734a51b8f30852b75f379acfdaf2
2017-08-30 16:22:11 -07:00
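For reference, a NumPy statement of what SparseLengthsSum computes (the generated kernels produce the same result, just much faster; the helper below is illustrative):

  import numpy as np

  def sparse_lengths_sum(table, indices, lengths):
      out, pos = [], 0
      for n in lengths:
          out.append(table[indices[pos:pos + n]].sum(axis=0))
          pos += n
      return np.stack(out)

  table = np.random.rand(1000, 64).astype(np.float32)
  y = sparse_lengths_sum(table, np.array([1, 5, 5, 7]), lengths=[2, 2])
  # y[0] = table[1] + table[5]; y[1] = table[5] + table[7]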
71a87f0645 elementSizeInBytes for types 2017-08-30 15:55:56 -07:00
45e6e71198 Tidy up CMake for NCCL
Summary:
Use HINTS instead of PATHS for find_library so that you can specify
-DNCCL_ROOT_DIR and it will use this NCCL installation regardless of
what else is installed on your system. Also add a path hint to include
the default base path for NCCL 2 libraries.
Closes https://github.com/caffe2/caffe2/pull/1152

Reviewed By: Yangqing

Differential Revision: D5740053

Pulled By: pietern

fbshipit-source-id: 43f0908a63e8a9b90320dece0bbb558827433b48
2017-08-30 15:39:56 -07:00
a03e5cb409 Remind users to submodule update.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-08-30 16:14:38 -04:00
466f0a823a Use external nccl, fixes #2553
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-08-30 16:14:38 -04:00
a7ec5def7b data_parallel_model names fix
Summary: Updated usage of deprecated functions in data_parallel_model.py

Reviewed By: akyrola

Differential Revision: D5738512

fbshipit-source-id: a7767e518da777ece058bcad480e5df1d91e9b42
2017-08-30 12:47:14 -07:00
8f6fa78271 disable cudnn when output_padding >= stride or dilation 2017-08-30 15:46:35 -04:00
4a211314d0 fix shape and correctness bugs in autograd/convolution BackwardBackward 2017-08-30 15:46:35 -04:00
58b7d1c764 remove python convnd function 2017-08-30 15:46:35 -04:00
7ca196c11d enable cudnn transposed dilated 2017-08-30 15:46:35 -04:00
0cf2c37505 refactor nn calls in autograd convolution 2017-08-30 15:46:35 -04:00
e950c44c80 enable dilated transpose and gradgrad nn tests 2017-08-30 15:46:35 -04:00
d13d95c09c dilated/transposed conv in autograd 2017-08-30 15:46:35 -04:00
ae5101c137 Fix range op's GPU
Summary: The GPU op was broken. Copy over the scalar data so that it can be used to construct the output tensor.

Reviewed By: akyrola

Differential Revision: D5733170

fbshipit-source-id: dfc800b9a408eaeb7f9abefbb640e10074204add
2017-08-30 11:47:48 -07:00
1b11ea3934 Change default argument for LRN
Summary: att

Reviewed By: bddppq

Differential Revision: D5736242

fbshipit-source-id: c79f27c3177d5446f8ef2044b5e21e432382b4e7
2017-08-30 10:51:19 -07:00
59540847b1 Include Caffe2 headers before anything else
Summary:
This was a tricky one to debug. After pulling from master, my build
was complaining that certain identifiers in updated source files were
undefined. After building with VERBOSE=1, extracting the compilation
commands, and adding -M, I saw that CMake had included the Caffe2
installation directory as include path. Worse yet, this path had
precedence over the path to the actual source code. The compiler
included older headers when compiling newer source files.

This change forces the path to the Caffe2 source code to take
precedence over all other include paths. The only path that takes
precedence over *that* path is PROJECT_BINARY_DIR, which holds the
headers that are generated at compile time.
Closes https://github.com/caffe2/caffe2/pull/1140

Reviewed By: Yangqing

Differential Revision: D5727133

Pulled By: pietern

fbshipit-source-id: c60c89e82e8b1ab1cfca0907d31b84417788d79b
2017-08-30 10:22:06 -07:00
6e03c5dc1f Ignore gloo when linting.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-08-30 11:54:04 -04:00
7310ebb66f Add gloo submodule.
We make gloo a submodule because it contains submodules itself, and
Git cannot handle subtrees with nested submodules.

Fixes https://github.com/pytorch/pytorch/issues/2426
2017-08-30 11:54:04 -04:00
aaef3a3ed3 Remove gloo subtree, in preparation for gloo submodule addition. 2017-08-30 11:54:04 -04:00
e69063405e Allow param groups to be added to Optimizer dynamically (#2374) 2017-08-30 11:20:58 -04:00
19f6941e7c fix arxiv link to batch-norm paper
Summary: arxiv link to batch-norm paper was broken because dot(.) was included at the end

Reviewed By: zem7

Differential Revision: D5734405

fbshipit-source-id: e037c14091e7f9e415c2f7a3008cbf2bf066e699
2017-08-30 07:51:13 -07:00
501b49074b TestGLConvolution CPU baseline can use precomputed
Reviewed By: Maratyszcza

Differential Revision: D5734588

fbshipit-source-id: e6ef13620b1efaeab95237f4fe1560a653a9199b
2017-08-30 00:24:53 -07:00
7ba8de6771 Add S6 support
Summary: Adding support for integer textures and thus the Galaxy S6 among other devices

Differential Revision: D5695151

fbshipit-source-id: 46514e5aa931f98f8c7c82ec923e7803bcaa9bc0
2017-08-29 21:31:17 -07:00
68842a5510 HPTT
Differential Revision: D5733991

fbshipit-source-id: 40e457cdab5ef444c8a5b36402187f7c7c3be5b3
2017-08-29 21:06:40 -07:00
60c17a34b5 better default settings for CUB
Summary: The default CUB settings led to very slow execution in practice when using "dynamic" memory allocation with C2 (i.e. freeing blobs after their use). After some tinkering, I arrived at these numbers, which work much better with resnet-50 and an NVIDIA M40 GPU than the original defaults. Also made the maximum allocated memory configurable.

Reviewed By: Yangqing

Differential Revision: D5732930

fbshipit-source-id: 9ff34f49d5a3eb138bc6f44c82918731a35325a6
2017-08-29 19:11:08 -07:00
41adebe974 Clear the operator default engines before running operator tests
Reviewed By: akyrola

Differential Revision: D5729024

fbshipit-source-id: f2850d5cf53537b22298b39a07f64dfcc2753c75
2017-08-29 17:47:20 -07:00
e41dd5affe Added USDT probes needed to support QueueSnoop
Summary: Add USDT probes to support QueueSnoop

Reviewed By: pietern

Differential Revision: D5650744

fbshipit-source-id: 94dfcf97e23f7ebf76ac31e3d2240f67f802c924
2017-08-29 15:54:08 -07:00
5315669bd8 Add ShapeInference for ConcatOp (Fixed)
Reviewed By: akyrola

Differential Revision: D5721442

fbshipit-source-id: 64ed35cb4c40f32a5cca29fe9cd04e18a340db4b
2017-08-29 12:18:03 -07:00
d698b29f1b handle reshape gradient in shape inference in special way
Summary:
Reshape op's gradient op will have the original shape stored in a blob. Shape inference won't work directly, because the shape inference function does not have access to blob contents.
In this case, I think making a special exception in the shape inference system is justified: we store the output of reshape in a reshape-cache, and pass that in the backward pass.

Also include my experimental test script that I used for NeuralMT CNN model.

Reviewed By: asaadaldien

Differential Revision: D5721502

fbshipit-source-id: fdc8ab901d3bee2c4621ee5140a5435e49f4471d
2017-08-29 11:28:05 -07:00
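A toy sketch of the reshape-cache exception described above (names are assumptions for illustration): the forward shape inference records the pre-reshape shape, so the gradient op's output shape can be answered without reading blob contents.

  reshape_cache = {}

  def infer_reshape(op_id, in_shape, new_shape):
      reshape_cache[op_id] = in_shape    # remember the pre-reshape shape
      return new_shape

  def infer_reshape_gradient(op_id, grad_out_shape):
      return reshape_cache[op_id]        # gradient has the original shape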
488abdcd6c slice op shape inference
Summary: As titled + test

Reviewed By: jamesr66a

Differential Revision: D5720637

fbshipit-source-id: eae76e587808139fcf06abc0f8345152979815ec
2017-08-29 11:05:24 -07:00
c1b09cd5ab Fix typo in docstring example (#2562) 2017-08-29 11:48:44 -04:00
bc228b2409 auto_gpu:True for ones_like and zeros_like (#2559) 2017-08-29 09:51:36 -04:00
cdae579c22 Fix typos in "Extending PyTorch" (#2558) 2017-08-29 09:39:29 -04:00
bdeafe49ac Hotfix the OSS build error
Summary: fixing master

Reviewed By: ajtulloch

Differential Revision: D5724857

fbshipit-source-id: da7b93e181cf496d59364122234a87edd6775a82
2017-08-28 23:53:18 -07:00
a0204331a8 Control flow operators
Summary:
This diff adds control flow operators in Caffe2 (starting with If, While):
 - Added If operator that executes then/else subnet
 - Branch subnet is executed in a separate isolated workspace, with some of the blobs transparently forwarded from the outer workspace
 - Adding a new NetBuilder subclass to construct nets using new operator
 - NetBuilder also keeps track of outer blob names and automatically sets blob bindings between outer and inner workspace, implementing generic convention on handling local/global variables in blocks

Reviewed By: volkhin

Differential Revision: D5720644

fbshipit-source-id: a674cde0c789f6a6ffdcd9d80159d1e42e49133f
2017-08-28 20:04:43 -07:00
440f1abbdf Merge commit '3b7b923de86675a7f4c37b04db8788ccc4b5b682' 2017-08-28 21:42:43 -04:00
3b7b923de8 Fix grid size for batch cat tensor now that getApplyGrid has been changed. 2017-08-28 21:41:46 -04:00
a71330d13f More efficient broadcasting backward by bailing in all cases if sizes match. (#2556) 2017-08-28 21:38:33 -04:00
bfca30e6f1 shape inference for batchmatmul
Summary: As titled. Direct adaptation of the operator code.

Reviewed By: azzolini

Differential Revision: D5721174

fbshipit-source-id: cc9d4c916d7d79d202a344f29ef384ddc68f4988
2017-08-28 18:31:55 -07:00
7c7603a60e fix FC shape inference
Summary: FC shape inference was broken for non-default axis. Add test.

Reviewed By: asaadaldien

Differential Revision: D5720146

fbshipit-source-id: f36f9cc8477dc61c3b07eeea8ea0702562045c88
2017-08-28 16:08:07 -07:00
5a360c92a6 @allow-large-files [caffe2] update snpe for oss
Summary: oss build

Reviewed By: Yangqing

Differential Revision: D5605281

fbshipit-source-id: 289283d5ce8267a2ba22e6c35f6c4af0d45c439b
2017-08-28 15:32:23 -07:00
4b51adb032 Add tiled vs. batched comparison for models
Summary:
Add tiled vs. batched comparison for models
Add more logging to GLPadImage

Differential Revision: D5718546

fbshipit-source-id: fdd4f0aabc41cb3b86b6f0ccf8e618a15170ceae
2017-08-28 13:55:25 -07:00
3a315dd809 Fix a bug in GLConvolution
Differential Revision: D5701479

fbshipit-source-id: c167277cc509261a4e17cdc99a473aada51bdfd7
2017-08-28 13:55:21 -07:00
898f3f398c Use gemmlowp-based worker pool (spinning + #threads of blocks of work) instead of custom work-stealing impl
Reviewed By: Yangqing

Differential Revision: D5696841

fbshipit-source-id: 84b629d2c1ebd418c75d5da907799e580cc59d1e
2017-08-28 00:46:01 -07:00
5585c265a2 Merge commit 'bd27f0b5a7183bbb42b024f88bd9058842c10f95' 2017-08-27 21:32:08 -04:00
bd27f0b5a7 remove repetition of libquadmath in TH CMakeLists 2017-08-27 21:30:58 -04:00
674e1f2ba1 increase test subprocess timeout 2017-08-27 21:11:08 -04:00
4cca286d9e add google analytics to docs 2017-08-27 20:58:33 -04:00
5a0163cdee Merge commit '80caca4edbc415abb2f0695fb2565e6b46c410a8' 2017-08-26 22:25:01 -04:00
80caca4edb Allowing larger grids for THCApply shows improved performance. 2017-08-26 22:23:50 -04:00
18d6579ee3 Merge commit '72a257584efa7fb63b14f09d19efc96caa5d6e4d' 2017-08-26 22:21:19 -04:00
b42911dd48 Merge commit '429699bb20596e1c8bc87ab37e4597b700eff8f6' 2017-08-26 22:20:55 -04:00
72a257584e Add numerically stable logsigmoid 2017-08-26 22:19:48 -04:00
429699bb20 Add numerically stable logsigmoid 2017-08-26 22:19:39 -04:00
94b5990201 Add torch.cuda.get_device_name function (#2540) 2017-08-26 15:06:37 -04:00
d9f9047e39 Merge commit '16008661bf63da2fbe7fc8412a8ab4dd70deeace' 2017-08-26 14:46:26 -04:00
a9b089c3c8 Merge commit 'b5949d8e9d9e43737979bcc089de2a2d2f783e1d' 2017-08-26 14:46:05 -04:00
5294017d9f Adding implicit padding for 3d average pooling 2017-08-26 14:45:19 -04:00
16008661bf Adding implicit padding for 3d average pooling 2017-08-26 14:44:59 -04:00
b5949d8e9d Adding implicit padding for 3d average pooling 2017-08-26 14:44:49 -04:00
150dc7a8e3 Improve Windows Compatibility(for libshm) (#2455) 2017-08-26 07:20:45 -04:00
d3c8e68004 Revert D5641588: [caffe2] Control flow operators
Summary:
This reverts commit f9e04429961c3da7da4ebca3e8163bfcc2a09ec9

bypass-lint

Differential Revision: D5641588

fbshipit-source-id: bb23b213d08e9c3ea509216fce9367625943d007
2017-08-26 00:07:58 -07:00
9f693b39aa Revert D5711951: [caffe2] Add shape inference for ConcatOp
Summary:
This reverts commit 9173ef0f18af25326ec18e66f6ce29eecfa5ceea

bypass-lint

Differential Revision: D5711951

fbshipit-source-id: 9bbb872eafcbd3c470b782a5ddb2a1c894888101
2017-08-25 23:37:38 -07:00
cc3662e939 Added support for scaling learning rate of Caffe2 optimizers during training
Summary: While there is currently support for scaling the base learning rate when loading the model, there is no support for scaling the base learning rate during training. This is needed for LATTE's seq2seq translation models, as the learning schedule is not predefined and is modified at runtime.

Reviewed By: jhcross

Differential Revision: D5701391

fbshipit-source-id: ae3bec45f238db1a2be7af9c04d720067e9095d5
2017-08-25 19:04:47 -07:00
105d5e595c squeeze op enforce about dim
Summary: rt

Reviewed By: kittipatv

Differential Revision: D5709352

fbshipit-source-id: 7805f26f5f1c0eb941c1c9e85211bbdcc8f2e6b8
2017-08-25 18:30:44 -07:00
26f0943130 Do CaffeCudaSetDevice and CaffeCudaGetDevice
Summary:
These are wrapper functions so that if we run in a Caffe2-only mode, we can
turn the flag on and get some small speedup on cuda device switches.

The purpose of the diff is to allow us to quickly assess the overhead of cuda
device switch functions. Ideally, the caching behavior shall live in the cuda
driver, which is the only safe place to ensure correctness.

If other code is running alongside Caffe2 and does not properly do device guarding,
this functionality will fail, as separate cudaSetDevice() calls will not update
Caffe2's thread-local device id. As a result, the functionality is only enabled
when/if one explicitly sets the flag.

This might not be safe, so use with caution.

- cudaGetDevice can go from 90ns to 2ns
- when setting the same device, we can go from 100ns to 2ns
- when setting a different device, things are the same (1ns overhead on top of 143ns)

Reviewed By: azzolini

Differential Revision: D5709398

fbshipit-source-id: 6255f17a3d41f59a30327436383f306a2287896e
2017-08-25 18:20:14 -07:00
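A toy Python sketch of the caching idea (the real code is C++ and thread-local; names here are assumptions): skip the driver call when the requested device is already current, which is where the 100ns-to-2ns win comes from.

  _cached_device = None   # stands in for Caffe2's thread-local device id

  def caffe_cuda_set_device(gpu_id, raw_set_device):
      # raw_set_device stands in for cudaSetDevice; call it only on change
      global _cached_device
      if gpu_id != _cached_device:
          raw_set_device(gpu_id)
          _cached_device = gpu_id

  def caffe_cuda_get_device():
      return _cached_device   # no driver call at all

  # Caveat, as noted above: if anything else calls cudaSetDevice directly,
  # the cache goes stale, hence the explicit opt-in flag.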
da418f5744 Add shape inference for ConcatOp
Reviewed By: akyrola

Differential Revision: D5711951

fbshipit-source-id: 9173ef0f18af25326ec18e66f6ce29eecfa5ceea
2017-08-25 18:09:35 -07:00
e27431ddf5 New math.h functions required by YellowFin
Summary: New math.h functions requred by YellowFin

Reviewed By: akyrola

Differential Revision: D5695258

fbshipit-source-id: b21a23b7f9647004173f8eb4f8ba9a852370d97a
2017-08-25 18:09:34 -07:00
3c180ba317 Opensourcing channel shuffle
Summary: att

Reviewed By: Yangqing

Differential Revision: D5662540

fbshipit-source-id: 474d7d808841ff8f7ce97b55df836b9d2f4a7629
2017-08-25 16:46:31 -07:00
885d9a7796 fix memonger for RecurrentNetworks
Summary: When we ported to memonger to C++ in D5544219, we forgot to include the special handling of RecurrentNetwork ops. This fixes that and adds a test.

Reviewed By: asaadaldien

Differential Revision: D5692407

fbshipit-source-id: 4e739b5dd6c7298303eee9bfa1aa4d19359eb7b5
2017-08-25 16:01:25 -07:00
5bc52c3223 Adds the master setup plan to the model exporter.
Reviewed By: rayleichen

Differential Revision: D5697246

fbshipit-source-id: d1775e0de3b3080f398350f98659436b3dfbd7b8
2017-08-25 16:01:24 -07:00
432cba6c05 Set up run_every_ms when constructing ExecutionStep
Summary: same as title.

Differential Revision: D5709274

fbshipit-source-id: f88b1325f3e6b948b836cc90f4d9c38a27be28ab
2017-08-25 15:58:29 -07:00
ae0c4c8e66 Respect inplace blobs in InjectCrossDeviceCopies
Summary:
Before this diff, we were not respecting in-place blobs. E.g. if we had:

  with DeviceOption(CPU):
      blob = net.MyOpA([])
  with DeviceOption(CUDA):
      net.MyOpB([blob], [blob])

After the InjectCrossDevicesCopies we would have:

  blob = net.MyOpA([], device=CPU)
  blob_cuda0 = net.Copy([blob], [blob_cuda0], device=CUDA)
  net.MyOpB([blob_cuda0], [blob], device=CUDA)

After this diff, we keep the in-place blob.

Reviewed By: harouwu

Differential Revision: D5671867

fbshipit-source-id: 6ad68c612dae19d7e1f45f4988d929644100b4d5
2017-08-25 14:57:58 -07:00
327a0793b4 Add missing parameters to tensor docs (#2541) 2017-08-25 17:40:55 -04:00
cffbbfa9e3 Revert D5655753: [Caffe2] better straggler exit procedure
Summary:
This reverts commit ad0c998feeb03bcb0cf4e5127fb3cc7bb00dcedb

bypass-lint

Differential Revision: D5655753

fbshipit-source-id: 2f1d350286d2ee31e8045c9bd03ef1235f1a93ec
2017-08-25 14:23:09 -07:00
3b903e8c68 Fix more MKL build issues
Summary:
Turns out that due to the cmake improvement by lukeyeager, we no longer rely on compiler flags but on the macros.h file to obtain CAFFE2_USE_MKL. This requires some minor changes in the MKL implementation to properly capture the macro before testing it.
Closes https://github.com/caffe2/caffe2/pull/1124

Reviewed By: jerryzh168

Differential Revision: D5705134

Pulled By: Yangqing

fbshipit-source-id: 6f6ad820cdd826818c12cf5aa344533a9324dbe2
2017-08-25 14:01:01 -07:00
a2a033937b DestroyCommonWorld op
Summary: Add an op to explicitly close common world connections, thus helping propagate closures when errors happen. Requires D5661477.

Reviewed By: pietern

Differential Revision: D5660476

fbshipit-source-id: 85791686691305abd96b082a6f68e4427ba14fbb
2017-08-25 14:01:01 -07:00
86cc7ace93 Control flow operators
Summary:
This diff adds control flow operators in Caffe2 (starting with If, While):
 - Added If operator that executes then/else subnet
 - Branch subnet is executed in a separate isolated workspace, with some of the
   blobs transparently forwarded from the outer workspace
 - Adding a new NetBuilder subclass to construct nets using new operator
 - NetBuilder also keeps track of outer blob names and automatically sets
   blob bindings between outer and inner workspace, implementing generic
   convention on handling local/global variables in blocks

Reviewed By: azzolini

Differential Revision: D5641588

fbshipit-source-id: f9e04429961c3da7da4ebca3e8163bfcc2a09ec9
2017-08-25 12:31:14 -07:00
7eba614503 RNNCell: Initializers interface, simplify _LSTM helper
Summary:
_LSTM helper is a legacy piece we had before all the RNNCell awesomeness landed. Now we need to pull it apart and create separate building blocks that people can use for any RNNs.

Please note the changes to a test with double scoping. That should go away once we change the RNNCell scoping logic in such a way that each cell adds its own name to the scope for all of its outputs (see another diff: D5613139).

Reviewed By: jhcross

Differential Revision: D5632276

fbshipit-source-id: 1cb568ab995c4c0b3dd1b4bad2d028e34bded9c1
2017-08-25 12:01:24 -07:00
e5740c53de Update gloo dependency
Summary:
This includes the commit that adds `close()` to gloo::transport::Pair.
Closes https://github.com/caffe2/caffe2/pull/1127

Reviewed By: akyrola

Differential Revision: D5708513

Pulled By: pietern

fbshipit-source-id: 8ef505d48b3bfa1576c068c4e4a29c9a8ed5efc7
2017-08-25 12:01:23 -07:00
82360d8cba shape inference for ReduceFront/Back/Sum/Mean, Gather and Dropout
Summary: These were missing and required for some seq2seq models. Unit tested. The previous implementation of ReduceBackMean shape inference was incorrect, so it was removed.

Reviewed By: asaadaldien

Differential Revision: D5691262

fbshipit-source-id: 76f868b298440f988635966a410f0232301ca6c4
2017-08-25 11:31:17 -07:00
50129befb6 Merge commit '7d42fd8423213a50a2ac66c08100eb540c531ea0' 2017-08-25 14:30:52 -04:00
61fae72e5f Merge commit 'e4d15223dcba0fb55e58ae9822f1b97a2f9d97d7' 2017-08-25 14:28:51 -04:00
d72118cfcd Merge commit 'e31ec51ee5333bec15b5ae10d646c21c422ff9fe' 2017-08-25 14:28:09 -04:00
2c07f88ea3 Fix typos. 2017-08-25 14:27:07 -04:00
7d42fd8423 Fix typos. 2017-08-25 14:25:58 -04:00
e4d15223dc Fix typos. 2017-08-25 14:25:28 -04:00
e31ec51ee5 Fix typos. 2017-08-25 14:25:17 -04:00
61e4723132 Fix typos (#2472) 2017-08-25 14:13:38 -04:00
0b95b4c7d1 Merge commit '8a2e69177b91b16f17c898fe6c71b4a3c1f3d6cb' 2017-08-25 14:12:39 -04:00
d281fea9ac Merge commit '7b71abc52ad3ef1bb179e26f22e038a84707c270' 2017-08-25 14:12:01 -04:00
eb58740651 add ones_like and zeros_like 2017-08-25 14:11:04 -04:00
8a2e69177b add ones_like and zeros_like 2017-08-25 14:10:52 -04:00
7b71abc52a add ones_like and zeros_like 2017-08-25 14:10:42 -04:00
c86f8fa746 Merge commit '4bef5f5ff97c0b02b9125caf3e68008573c25dd7' 2017-08-25 14:04:29 -04:00
3b155fa305 Not changing dimension size for expand when target size is -1 2017-08-25 14:04:23 -04:00
4bef5f5ff9 Not changing dimension size for expand when target size is -1 2017-08-25 14:01:53 -04:00
a655e6313e update README with new major contributors and remove redundant sections 2017-08-25 13:47:17 -04:00
523d8af26e CMake helper to deprioritize Anaconda include path
Summary:
I ran into an issue where a subset of packages were found in the
Anaconda path. This path also contained includes for other packages
and the Anaconda path inadvertently took precedence over the intended
include path. The new `caffe2_include_directories` helper is a hacky
attempt to "fix" this by deprioritizing Anaconda paths in the hope
that intended include paths are searched before Anaconda.
Closes https://github.com/caffe2/caffe2/pull/1121

Reviewed By: Yangqing

Differential Revision: D5701819

Pulled By: pietern

fbshipit-source-id: 908284cd4ea6c8167774e4e3fcc4dc0ca8a23110
2017-08-25 10:32:59 -07:00
15e16f6963 More double backwards support for pooling, unpooling, padding (#2516)
* Support double backwards for AdaptiveAvgPool1d and AdaptiveAvgPool2d.

* Support double backwards for ReplicationPad2d, ReplicationPad3d, and ReflectionPad2d.

* Support double backwards for FractionalMaxPool2d.

* Support double backwards for MaxUnpool1d and MaxUnpool2d.

* Circular recursive imports not supported in python 2.

* Address review comments.
2017-08-25 12:28:06 -04:00
9c948c22b5 Fix check_no_size_average tests. (#2532)
* Fix check_no_size_average tests.

* size_average / sizeAverage for non-legacy vs legacy.

* Fix lint.
2017-08-25 12:27:26 -04:00
98a5c99b46 remove debug code 2017-08-25 11:26:02 -04:00
14038fe559 Remove unnecessary if in maybe_view. (#2538) 2017-08-25 11:21:50 -04:00
f250815fa4 Fix bugs caused by flatten_parameters() (#2537) 2017-08-25 11:08:54 -04:00
153c9b0714 Add examples in functional.py and loss.py (#2371)
* Add examples in functional.py

Added examples for F.cross_entropy, F.binary_cross_entropy and F.binary_cross_entropy_with_logits.

* Add ` for PyTorch docs

Added ` for PyTorch docs.

* Add examples in loss.py

Added examples for nn.BCELoss and nn.BCEWithLogitsLoss.
2017-08-25 09:44:36 -04:00
0d7d79ad75 Merge commit 'd112cbd7f675a8ffde3a8995ac37c69a4c84e5df' 2017-08-25 07:39:02 -04:00
ecc7579f44 Merge commit 'e4c05c2b5f3dbc121c0cf4bb78d15540412dcd3c' 2017-08-25 07:37:19 -04:00
e4c05c2b5f fix leaking symbols from THNN 2017-08-25 07:36:27 -04:00
b3d2a3574e Merge commit '01adebea1c0cb9aa704e50a9d14507b0fab5939f' 2017-08-25 07:36:00 -04:00
802ddd997d Disable persistent BN for cudnn < 7.0.3 2017-08-25 07:33:24 -04:00
51b60354a5 cudnn 7 grouped convolutions 2017-08-25 07:33:03 -04:00
ec86d0b2ba Updates for CUDA 9 2017-08-25 07:32:05 -04:00
01adebea1c cuda 9 hgemm fix 2017-08-25 07:31:32 -04:00
d112cbd7f6 Updates for CUDA 9 2017-08-25 07:27:25 -04:00
bc93d79967 Updates for CUDA 9 2017-08-25 07:27:16 -04:00
b079469af0 self -> ctx in Extending note 2017-08-25 07:19:20 -04:00
5e0b28e7bd PrependDimOp
Summary:
Split the first dimension of a tensor into two, the first of which is fixed and given in the argument.
This is used to split a batch into smaller batches and distribute them across workers.

Reviewed By: harouwu

Differential Revision: D5702175

fbshipit-source-id: 02bb93e49bf9db411b516e149c8e647301dd2ca5
2017-08-24 18:52:05 -07:00
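A NumPy sketch of the semantics (dim_size stands for the operator's argument; the helper is illustrative):

  import numpy as np

  def prepend_dim(x, dim_size):
      assert x.shape[0] % dim_size == 0
      return x.reshape((dim_size, x.shape[0] // dim_size) + x.shape[1:])

  batch = np.arange(12.0).reshape(6, 2)
  shards = prepend_dim(batch, dim_size=3)   # shape (3, 2, 2): 3 sub-batches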
20c854d43c Make FC op work with empty batch in cuda
Reviewed By: xianjiec

Differential Revision: D5673458

fbshipit-source-id: d1c950c94173843670ae1fae0e15ff61ca7d6761
2017-08-24 18:52:04 -07:00
b3d85cf6b6 Removed CNMEM git submodule
Summary:
CNMEM was deprecated by commit c59f291 and is not used anymore by
Caffe2. It was superseded by CUB.

The git submodule can now be removed.
Closes https://github.com/caffe2/caffe2/pull/1118

Reviewed By: Yangqing

Differential Revision: D5699492

Pulled By: pietern

fbshipit-source-id: 44627ed038f37c12312889bb27691db426ad122f
2017-08-24 16:36:56 -07:00
d3a6fefe1e fix nnpack mkl header inclusion
Summary: Closes https://github.com/caffe2/caffe2/pull/1123

Reviewed By: ajtulloch

Differential Revision: D5701658

Pulled By: Yangqing

fbshipit-source-id: 8351a5531d5e05204312b77f3614cba0228b1331
2017-08-24 15:46:29 -07:00
813cca85d1 Use CMake HINTS to find CuDNN
Summary:
The PATHS suggestion to find_library is searched after everything
else. By using HINTS, it searches CUDNN_ROOT_DIR much earlier, avoiding
potential conflicts with other paths that have the CuDNN header.
Closes https://github.com/caffe2/caffe2/pull/1122

Reviewed By: Yangqing

Differential Revision: D5701822

Pulled By: pietern

fbshipit-source-id: 3f15757701aff167e7ae2a3e8a4ccf5d96763a0c
2017-08-24 15:35:24 -07:00
14d8c03424 adding backward capability for potrf (Cholesky) (#2386) 2017-08-24 17:18:11 -04:00
7e21e760e6 More cogent error messages during indexing (#2503) 2017-08-24 17:13:03 -04:00
b7a6e823a9 Fix TypeError of prod when BP to GPU tensor (#2353) 2017-08-24 17:09:25 -04:00
7aa6bc516f add "Basics" section to distributed docs (#2433) 2017-08-24 17:07:20 -04:00
6bcbecfb97 fix doc of lr_scheduler (#2280)
* resolves #1991

* fix typo
2017-08-24 17:04:53 -04:00
5c6d543b7a Allow kwarg-only inputs to DataParallel 2017-08-24 17:01:04 -04:00
4c9eff807b better straggler exit procedure
Differential Revision: D5655753

fbshipit-source-id: ad0c998feeb03bcb0cf4e5127fb3cc7bb00dcedb
2017-08-24 12:33:30 -07:00
23209152a9 fix memonger test for open source by checking for cuda support
Summary: This test was failing on non-GPU builds because it refers to operator CopyGPUToCPU. Thanks pietern for catching this.

Reviewed By: asaadaldien

Differential Revision: D5698763

fbshipit-source-id: 0bde0f3e99c58647dba2ea6da4d51938e763d10c
2017-08-24 12:02:38 -07:00
7f4ceb83e3 Relax dimension constraints for weight matrix in FC
Summary: att

Reviewed By: Yangqing

Differential Revision: D5662265

fbshipit-source-id: 893ee2f92debab06117725beeca3199cba565f1e
2017-08-24 11:16:39 -07:00
ad07f5f05d Added norm-based gradient clipping to optimizer library
Summary: Moved code for global norm-based gradient clipping from fb specific workflows (seq2seq) to the open-source caffe2 optimizer library

Reviewed By: jhcross

Differential Revision: D5637453

fbshipit-source-id: 7e73c9a1c97c28a152c188467b27a6449f79242e
2017-08-24 10:17:50 -07:00
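As context for the commit above, a minimal NumPy sketch of the standard global-norm clipping formulation (function name, epsilon, and shapes are illustrative; this is not the Caffe2 optimizer API):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm, eps=1e-7):
    # If the combined L2 norm of all gradients exceeds max_norm, scale
    # every gradient down by the same factor; otherwise leave them alone.
    global_norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    scale = min(1.0, max_norm / (global_norm + eps))
    return [g * scale for g in grads]

grads = [np.ones((2, 3)), 2.0 * np.ones(4)]
clipped = clip_by_global_norm(grads, max_norm=1.0)
```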
c4bc718c4b Fix OpenGLPadImage
Summary: I was assuming left padding == right padding and top padding == bottom padding, but actually they can differ, which results in a different output size.

Differential Revision: D5693719

fbshipit-source-id: 32595652231da0cf1ec269dc34fa87df23732328
2017-08-24 10:17:49 -07:00
de903ad208 Implement double backwards for nn.Upsample. 2017-08-24 11:13:39 -04:00
5d09fcd028 Make DistributedDataParallel threads Daemon threads to allow clean process exit (#2524) 2017-08-24 06:32:29 -04:00
3faeb621d3 support id_score_list for Feed
Reviewed By: xianjiec

Differential Revision: D5624894

fbshipit-source-id: 1b2caba9ffcce68f346020485cb1f4edb01ca5e7
2017-08-24 00:32:05 -07:00
d368b59177 logging the blob that has type error
Summary: Currently, it's not easy to track down which tensor is missing type and shape info. Print it out for easier debugging.

Reviewed By: volkhin, xianjiec

Differential Revision: D5695223

fbshipit-source-id: 7f0be0be777a35bb5a71b3799b29b91f0763c159
2017-08-23 21:21:27 -07:00
409d985d43 Tensor inference function for Gather
Summary: Make Gather more convenient to use in layer model

Reviewed By: xianjiec

Differential Revision: D5695197

fbshipit-source-id: aa0406ea39af5b6980ee6fd3bb11250732caac00
2017-08-23 21:21:26 -07:00
93e12e75df Allow caffe2 to detect if cuda lib has been linked, and also fix oss build error.
Summary: Closes https://github.com/caffe2/caffe2/pull/1114

Reviewed By: pietern

Differential Revision: D5686557

Pulled By: Yangqing

fbshipit-source-id: 6b7245ebbe4eeb025ce9d0fe8fda427a0c3d9770
2017-08-23 18:41:15 -07:00
16549ed92b Scaled training and fetching from the PS
Summary:
Today, the PS's weirdly store the entire embedding and not just their
subsection of it. This was simply an oversight on the part of the original
author and this diff fixes that.

The sparse params are sharded to the PS's and the PS's just store their section
of the embedding. The trainer requests the id's as is from the PS. But the PS
divides the id by the num_of_shards before looking it up in the embedding table
blob. This happens on both the backward and the forward pass. However, during the
model download part, the PS multiplies the embeddings by the num_of_shards
before returning them to the trainer. The upshot is that the trainer does not
know anything about how the embeddings are scaled on the PS. The PS adds extra
divide and multiply steps to achieve that.

2. During estimation time, we allocate just one PS for estimation. So in order
to make all of the embeddings fit on the single PS, we simply additionally
scale the hash table sizes (proportionally and equally for all the sparse
params) such that it fits. This scaling is handled analogously to (1).

Reviewed By: boryiingsu

Differential Revision: D5664093

fbshipit-source-id: 92f501f61566f939c41ce0b614a1b499669f978a
2017-08-23 18:16:03 -07:00
1d83a46b44 Improve float16 support
Summary: The operators were lacking some float16 stuff: Extend ScatterAssign for float16. In addition, introduce a constant fill for float16. This needs to be a separate operator instead of ConstantFill, since the latter is in OSS and hence cannot use the Float16 stuff that is fb specific.

Reviewed By: azzolini

Differential Revision: D5664071

fbshipit-source-id: 5b84f625693b6ddddd8b7a35f1541ae40df49fbe
2017-08-23 16:33:07 -07:00
1955d0797e Added fast path for CUDNN global max pooling
Summary:
This adds a fast path for global max pooling with NCHW. Compared to equivalent ReduceBackMean, this is about 3.5x faster.

Based on D5533059.

Reviewed By: akyrola

Differential Revision: D5681122

fbshipit-source-id: 7a4df934044c7dd01888f095f7dd46654aaf4eae
2017-08-23 16:33:06 -07:00
2de1bc894b move ShapeOp out from utility_ops
Summary: move ShapeOp out from utility_ops

Reviewed By: ajtulloch

Differential Revision: D5686081

fbshipit-source-id: ac1ae50bfa2e36eddd1834839169ba3cdf0722dc
2017-08-23 16:33:06 -07:00
98da4e3a04 pairwise dot product with dot_groups support
Summary: extending pairwise dot-product only between dot_groups

Differential Revision: D5527060

fbshipit-source-id: be5d3178c332e122853a2f9d8da12a880608b0ab
2017-08-23 15:23:36 -07:00
4c69697d2a Distributed bug fixes. (#2434) 2017-08-23 14:46:52 -04:00
620d3ab714 Do not run operator gpu tests if there is not gpu
Reviewed By: Yangqing

Differential Revision: D5689269

fbshipit-source-id: ca3be27a81ffdfb93c153a8aa75a8b8857a33552
2017-08-23 11:32:41 -07:00
6eeb7e6fd8 Use cast::GetCastDataType to handle "from_type" and "to" arguments
Summary: Also enforce that the "from_type" argument is supplied when getting the gradient

Reviewed By: Yangqing

Differential Revision: D5684399

fbshipit-source-id: bee955d44a04c44142b2212cff548cea6e08b22f
2017-08-23 10:18:01 -07:00
bbf2c6a084 Fix ConcatDataset docs (#2355)
* Fix ConcatDataset docs

so that sphinx-napoleon parses it right.

* Fix WeightedRandomSampler docs
2017-08-23 09:47:57 -04:00
5e54d9330f hidding statically linked libstdc++ symbols (#2471)
This is a solution for the problem described in this comment:
1d9b10d312 (commitcomment-23678756)

And a solution for the issue #2462
2017-08-23 07:18:21 -04:00
966fdbd93a Add commands to re-build individual libraries. (#2506)
When working on PyTorch dependencies we often want to rebuild only that
dependency and the Python extension. You can now do that by running:

  python setup.py build_thc

to only re-build THC
2017-08-23 07:16:05 -04:00
27bd3df71b Patching EmbeddingBag to accept 2D input (#2429)
* Patching EmbeddingBag to accept 2D input

* fix for CUDA inputs

* fix lint
2017-08-23 07:12:21 -04:00
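For reference, a sketch of how the 2-D EmbeddingBag input described above behaves, written against today's PyTorch API (which postdates this commit, so details may differ): with a 2-D index tensor and no offsets, each row is treated as one fixed-length bag.

```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=3, mode="mean")

# 2-D input: each row is one fixed-length bag; no offsets tensor needed.
indices = torch.tensor([[1, 2, 4],
                        [4, 3, 9]])
out = bag(indices)
print(out.shape)  # torch.Size([2, 3]) -- one pooled embedding per row
```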
008a62b18a DOC fixed Tensor.expand docstring (#2495) 2017-08-23 06:38:55 -04:00
d675c101e9 extend pairwise dot product for non-equal x & y dimension size
Summary: extend pairwise dot product for different number of embeddings on x & y dimensions

Differential Revision: D5663553

fbshipit-source-id: 1743a2c101cb8c0fc1f0f3d89c19530802400ec6
2017-08-23 02:08:20 -07:00
c52fd11f58 Add CUDNN to the gpu devices' default preferred engines
Summary:
The original diff was unlanded because the fbcode-target-determinator tests were not run; recreating a new diff with the same change to trigger the tests.

CUDNN should be almost always faster than the default implementation

Reviewed By: salexspb

Differential Revision: D5637156

fbshipit-source-id: 413a08acba7a83502be6199fcb524ab46f1fd4ce
2017-08-22 23:55:34 -07:00
e33dfe93e4 Update proto definition
Summary: Update Argument's definition to allow direct passing of NetDef

Reviewed By: azzolini

Differential Revision: D5681837

fbshipit-source-id: e6c618bff051f9bbc56075c796aeba0094fa97dd
2017-08-22 19:01:18 -07:00
67a55b81e3 Forward blobs into workspace
Summary:
Better isolation for workspaces to allow forwarding selected blobs
from parent to child workspace, possibly under new names. Used for proper
isolation of subnets (loops, then/else branches, etc.) from the outer workspace.

Reviewed By: azzolini

Differential Revision: D5681667

fbshipit-source-id: e61a2c7c98ee2abf1f0761905f4bfae47c201c32
2017-08-22 18:45:56 -07:00
502b43641f More flexible tiling for Conv and ConvTranspose
Summary: With these changes, Conv, ConvTranspose, PRelu, and Relu work with tiling now. The default is still batching.

Differential Revision: D5623321

fbshipit-source-id: 07aa378d24165ec19e751cd79c70dea995003be9
2017-08-22 18:17:40 -07:00
058815955d Add default implementation of __call__ for context manager
Summary: Making it more convenient to wrap code in a context

Reviewed By: boryiingsu

Differential Revision: D5680991

fbshipit-source-id: 07b7e4d5aa657184039a7d18192b68fe11c1a570
2017-08-22 17:46:22 -07:00
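A generic sketch of the pattern the commit above describes, with a hypothetical class name (not the actual Caffe2 context type): defining `__call__` lets the same object be used both as a `with` context and as a decorator.

```python
import functools

class MyContext:
    """Hypothetical context type; not the actual Caffe2 class."""

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False  # don't swallow exceptions

    def __call__(self, func):
        # Default __call__: run the wrapped function inside the context,
        # so one object works both as `with ctx:` and as `@ctx`.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with self:
                return func(*args, **kwargs)
        return wrapper

@MyContext()
def train_step():
    pass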
9507cae9e0 Create MergeIdListsLayer
Summary: We create a layer for MergeIdListsOp

Differential Revision: D5531348

fbshipit-source-id: a2e227e1abda05cefa893fd41a2c3ca997851e25
2017-08-22 17:00:55 -07:00
930acc8e85 CUDA SparseLengthsWeightedSum
Summary: title.

Reviewed By: harouwu

Differential Revision: D5665776

fbshipit-source-id: a8ae1a71a9a21e68172662f38b5f799870b9dcd1
2017-08-22 15:42:02 -07:00
c1356216a2 cmake: generate macros.h with configure_file()
Summary:
Using file(WRITE) caused the file to be rewritten for every CMake
reconfigure, which was causing unnecessary full rebuilds of the project
even when no source files changed.

The new strategy has the added benefit of enforcing that the macros.h file
is always generated correctly. When the main project relies on this
header for macro definitions (instead of relying on add_definitions()),
we can be more confident that the project will build correctly when used
as a library (which is the whole point of the macros.h file).

Upsides:
* No more unnecessary rebuilds
* Higher confidence that the project will compile properly as a third-party library

Downsides:
* Developers need to add an entry to `macros.h.in` whenever they would have added a new definition with `add_definitions()`
Closes https://github.com/caffe2/caffe2/pull/1103

Differential Revision: D5680367

Pulled By: Yangqing

fbshipit-source-id: 4db29c28589efda1b6a3f5f88752e3984260a0f2
2017-08-22 14:22:36 -07:00
5748e7140f Strip Operator Schema in mobile build
Reviewed By: Yangqing

Differential Revision: D5677792

fbshipit-source-id: d29edb26a36b24a46821e13e2d77af0f21571fcd
2017-08-22 13:31:08 -07:00
0e5fcc7ca2 Make Tags a decorator as well
Summary: In case the whole function should be wrapped in a certain context, this makes it less ugly.

Reviewed By: xianjiec

Differential Revision: D5665253

fbshipit-source-id: ecdc6b1a08e91bae6a4352341f97ee37f3aa677a
2017-08-22 11:01:14 -07:00
e902620620 cmake: relative paths for install()
Summary:
I discovered this while investigating more build-caching issues like https://github.com/caffe2/caffe2/pull/1103.

> If a relative path is given it is interpreted relative to the value of the CMAKE_INSTALL_PREFIX variable.
https://cmake.org/cmake/help/v3.0/command/install.html

This is a non-functional change - it just makes the code a bit easier to read. I verified locally that the resulting install directories are identical.
Closes https://github.com/caffe2/caffe2/pull/1111

Differential Revision: D5677328

Pulled By: Yangqing

fbshipit-source-id: 9bb1bfe85fc0bc54a9b7ce33cc31e45ea061d21e
2017-08-22 09:52:09 -07:00
e37847af92 Test CrossEntropyLoss double backwards. 2017-08-22 11:12:03 -04:00
0390e80a7e Support MarginRankingLoss double backwards. 2017-08-22 11:12:03 -04:00
e27127391d Support double backwards for SoftMarginLoss. 2017-08-22 11:12:03 -04:00
fb7e9583bd Generate no_size_average criterion tests by specifying check_no_size_average=True 2017-08-22 11:12:03 -04:00
22ec5f37ca Support double backwards with parallel nn autograd functions. (#2508) 2017-08-22 03:57:45 -04:00
a32e98b700 Add documentation for std/var unbiased argument (#2509) 2017-08-22 03:45:54 -04:00
440d979075 Optimizations for Caffe2 SinusoidPositionEncodingOp
Summary:
Optimizations for SinusoidPositionEncodingOp to make sinusoid position embeddings
more competitive against table-based embeddings.
- Removed most calls to std::pow
- Replaced division with multiplication with reciprocal
- Reused computation across examples within a batch

Current speedup with batch size of 16, sequence length of 128 and embedding
size of 512 is about 270x (17k embeddings per second -> 4.7M embeddings per
second). The speedup is very dependent on the batch size; at a batch size of 4
this only gets 1.7M embeddings per second.

Profile: https://pxl.cl/8zf0
Annotated DoRunWithType: P57925031

Reviewed By: jamesr66a

Differential Revision: D5634766

fbshipit-source-id: 0f35bb176164ea547c91de242a0205c5d7adf7cf
2017-08-22 00:04:06 -07:00
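For orientation, a NumPy sketch of the standard Transformer-style sinusoid encoding this op computes (the exact Caffe2 parameterization may differ; an even embedding dim is assumed, and the `amplitude` knob comes from a related commit elsewhere in this log):

```python
import numpy as np

def sinusoid_encoding(seq_len, dim, amplitude=1.0):
    # Inverse frequencies are computed once and reused for every position,
    # mirroring the pow-elimination / reciprocal tricks listed above.
    # Assumes an even embedding dim.
    inv_freq = 1.0 / np.power(10000.0, np.arange(0, dim, 2) / dim)
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]
    enc = np.empty((seq_len, dim), dtype=np.float32)
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return amplitude * enc

emb = sinusoid_encoding(seq_len=128, dim=512)
```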
65112f3865 code cleanup: separate the several net implementations to separate files.
Summary: TSIA.

Reviewed By: harouwu

Differential Revision: D5670906

fbshipit-source-id: 507e789978144341bf696fb20dc11f3c2d55493b
2017-08-21 22:07:48 -07:00
51d67ecd8c DeviceInference function for NCCLAllreduce
Summary: Not sure it is correct in general, but it works as long as we have one blob per GPU.

Reviewed By: harouwu

Differential Revision: D5671891

fbshipit-source-id: 739475101e9b509bc521e268c5b308faa36800e7
2017-08-21 15:46:50 -07:00
0b363fd9de Add event as a first-class citizen of the OperatorBase interface.
Summary:
This adds Event as a new member object to OperatorBase, hence allowing us to do
async computation more easily. Will send a fix for proper RunAsync() for
SimpleNet.

In principle this should have no functionality change yet - the only difference
is that async_dag net now delegates to the operators for holding the event
objects.

Reviewed By: harouwu

Differential Revision: D5668627

fbshipit-source-id: 55f994074be6b85d6c66f09795dcbe2b93aba300
2017-08-21 13:30:53 -07:00
c535b8098f Add a HPTT path in transpose_op.cc
Summary:
https://arxiv.org/abs/1704.04374 is a simple, stateless library that
implements a high performance tensor transposition abstraction - it's
substantially faster than what we have. I think instead of going through an
engine specialization on the CPU side, we can just add this path, since there's
no value (in terms of state management, etc) for having it separate?

We could cache the plan, but it's so cheap to create in these tests.

Reviewed By: jonmorton

Differential Revision: D5534519

fbshipit-source-id: de2fd64fee11be259656b0f02f42a62b7035e3d3
2017-08-21 12:46:57 -07:00
77c28b7a7c Revert D5607549: [Caffe2] [Mobile] [ULP] QConv impl.
Summary:
This reverts commit dfdd7f78d4c64c1f71e11106c57f2c4007581e48

bypass-lint

Differential Revision: D5607549

fbshipit-source-id: ecfe2d455508cae49607efd31aed79198d225883
2017-08-21 11:46:01 -07:00
6465c14aa1 Temporary crash fix
Summary: Disable mpscnn for 10.0.2 temporarily since I can't reproduce the crash

Reviewed By: ajtulloch

Differential Revision: D5665269

fbshipit-source-id: 2f95ba591099078a0347f7ea7bfa82dc37005228
2017-08-21 11:22:57 -07:00
304f3773d0 QConv impl.
Reviewed By: Yangqing

Differential Revision: D5607549

fbshipit-source-id: dfdd7f78d4c64c1f71e11106c57f2c4007581e48
2017-08-21 10:31:40 -07:00
de24bb4b66 Update readme with docker cmd (#2501)
* update docker command in readme to use pre-built images

* correct spelling of Docker Hub

* Update README.md
2017-08-21 08:52:26 -04:00
d2b8d3f8f7 add slack clarification 2017-08-21 06:12:46 -04:00
c38206d901 add wait event and record for MKLOperator
Summary: This is a patch for the recent change for Events. ajtulloch caught this one.

Reviewed By: harouwu

Differential Revision: D5663317

fbshipit-source-id: 471a24f594583669bcd5bbf2fabaeb5664bd0bb7
2017-08-19 21:30:47 -07:00
0e20a7cb7d ImageInputOp_more_data_augmentation
Summary:
Add more data augmentation to ImageInputOp
1) Inception-style random sized cropping
2) color jittering
3) color lighting

Reviewed By: panshen1

Differential Revision: D5637726

fbshipit-source-id: 45d9cc69eec9f4d48c1607d80ccd89e325961b1a
2017-08-19 14:15:58 -07:00
5c43fcda8d Support params that don’t require grad in DistributedDataParallel (#2464) 2017-08-19 11:22:20 -04:00
c5a9aa027b fix wrong path to ReduceLROnPlateau in docstring 2017-08-19 10:27:58 -04:00
b3536a3a6d Adds checkpoint taskgroups to the online trainer.
Summary:
1. Uses the upload_builder in the offline training.
2. Adds the checkpoint taskgroups to the online trainer.
3. Changes the naming rules so that the model checkpoint has the format of
<directory>/<entity_id>_<snapshot_id>.<node_name>.<snapshot_id>

Reviewed By: rayleichen

Differential Revision: D5665068

fbshipit-source-id: a8103aed2ca195a506174d2a1d50611d2f1d9c35
2017-08-19 04:09:47 -07:00
0f35ec9872 Common Subexpression Elimination
Summary:
A new transform, which combines common subexpressions (where an "expression" is one operator), reducing repeated work.

This version is shippable, but has one problem:

This transform will also combine operators which write to external_output, which will make behavior incorrect.

Reviewed By: bwasti

Differential Revision: D5629886

fbshipit-source-id: 2bf9f459e2ca633fddc57de85c9fc75845783099
2017-08-18 16:31:48 -07:00
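A toy sketch of the per-operator CSE described above, using plain dicts instead of NetDef protos (hypothetical structure; it also omits the external_output guard the summary says is still needed):

```python
def eliminate_common_subexpressions(ops):
    seen = {}      # signature -> canonical output blob names
    rename = {}    # duplicate blob name -> canonical blob name
    result = []
    for op in ops:
        # Canonicalize inputs so chains of duplicates collapse too.
        inputs = tuple(rename.get(i, i) for i in op["inputs"])
        sig = (op["type"], inputs, tuple(sorted(op.get("args", {}).items())))
        if sig in seen:
            # Same type, inputs, and args: reuse the earlier op's outputs.
            for dup, canon in zip(op["outputs"], seen[sig]):
                rename[dup] = canon
        else:
            seen[sig] = op["outputs"]
            result.append(dict(op, inputs=list(inputs)))
    return result

ops = [
    {"type": "Add", "inputs": ["x", "y"], "outputs": ["a"]},
    {"type": "Add", "inputs": ["x", "y"], "outputs": ["b"]},
    {"type": "Mul", "inputs": ["a", "b"], "outputs": ["c"]},
]
# After CSE the duplicate Add is gone and the Mul reads ["a", "a"].
print(eliminate_common_subexpressions(ops))
```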
5d24a4eeef Early design for a general Event abstraction cross-devices.
Summary:
There are ad-hoc efforts on avoiding excessive device synchronizations, such as
async_dag, singlethread_async, etc. This diff aims to provide an early design
for a general Event class, that can achieve the following:

(1) It is device agnostic, essentially using a vtable to do cross device record,
wait and synchronization.
(2) Created new functions WaitEvent and Record in the Context class for
interacting with Events.
(3) Exposed the corresponding WaitEvent and Record functions in the OperatorBase
class as well.

An example use case is that, after potential future refactoring, one can achieve
a real async execution per operator by running

op.WaitEvent(previous_event);
op.RunAsync();
op.RecordEvent(this_op_event);

and the next op can do

next_op.WaitEvent(this_op_event);

Right now, I changed async_dag net implementation so that it uses the general
event design. The old Event class is assimilated to the general Event class and
the old Stream class is now essentially taken over by the Context class itself.

Reviewed By: harouwu

Differential Revision: D5648463

fbshipit-source-id: 58bd84d06e4a9977b0b835110ddb2f18be3b7cbc
2017-08-18 15:46:51 -07:00
d6632a9a05 Adding a range operator similar to np.arange
Summary:
Adding a range operator in the spirit of np.arange. It is an important building block for a lot of manipulation functions.

This accepts parameters with the same meaning in the same order as python's range or np.arange (e.g. `(stop)`, `(start, stop)` or `(start, stop, step)`)

Differential Revision: D5616861

fbshipit-source-id: 02622b8bd85ebca125cc881c06fae5b54b7c602a
2017-08-18 14:45:56 -07:00
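As a reference for the three argument forms listed above, the equivalent NumPy calls (semantics only; the operator's own invocation is not shown in this message):

```python
import numpy as np

np.arange(5)         # (stop)              -> array([0, 1, 2, 3, 4])
np.arange(2, 5)      # (start, stop)       -> array([2, 3, 4])
np.arange(2, 10, 3)  # (start, stop, step) -> array([2, 5, 8])
```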
d617a77433 Add tests for ConcatOp and SplitOp
Summary: The new test ensures the 'add_axis' and 'split' arguments work as intended for tensors of various dimensions. Hypothesis checks various edge cases, like zeroes in 'split_info' and 1-D input with axis=0, add_axis=1.

Reviewed By: hoangmit

Differential Revision: D5645778

fbshipit-source-id: 061f9511a082da54e5c1bbe53a0e7096af4b8d1b
2017-08-18 14:02:42 -07:00
d44e2dabbf Revert D5653336: [caffe2][PR] Add random input scaling
Summary:
This reverts commit 9c353fbe2bf2207e01bc51d14487de323c68af7b

bypass-lint

Differential Revision: D5653336

fbshipit-source-id: 0f2de8afbc87e82f74d1de1f61c6ad196da32cc5
2017-08-18 13:31:29 -07:00
4905ea898a Add random input scaling
Summary:
Add ability to specify a range for randomly scaling to a new shortest side. For example, for Resnet50 training, one would set `random_scale=[256,480]` in the `ImageInput` operator to resize to a random shortest side in the range [256, 480]
Closes https://github.com/caffe2/caffe2/pull/1106

Differential Revision: D5653336

Pulled By: harouwu

fbshipit-source-id: 9c353fbe2bf2207e01bc51d14487de323c68af7b
2017-08-18 11:50:54 -07:00
623df4adb3 Fix travis tests, by splitting DummyOp to GraphDummyOp and TransformDummyOp
Summary:
Tests shouldn't rely on operators defined in other tests, because there is no guarantee that they will build together.

transform_test and graph_test did this, and this fixes it.

Reviewed By: jerryzh168

Differential Revision: D5657635

fbshipit-source-id: e628fe1791a64bb124cdd8c59e80c0d915bfb281
2017-08-18 11:17:28 -07:00
11a14fd0fd Clarifications on setting up torch.distributed (#2475) 2017-08-18 09:21:04 -04:00
fa8b8a5f07 improve unsorted segment op speed
Summary:
Use cub DeviceReduce to improve the speed from 23k to 26k, but still far
from the 100k achieved without dedup.

The bottleneck is the UniqueOp.

Reviewed By: harouwu

Differential Revision: D5633828

fbshipit-source-id: e96b8f7317d01c5388c072e7dcfe987abcb01b67
2017-08-17 22:16:43 -07:00
1d70a2276d Changes the checkpoint naming rules.
Summary: So far we format the epoch name with 6 digits, but this is constraining. In order to have consistent naming, we can simply append the epoch to the suffix. Then we will have consistent naming rules for small and for large epoch numbers.

Reviewed By: azzolini

Differential Revision: D5653871

fbshipit-source-id: acdf26a14b731347bb85fe2f33c1b89e2ba83bdd
2017-08-17 22:16:42 -07:00
2c18748c54 Move set_stream_id() to protected field.
Summary:
This does not change any existing code behavior - as part of the event
abstractions, this is a cautious step to reduce the interfaces exposed
from contexts. Nothing else is changing.

Reviewed By: harouwu

Differential Revision: D5656597

fbshipit-source-id: 53c5caf278613e610daf6ad3ca4bb6da73367cfc
2017-08-17 20:34:44 -07:00
23506824b0 CUDA-related updates to the core overhead benchmark
Summary: TSIA

Reviewed By: harouwu

Differential Revision: D5656471

fbshipit-source-id: 59cc63f37d3cd0c34516bc077be9a11055618628
2017-08-17 19:32:08 -07:00
db02fbd9bf Fix stepworkspace sizing
Summary: In forward-only mode, we need only 2 workspaces. Erroneously, we sized the workspace vector to 2 if it was different from 2. But if it was longer (because the step workspaces were shared by a non-forward-only op), we ended up deleting the workspaces. With the RNN executor, this is a problem, because it held a reference to the deleted workspaces. Without the RNN executor, we just ended up recreating the nets.

Reviewed By: jhcross

Differential Revision: D5654534

fbshipit-source-id: 1e6276e63453831747fee6a85c5057f01b89fde5
2017-08-17 19:32:07 -07:00
5f612d9740 GPU version of BatchGatherOp
Summary: GPU version of BatchGatherOp.

Reviewed By: azzolini

Differential Revision: D5613593

fbshipit-source-id: 0e4a35b84db852ac2718868a02fa90e7c3d8f1f0
2017-08-17 18:31:10 -07:00
6e22427929 fix tci complaining test - test_load_model_from_checkpoints
Summary:
Travis CI is complaining about test_load_model_from_checkpoints in recent PRs.
E: AssertionError: 'trainer:1/task/GivenTensorInt64Fill:0, a C++ native class of type nullptr (uninitialized).' != array([103])
See for example https://travis-ci.org/caffe2/caffe2/jobs/265665119
Reason unknown yet. First disable this, then try to fix it.

Reviewed By: Yangqing

Differential Revision: D5655068

fbshipit-source-id: 10949339ec92b0a4c2f0e59246040f1b0510be12
2017-08-17 17:50:42 -07:00
aad748fbae Cannot divide by 0
Summary: Add a small fix so that the divisor won't be 0.

Reviewed By: kittipatv

Differential Revision: D5650240

fbshipit-source-id: fe17bdf0595c4ff113428d2bc18bf7c455e85302
2017-08-17 17:50:36 -07:00
57c93435e3 Dedup name in functional layer
Summary:
Before this fix, a functional layer name could appear several times in a
blob and cause confusion. This diff fixes the issue.

Reviewed By: kittipatv

Differential Revision: D5641354

fbshipit-source-id: d19349b313aab927e6cb82c5504f89dbab60c2f2
2017-08-17 17:50:34 -07:00
cfbd116966 ApplyTransformIfFaster
Summary:
Implemented ApplyTransformIfFaster

Determine if a transform is faster, then return whichever net is better.

Reviewed By: bwasti

Differential Revision: D5534535

fbshipit-source-id: 509943205b0c454bf30fb01343ac4e88d1441c39
2017-08-17 15:36:51 -07:00
f7ece79949 Add fp16 and tensorcore support to resnet50_trainer
Summary:
Use like `--dtype=float16 --enable-tensor-core`
Closes https://github.com/caffe2/caffe2/pull/1093

Differential Revision: D5634840

Pulled By: harouwu

fbshipit-source-id: 18c1e70236ba5ef8661ff55fb524caae1be19310
2017-08-17 15:16:24 -07:00
5b8e2ad2a6 test_distributed cuda tests don't skip if cuda not available. (#2476)
test_distributed cuda tests don't skip if cuda not available.
2017-08-17 17:45:32 -04:00
692f4e4e3b Disable -Wstrict-aliasing when including cuda_fp16.h
Summary:
The cuda_fp16.h header in CUDA 9 RC triggers this diagnostic.
It is included by cusparse.h as well, so guarding the
inclusion of only cuda_fp16.h is not enough.

Reviewed By: Yangqing

Differential Revision: D5651995

fbshipit-source-id: 4778a8a793761e7a1dbebf3792b85b33a3e26219
2017-08-17 14:15:32 -07:00
fa984af0f9 use create_param() in layers
Summary: These layers were not codemoded

Reviewed By: chocjy

Differential Revision: D5645982

fbshipit-source-id: 4325f77a0f8152dfe6dfdeee59697b25ecb1de35
2017-08-17 13:47:57 -07:00
7fad4be4c6 Device-specific memongering
Summary:
Enforce that blobs don't mix between operators on different GPUs or CPU/GPU. Add test.

+ Fix memonger when no namescope is provided.

Reviewed By: asaadaldien

Differential Revision: D5644708

fbshipit-source-id: 0cb361efd6361b6e2138462584bab6b4de039b5d
2017-08-17 13:31:26 -07:00
4d0fbb0e6f ConcatOp: fix axis check with add_axis.
Summary: When adding a new axis to concatenate along, allow it to be the last axis. For example, concatenating 1-D columns into a 2-D matrix with axis=1, add_axis=1 (see the sketch after this entry).

Reviewed By: hoangmit

Differential Revision: D5622495

fbshipit-source-id: 8d7c8650c198450ccd4f9e1c98e4ea9f40162be0
2017-08-17 13:03:18 -07:00
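A NumPy sketch of the axis/add_axis semantics described above (np.stack is the NumPy analogue of Concat with add_axis; names and values are illustrative):

```python
import numpy as np

cols = [np.array([1, 2, 3]), np.array([4, 5, 6])]

# Plain concat reuses an existing axis:
np.concatenate(cols, axis=0)  # -> array([1, 2, 3, 4, 5, 6])

# With add_axis the concat axis is newly created; axis=1 on 1-D inputs
# is the newly allowed "last axis" case:
np.stack(cols, axis=1)
# array([[1, 4],
#        [2, 5],
#        [3, 6]])
```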
f388135d3f Layer norm brew wrapper
Summary: Implement a brew wrapper for the LayerNorm op. This adds the scalar weight and bias terms to the op.

Reviewed By: jmp84

Differential Revision: D5595836

fbshipit-source-id: 467b2e1158b0c454a149d4b26c47719826e98752
2017-08-17 11:17:47 -07:00
e45e621b0e Implement layer norm gradient GPU
Summary: Implement layer normalization from https://arxiv.org/pdf/1607.06450.pdf

Reviewed By: wickedfoo

Differential Revision: D5594445

fbshipit-source-id: 873643165c958fd5829fa7cf07d5d4b1b8b0ed59
2017-08-17 11:17:46 -07:00
8e8e90f595 Implement layer normalization backward CPU
Summary: Implement layer normalization from https://arxiv.org/pdf/1607.06450.pdf

Reviewed By: jmp84

Differential Revision: D5578306

fbshipit-source-id: 94d262f0317b3ee1b504e0110ad5135afe8350ca
2017-08-17 11:17:46 -07:00
e16c40eb4f Implement layer normalization op forward GPU
Summary: Implement layer normalization from https://arxiv.org/pdf/1607.06450.pdf

Reviewed By: wickedfoo

Differential Revision: D5552262

fbshipit-source-id: d0cddb0769623a1b3779e2114c19e6ebc57c0f0d
2017-08-17 11:17:45 -07:00
474c043be5 Implement layer normalization op forward CPU
Summary: Implement layer normalization from https://arxiv.org/pdf/1607.06450.pdf

Reviewed By: akyrola

Differential Revision: D5543381

fbshipit-source-id: 1102e568439af6a60aad3b87017d5a997fb7dc16
2017-08-17 11:17:44 -07:00
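For the four layer-normalization commits above, a minimal NumPy sketch of the forward pass from arXiv:1607.06450, including the scalar scale/bias terms the brew wrapper adds (eps and shapes are illustrative):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each example over its feature dimension, then apply the
    # scalar scale/bias terms (the gamma/beta the brew wrapper adds).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.randn(4, 8).astype(np.float32)
y = layer_norm(x, np.ones(8, np.float32), np.zeros(8, np.float32))
```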
e89474c496 fix forward_only mode
Summary:
Forward-only mode had broken at some point. Two things: RNNCell did not pass the parameter to recurrent.py, and recurrent.py was broken with forward_only=True after the python3 codemod.

Added a test to rnn_cell_test that checks the forward-only parameter is actually passed, to prevent future breakage.

Reviewed By: jmp84

Differential Revision: D5639306

fbshipit-source-id: b1bbc39d59c3f3734b2f40a1c2f3740c733e0bd4
2017-08-17 10:19:04 -07:00
a63e7314f3 Adding 1d-2d-3d Schemas for Conv and Pool
Summary: Add Conv and Pool operators with dimensions.

Reviewed By: bddppq

Differential Revision: D5588614

fbshipit-source-id: 2552c40dc3ca180a6ab51817d60f0b85b97885d5
2017-08-17 09:45:54 -07:00
4ca5735753 Allow inplace for spatial_bn_op
Summary: att

Reviewed By: Yangqing

Differential Revision: D5644717

fbshipit-source-id: 1a020fe4ca7028056ce7bebddb7bfd1437998530
2017-08-17 09:18:55 -07:00
ae2aad9c0d Operator to Merge ID_LIST features
Summary:
As an alternative to sharing embeddings, we want to explore merging the ID_LISTs in the net.

This commit adds an operator to merge many ID_LIST features into a single one.

Differential Revision: D5481523

fbshipit-source-id: 446121122a32de5682d5d75a165370bc8d776d03
2017-08-17 01:16:00 -07:00
58838baa75 Remove unused travis scripts
Summary:
The current scripts live at `.travis/`. These files at `caffe2/.travis/` were apparently added by accident in fbe2393cc2.
Closes https://github.com/caffe2/caffe2/pull/1102

Differential Revision: D5648563

Pulled By: Yangqing

fbshipit-source-id: 8a071f78f466a1c0bbe62b720b50bacc425287bc
2017-08-17 01:05:03 -07:00
578adbe9c0 Adios CNMEM. You will be remembered.
Summary:
As part of the cuda 9 move we have decided to deprecate the cnmem path
as it seems to be superseded by cub if one needs a memory pool.
Closes https://github.com/caffe2/caffe2/pull/1104

Differential Revision: D5647672

Pulled By: Yangqing

fbshipit-source-id: 988af5bf63e24efa1b631fd91ddb58e798ffc5c6
2017-08-17 00:05:57 -07:00
b3029df1d0 Added window mode for caffe2 sequence operator
Summary: This can be used for local attention to mask elements outside of a window

Reviewed By: jamesr66a

Differential Revision: D5643677

fbshipit-source-id: 92b33866258ccc7307d5bcf08234610aa3fb152d
2017-08-16 21:34:29 -07:00
a0fe96d7cd Rewrite memonger DAG in C++.
Summary: This diff replaces the main memonger-for-DAG algorithm, _compute_blob_recycling_for_dag, with a C++ implementation.

Reviewed By: akyrola

Differential Revision: D5544219

fbshipit-source-id: 9f868880c8d0eb997ad3dd39433f9d0b9216d303
2017-08-16 16:17:15 -07:00
5fb7853803 Fixes compile errors
Summary:
Seems to be required for CUDA 9 compilation
Closes https://github.com/caffe2/caffe2/pull/1100

Differential Revision: D5642986

Pulled By: harouwu

fbshipit-source-id: 5f934d580152d3d66f7baa71695fb8847ee2c029
2017-08-16 15:12:22 -07:00
661beb3345 Speed-up weight_norm over the right-most dim (#2431)
When weight-normalizing over the right-most dimension, combine all
dimensions to the left into a single dim. This avoids two extra
transposes.
2017-08-16 18:04:18 -04:00
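To make the right-most-dim fast path above concrete, a sketch of the underlying reparameterization w = g * v / ||v|| with the leading dims flattened into one (written against today's torch API; shapes are illustrative):

```python
import torch

v = torch.randn(8, 16, 32)  # direction parameter
g = torch.randn(32)         # one magnitude per slice along the last dim

# Norm over every dimension except the right-most: collapsing the leading
# dims into a single dim reduces this to one 2-D column norm, with no
# transposes needed.
norm = v.reshape(-1, v.size(-1)).norm(dim=0)  # shape (32,)
w = g * v / norm                              # w = g * v / ||v||
```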
bbcc7d37ca Have Tensor.sort accept descending as only argument (#2329) 2017-08-16 18:01:30 -04:00
30baba7d15 fix typo in docstring 2017-08-16 17:55:39 -04:00
51385b3887 Merge commit '73e0b3f4014b9f5b716eb1216d11f13347207f27' 2017-08-16 17:53:22 -04:00
2579b6b53f Merge commit '98ac4542e0e097cd1b26c62d0ffe7fb37230347c' 2017-08-16 17:52:44 -04:00
0d34a6451a fixing the bug with squeezing a singleton dimension in torch.min and torch.max 2017-08-16 17:51:48 -04:00
73e0b3f401 fixing the bug with squeezing a singleton dimension in torch.min and torch.max 2017-08-16 17:51:41 -04:00
98ac4542e0 fixing the bug with squeezing a singleton dimension in torch.min and torch.max 2017-08-16 17:51:24 -04:00
21d8465d8b Add test for Tensor creation from NumPy on CPU and CUDA 2017-08-16 17:44:58 -04:00
7409e0822b Cuda fixes 2017-08-16 17:44:58 -04:00
f269d3f0b5 Add cuda tensor initialization with array 2017-08-16 17:44:58 -04:00
727942be55 Use proper type for counter 2017-08-16 17:44:58 -04:00
610d9d04e7 Support constructing tensors from arrays of non-matching types 2017-08-16 17:44:58 -04:00
6e1d72998f Merge commit 'ec2863024434b54f339801266a0e8d2d63a418ce' 2017-08-16 17:26:43 -04:00
b797ee04fc Add CUDA version of eye 2017-08-16 17:25:52 -04:00
ec28630244 Add CUDA version of eye 2017-08-16 17:23:28 -04:00
a104dac193 remove unused code and bring back single benchmark mode
Summary: The old GPU single-benchmark mode was lost in recent changes. We still need this mode to benchmark some operators. I also removed some unused ancient code.

Reviewed By: azzolini

Differential Revision: D5628501

fbshipit-source-id: c5d2c6c99af18c41bead5d86c46a42f05821e2ff
2017-08-16 14:06:31 -07:00
0985eaf373 Add ability to specify init_method for test_distributed. (#2465)
* Add ability to specify init_method for test_distributed.

* Move init_method specification to test run line.

* Run for gloo tests as well.

* Better status message for gloo test.
2017-08-16 17:04:21 -04:00
1f47a80e88 Caffe2: diagonal fill op
Summary: Caffe2: diagonal fill op

Reviewed By: panshen1

Differential Revision: D4775640

fbshipit-source-id: bb388ffe223e6b153d4cde1fdad6f84a2bb65b0f
2017-08-16 13:05:11 -07:00
30616ee309 Fixes the broken checkpoint test.
Summary:
Since we temporarily disable checkpointing the readers, we need to
rename all the node names in the test to make it pass.

Reviewed By: azzolini

Differential Revision: D5640930

fbshipit-source-id: 1e61be31ddf9b6e28efd2eb8e6e91e63dcd83154
2017-08-16 11:24:50 -07:00
14950a9082 Support session in distributed realtime trainer
Summary:
Convert from PlanDef ProtoBuf into python Plan object by recursively creating
Nets and ExecutionSteps.

Also support running Plan object directly in Session.

Reviewed By: azzolini

Differential Revision: D5608393

fbshipit-source-id: c0ae3b6da743a759af6db3b614a5a3935fe0b34c
2017-08-16 10:28:55 -07:00
a53192e334 Revert D5001637: [Caffe2][RNN] Threaded dependency-aware RNNExecutor (frontier/diagonal execution).
Summary:
This reverts commit 3d0a71593d73a9ff22f4c1a5c9abf2a4a0c633c8

bypass-lint

Differential Revision: D5001637

fbshipit-source-id: 4d6250ae7e66ea0aa635a68d943d552e5db65b69
2017-08-16 03:21:49 -07:00
367106f591 move memory allocators to allocator.{h,cc}
Summary:
(no major functionality change, purely cosmetic)
Closes https://github.com/caffe2/caffe2/pull/1098

Reviewed By: akyrola

Differential Revision: D5638664

Pulled By: Yangqing

fbshipit-source-id: bbae0589ec1afe938a186ccfce9f6ff1a986a5db
2017-08-16 01:35:20 -07:00
453c60ce28 Threaded dependency-aware RNNExecutor (frontier/diagonal execution).
Summary:
This diff adds dependency-aware concurrent/parallel execution of operators in stepnets. For CPU, we use multi-threaded execution. For CUDA, we use multiple streams and cuda events for parallelism and dependency tracking.

Much of the diff is about computing dependency graph, which was quite tricky because we need to also avoid write-races of multiple operators running in multiple timesteps in parallel. Also, recurrent blobs "change name" when passing over timestep ("_prev"), so that needs to be handled as well.

This diff also restores the link-ops that I unlanded earlier.

The performance gain of this diff is very good for CPU (same perf as with static_dag, even better on forward-only). On CUDA, the gains are modest, at least with the sizes I was testing with.

Reviewed By: salexspb

Differential Revision: D5001637

fbshipit-source-id: 3d0a71593d73a9ff22f4c1a5c9abf2a4a0c633c8
2017-08-15 23:55:15 -07:00
49ec942825 Temporarily disables the checkpoints for the readers.
Summary:
The hive reader checkpoints are broken because of D5582328.
This breaks our offline simulator test as well.
This is a temporary fix that disables the checkpoints for readers.

Reviewed By: azzolini

Differential Revision: D5637719

fbshipit-source-id: 4f31ae534cb7e981fcacbb721cbb2420249fad91
2017-08-15 19:36:11 -07:00
1db7a99249 disable travis test for dpm test
Summary:
After this, the tests should go back to all green.
Closes https://github.com/caffe2/caffe2/pull/1058

Reviewed By: harouwu

Differential Revision: D5637495

Pulled By: Yangqing

fbshipit-source-id: ac3ab5a27bc56e3bb08fa81aa8ed186cb7e8832b
2017-08-15 19:17:41 -07:00
f92fdd850d Important typo in resnet50_trainer
Summary: Closes https://github.com/caffe2/caffe2/pull/1092

Reviewed By: Yangqing

Differential Revision: D5637489

Pulled By: harouwu

fbshipit-source-id: 13609a3e14a45e640849268821fd8565fd7aae4d
2017-08-15 19:03:15 -07:00
3a8feb7fb7 Address integer division to make it compatible with py2 2017-08-15 21:12:21 -04:00
255b176f6b Sorted Order and Generalized Pattern Matching
Summary:
Pattern match currently only supports one type of pattern matching: connected components.

It will be useful to sometimes use different algorithms to pattern match, either a subset of the operators in order, or general non-connected subgraphs. While generalized pattern matching can match for all types, it is inefficient to use it when sorted order or connected component suffice.

You can set the PatternMatchType to one of the three options (it is connected by default), and Transform will use the associated algorithm.

We will need this for common subexpression elimination - specifically, sorted order matching.

Reviewed By: bwasti

Differential Revision: D5629321

fbshipit-source-id: 2104f2d4384fe4aba06a386881a08ca324f290a6
2017-08-15 18:07:01 -07:00
8592f00ec4 Revert D5633240: Add CUDNN to the gpu devices' default preferred engines
Summary:
This reverts commit 99c45c04bf6a3c19f3f7eb27be1bb89344bc03d4

bypass-lint

Differential Revision: D5633240

fbshipit-source-id: 18d7f040f7a611c072bc7fbbfc4cd74c9f24cd3e
2017-08-15 17:36:05 -07:00
0e419ae1b2 Add CUDNN to the gpu devices' default preferred engines
Summary: CUDNN should be almost always faster than the default implementation

Reviewed By: Yangqing

Differential Revision: D5633240

fbshipit-source-id: 99c45c04bf6a3c19f3f7eb27be1bb89344bc03d4
2017-08-15 15:36:32 -07:00
e95b79a69c Benchmark for embedding generation
Summary:
Adds a benchmark comparing two methods used to generate positional embeddings,
table-based and sinusoid (as in the Transformer paper).

Reviewed By: jamesr66a

Differential Revision: D5625633

fbshipit-source-id: faee2d20ea0c3d9c41479c5114fa010ac49fab24
2017-08-15 14:22:41 -07:00
443a4544d4 Update third_party/gloo
Summary:
Fixes failure in test_synchronization_barrier
Closes https://github.com/caffe2/caffe2/pull/1075

Reviewed By: Yangqing

Differential Revision: D5622791

Pulled By: andrewwdye

fbshipit-source-id: 6a41e74218ae1d4fc4bbb240e1c438a39f844cf2
2017-08-15 14:04:21 -07:00
52befa4802 DataParallelModel: take param_init_net into account in _InferBlobDevice
Summary:
Here is my example:

For static RNNs, the timestep is created as a part of param_init_net. Before, DPM assumed that it is a CUDA blob by default, and it participated in broadcasting, causing the Copy on line 798 to fail; no device mapping is correct for this blob.

Reviewed By: akyrola

Differential Revision: D5631716

fbshipit-source-id: 28c3eb17ecc3080c95c41d69a60bf7262d3907d4
2017-08-15 12:06:46 -07:00
b09d7c890e Copy-edit sparse constructor docs for clarity.
Basically, it's easy to confuse the dimensions of the index tensor.
This adds some more text which should hopefully clarify the situation.

Fixes #2416.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-08-15 13:36:30 -04:00
606699ef97 add calls to contiguous for cudnn affine_grid 2017-08-15 13:35:35 -04:00
a6ba3581a9 Merge commit '1a68c961ea88fe70cb4fca419e9c4186b75846b3' 2017-08-15 03:01:37 -04:00
1a68c961ea accumulate in accType for reductions over dimensions 2017-08-15 03:00:44 -04:00
08c82f8b9d Merge commit '54a111564147f7621785b0f284f77a7afd22f337' 2017-08-15 02:59:17 -04:00
c55cc743fb move clamped random functions out of cwrap and into TH 2017-08-15 02:58:28 -04:00
54a1115641 move clamped random functions out of cwrap and into TH 2017-08-15 02:58:14 -04:00
763fb5d708 Update documentation to reflect Linear with 2+D inputs (#2410) 2017-08-15 02:55:01 -04:00
fb5e40face Merge commit '3f25232aaba44aa4377c7e5ed670587a72f5886e' 2017-08-15 02:52:54 -04:00
469969e324 Merge commit '4bca77816e8402539917d61ecce239810d7f3d5e' 2017-08-15 02:52:16 -04:00
b3db52fe36 Support __neg__, .neg(), and neg_() for Long, Int, Short tensor types. 2017-08-15 02:51:25 -04:00
3f25232aab Support __neg__, .neg(), and neg_() for Long, Int, Short tensor types. 2017-08-15 02:51:11 -04:00
4bca77816e Support __neg__, .neg(), and neg_() for Long, Int, Short tensor types. 2017-08-15 02:50:55 -04:00
d19ee9c182 Add comments for default value (#2282)
Added comments for default value in conv.py
2017-08-15 02:49:22 -04:00
f9d02903b7 Always copy indices in Embedding._renorm (#2414)
LookupTable_renorm sorts and de-dupes the passed in indices tensor
in-place.

Fixes #2413
2017-08-15 02:46:25 -04:00
5e088da5ba Add DistributedDataParallel to docs
DataParallel was included twice.
2017-08-15 10:01:36 +05:30
c05c500a82 check _grad suffix
Summary:
Memonger had a subtle bug which caused it to recycle the "splitinfo" outputs of Concat/Split. That is bad since they are on the CPU device, and would cause them to be reallocated. This caused a big slowdown with Kaiming's trainer.

The bug was that we checked for gradients as containing "_grad" anywhere in the name, although we should only allow it as a suffix. Admittedly, string checking is not elegant anyway, but that is how Caffe2 works now.

Reviewed By: asaadaldien

Differential Revision: D5627251

fbshipit-source-id: c12be2323109bf81c3725d8884c7ef024e010bd5
2017-08-14 19:47:59 -07:00
62dcc2feed cudnn conv group support
Summary:
Enable the new convolution group functionality in cuDNN v7
Closes https://github.com/caffe2/caffe2/pull/1079

Differential Revision: D5625074

Pulled By: Yangqing

fbshipit-source-id: 00be025b50161a3bae7e7f09712e4b1adeaffd9f
2017-08-14 19:38:19 -07:00
b0da5bf0fb Use masked_fill rather than casting masks when appropriate. 2017-08-14 16:19:10 -04:00
0b0d2a06f7 Update legacy SoftPlus to add threshold constructor arg. 2017-08-14 16:19:10 -04:00
c92f229aa2 CosineEmbeddingLoss as a new style function. 2017-08-14 16:19:10 -04:00
9bcb9658d5 MarginRankingLoss as new style function. 2017-08-14 16:19:10 -04:00
7aeb837895 Implement HingeEmbeddingLoss double backwards. 2017-08-14 16:19:10 -04:00
1efe38768d Implement KLDivLoss double backwards. 2017-08-14 16:19:10 -04:00
5106ce67bb Implment SmoothL1Loss double backwards. 2017-08-14 16:19:10 -04:00
19d4c37ced Implement MSELoss double backward. 2017-08-14 16:19:10 -04:00
7875c02217 Implement GLU double backwards. 2017-08-14 16:19:10 -04:00
9a243abe5c Implement Softmin double backwards. 2017-08-14 16:19:10 -04:00
988b0d58e6 Implement LogSigmoid double backwards. 2017-08-14 16:19:10 -04:00
0c3a01fe44 Implement SoftShrink double backwards. 2017-08-14 16:19:10 -04:00
8d38c0ee52 Implement Softplus double backwards. 2017-08-14 16:19:10 -04:00
ea9a7823b4 Implement Hardshrink double backwards. 2017-08-14 16:19:10 -04:00
a6cccc8701 Implement RReLU double backwards. 2017-08-14 16:19:10 -04:00
434fa7f694 Reduce memory usage for dot attention
Summary: Title

Differential Revision: D5569996

fbshipit-source-id: c705fc7870ac3e71a071c3f808ac885a82334af2
2017-08-14 12:35:50 -07:00
33383f3912 Reduce overhead of broadcasting when broadcasting isn't required. (#2364)
* Reduce overhead of broadcasting when broadcasting isn't required.

* Fix typo.
2017-08-14 15:00:38 -04:00
ca64190491 Update cub submodule
Summary:
This was updated in 707aed36e89ab9e2041de25166a4930fc4e24ee7 but a
force push into https://github.com/NVlabs/cub made the commit Caffe2
was pointing to unreachable.

cc slayton58 lukeyeager
Closes https://github.com/caffe2/caffe2/pull/1089

Differential Revision: D5621958

Pulled By: pietern

fbshipit-source-id: b1242dc6303a38d3ac9adb37e190084a40a66aa2
2017-08-14 11:27:08 -07:00
ffd9316b03 Use SequenceMask op in attention code for sequence masking
Summary: Use the new SequenceMask op to mask out invalid positions in the attention mechanism rather than using PackSegments and UnpackSegments. This should help us on several fronts, including elision of host<>device copies and using fewer intermediate blobs

Differential Revision: D5619156

fbshipit-source-id: e59c644236cee02f853d8743f9a938fb10adc73b
2017-08-12 19:17:49 -07:00
a985355935 Gradient for SequenceMaskOp
Summary: Implement backward pass for a SequenceMaskOp to replace https://github.com/caffe2/caffe2/blob/master/caffe2/python/attention.py#L54-L72.

Reviewed By: akyrola

Differential Revision: D5618373

fbshipit-source-id: b831fa69f51d9468c858961f922564159e12b46f
2017-08-12 14:34:29 -07:00
0a828768e9 Implement SequenceMaskOp forward pass
Summary:
Implement forward pass for a SequenceMaskOp to replace https://github.com/caffe2/caffe2/blob/master/caffe2/python/attention.py#L54-L72.

This implements two modes: a sequence-length based mode and a matrix triangle mode.

Reviewed By: akyrola

Differential Revision: D5615493

fbshipit-source-id: a2ce4a8e655d9b720049010a7856be052c5567eb
2017-08-12 14:34:28 -07:00
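A NumPy sketch of the two masking modes described above (fill value, shapes, and lengths are illustrative; the op's actual attribute names are not shown):

```python
import numpy as np

fill = np.float32(-1e9)             # illustrative fill value
scores = np.random.randn(3, 5).astype(np.float32)

# Mode 1, sequence lengths: mask positions at or past each row's length.
lengths = np.array([2, 5, 3])
valid = np.arange(5)[None, :] < lengths[:, None]   # (3, 5) boolean mask
masked = np.where(valid, scores, fill)

# Mode 2, matrix triangle: keep the lower triangle, e.g. for causal
# attention where position i may only attend to positions <= i.
attn = np.random.randn(5, 5).astype(np.float32)
causal = np.where(np.tril(np.ones((5, 5), dtype=bool)), attn, fill)
```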
8a5bdc383e Fixes the flaky upload test
Summary:
The LocalSession does not work with the multi-node definitions.
The test becomes flaky because of that. The fix is to create
different LocalSession for each Node(), and run each node
sequentially.

Differential Revision: D5617857

fbshipit-source-id: a8079a90291b4c8b5aa6b471c33c06d18e59976c
2017-08-11 18:58:24 -07:00
cd5275e79f Convert upsampling Functions to new style (#2372) 2017-08-11 21:03:58 -04:00
641e582f31 Fix typo (#2378) 2017-08-11 20:57:26 -04:00
3285dc12c9 Avoid reorder warnings with -Wreorder 2017-08-11 18:41:54 -04:00
404f8ee9b4 Extends the jobrunner to support uploading checkpoints.
Summary:
1. Adds one more step in the JobRunner class to upload checkpoints.
2. Adds one function to return the name of the checkpoint given
the name of the node.

Reviewed By: andrewwdye

Differential Revision: D5597130

fbshipit-source-id: 570a55785e6227859e1115326d6cab077f0e7f72
2017-08-11 14:17:17 -07:00
399fc9fb09 Added Nesterov
Summary: Added Nesterov momentum as an option for BMUF and corresponding tests

Reviewed By: asaadaldien

Differential Revision: D5599888

fbshipit-source-id: 30819c9e689347c8b75daddc7444bea9f54193ae
2017-08-11 13:52:43 -07:00
9372ff7a86 Caffe2: support Tensor in BlobsQueueDB
Summary: Caffe2: support Tensor in BlobsQueueDB

Reviewed By: kevinwilfong

Differential Revision: D5589616

fbshipit-source-id: 66aa6092b6403960c4858abd986771b58be94106
2017-08-11 11:21:14 -07:00
dd5618aa49 Remove unnecessary moves in convolution autograd. 2017-08-11 10:47:26 -04:00
319c46fa1c Vectorize ELU op on CPU
Summary: `select()`, used previously by the ELU implementation, is not vectorized for vector maps in Eigen. This change switches the ELU cpu implementation to use `cwiseMin` and `cwiseMax`, which increases the perf by about 4x.

Reviewed By: Maratyszcza

Differential Revision: D5609370

fbshipit-source-id: 99560a25e0ea2cd35e34aa50c65e53788a6be6b0
2017-08-10 21:52:49 -07:00
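A NumPy sketch of the max/min ELU formulation implied above, which maps onto Eigen's vectorized `cwiseMax`/`cwiseMin` instead of an elementwise `select()`:

```python
import numpy as np

def elu(x, alpha=1.0):
    # max/min formulation that maps onto vectorized cwiseMax/cwiseMin:
    #   x > 0:  max(x, 0) = x, and alpha*(e^x - 1) > 0 so the min term is 0
    #   x <= 0: max(x, 0) = 0, and min(alpha*(e^x - 1), 0) supplies the tail
    return np.maximum(x, 0.0) + np.minimum(alpha * np.expm1(x), 0.0)
```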
85788a0f65 Add TensorCore support
Summary:
Add support for TensorCore convolution and gemm on Volta hardware.

Currently built on top of #1055
Closes https://github.com/caffe2/caffe2/pull/1056

Differential Revision: D5604068

Pulled By: Yangqing

fbshipit-source-id: 100f67e26ed5fabb1dbb31dcd77f7ecb84de4ee7
2017-08-10 20:16:48 -07:00
a7be496fe2 Revert D5589309: modify _LSTM into _RNN to adapt GRU
Summary:
This reverts commit f5af67dfe0842acd68223f6da3e96a81639e8049

bypass-lint

Differential Revision: D5589309

fbshipit-source-id: 79b0a3a9455829c3899472a1368ef36dc75f6e14
2017-08-10 16:42:41 -07:00
b91c2f5064 Make reservoir sampling thread safe
Summary: Guard reservoir sampling with a mutex and fix the bug in counting the number of new entries.

Reviewed By: chocjy

Differential Revision: D5503300

fbshipit-source-id: fd6b0bacb71fbab99d6d5df2c72da523fba02847
2017-08-10 15:27:21 -07:00
9c4872f4bc Reservoir sampling with object ID deduplication
Summary: Adding the option to dedup by object ID so that more frequent objects are not present more than once in the reservoir

Reviewed By: chocjy

Differential Revision: D5503109

fbshipit-source-id: e36c3ad8eea134d6c10a4c875fceadc0f843c976
2017-08-10 15:27:20 -07:00
f78af06f1b Features collection with reservoir sampling
Summary: Make the candidate pool less localized

Reviewed By: chocjy

Differential Revision: D5453289

fbshipit-source-id: 848cb7551d7112f6f47f2cf647bb0daca6eff341
2017-08-10 15:27:20 -07:00
5dba88b40b Caffe2 [easy]: Better exception logging in parallel_workers/data_workers
Summary: Instead of printing the exception using print(), use traceback.print_exc(). This way you get a stack trace.

Reviewed By: jay-mahadeokar

Differential Revision: D5604642

fbshipit-source-id: f8cb67e554305cd2fbed384a4a2040fa2b16e7c0
2017-08-10 15:27:19 -07:00
8d342fc6e2 Sampling random negative based on sparse features
Summary: Avoid labelling objects similar to true positives (according to raw ID features) as negatives.

Reviewed By: chocjy

Differential Revision: D5336506

fbshipit-source-id: 05f68f5d0af2a6eb907963d38702f0d6e9b2f99b
2017-08-10 15:27:18 -07:00
4758bd851b rectify args btw. train and translate
Summary: Make the command-line arguments pertaining to model architecture the same between train.py and translate.py. Also use the s() scoping function for all intermediate blobs in attention.py (this is for compatibility with multi-headed attention).

Differential Revision: D5594312

fbshipit-source-id: cadf51d854b5a9174ec913f32c655be2abf111e5
2017-08-10 15:27:18 -07:00
f2dfb40302 Added amplitude argument to SinusoidPositionEncodingOp
Summary: In order to control the absolute scale/magnitude of the output of this op, added a tuning parameter: amplitude

Reviewed By: jamesr66a

Differential Revision: D5596574

fbshipit-source-id: 3b7e316de55cce6fd686da70aa5658ec3e99b070
2017-08-10 15:27:17 -07:00
5bb1e6b817 Allow passing unsymmetric 2d kernels to brew.conv.
Reviewed By: jay-mahadeokar

Differential Revision: D5598523

fbshipit-source-id: 47135a8562f7c720badb2be677cb79730dc417a0
2017-08-10 15:27:16 -07:00
ad84747433 Optimized Tiling Code
Summary: Turned a number of uniform shader variables into constants

Differential Revision: D5596760

fbshipit-source-id: 68004c081c6b9ba2e55f7f74e48a673489c927b1
2017-08-10 15:27:16 -07:00
52fa113774 Sync opengl changes
Summary:
Sync the opengl files to make github up-to-date

2017-08-10 14:06:45 -07:00
d79662088c Remove unnecessary moves, avoid IncRef/DecRef of PyBools. 2017-08-10 14:04:53 -04:00
062673db88 Properly pass saved_for in BatchNorm/Conv as the relevant Backward function.
Previously, these Functions passed themselves, i.e. the saved_for from
ConvForward would be ConvForward.
2017-08-10 14:04:53 -04:00
2f624dfd90 Add AutoGPU guard and properly reference Python args from BatchNormBackwardBackward. 2017-08-10 14:04:53 -04:00
50c208a50b Revert "Fix typos."
This reverts commit 4622b3395276b37e10141fab43ffea33941ca0c2.
2017-08-10 13:57:00 -04:00
7f097f4b82 call gemmStridedBatched for cuda >=8 to avoid calling kernels to set up pointers (#794) 2017-08-10 01:37:10 -04:00
01051334a2 Add CMakeLists.txt files in opengl directory
Reviewed By: Yangqing

Differential Revision: D5594761

fbshipit-source-id: 2282407bd7fc3a8c9019e16d2e77c45e5b71b4d7
2017-08-09 14:39:31 -07:00
e908cf28f4 Docker move
Summary:
Bringing over selected dockerfiles from documentation branch and updated the GPU Dockerfiles to use some of lukeyeager provided docker configurations. Latest docker with CUDA 8.0 and cuDNN 6 can be pulled via `docker pull caffe2ai/caffe2` or built with `ubuntu-16.04-cuda8-cudnn6-all-options/Dockerfile`.
**You must use nvidia-docker instead of docker to run the GPU-enabled dockers.** Tutorial files can be overlaid by building `ubuntu-16.04-gpu-tutorial/Dockerfile`. Supersedes #911. Closes #876. Closes #923.
Closes https://github.com/caffe2/caffe2/pull/949

Reviewed By: Yangqing

Differential Revision: D5510872

Pulled By: aaronmarkham

fbshipit-source-id: 390f5eea1d9ec1a3edda828470b12386ab8a1775
2017-08-09 13:54:17 -07:00
eb85258beb CreateMapOp
Summary: Add operator to create empty map

Reviewed By: xianjiec

Differential Revision: D5454652

fbshipit-source-id: ecad6cc58572b378962af08cf02063ef546ed58f
2017-08-09 13:32:19 -07:00
7b86a34610 modify _LSTM into _RNN to adapt GRU
Summary: GRU differs from LSTM in that it only has hidden states and no cell states. So reusing the code of _LSTM is problematic: we need to delete the part that creates the cell state, and change many other places that use a hard-coded 4 (hidden_all, hidden, cell_all, cell) into 2 (hidden_all, hidden). Otherwise GRU will break during the backward pass, when the optimizer tries to apply a gradient to each of the parameters, because the cell state is never used and therefore has no gradients for the corresponding parameters (i.e., cell_state_w, cell_state_b).

Differential Revision: D5589309

fbshipit-source-id: f5af67dfe0842acd68223f6da3e96a81639e8049
2017-08-09 13:24:45 -07:00
784ba07bf3 updated downloader to use s3 url without a redirect via the vanity url
Summary:
Model downloader was broken after the move on s3 to the vanity url, download.caffe2.ai. Using this as the url base hits a redirect, and will result in the script throwing a 403 error.  Rather than upgrading to urllib2 or putting in a bunch of code to handle a redirect on urllib, we can just use the non-vanity base url.
Closes https://github.com/caffe2/caffe2/pull/1020

Reviewed By: Yangqing

Differential Revision: D5568686

Pulled By: aaronmarkham

fbshipit-source-id: d88a6b3e1b7955835fc03b036dc54dec48316e7f
2017-08-09 12:25:30 -07:00
d4e687d6aa Add NCCL_VERSION_MIN, use v2 API if installed
Summary:
Basic NCCL 2 API support - the same as applied to gloo [here](49586d9556)

/cc Yangqing pietern
Closes https://github.com/caffe2/caffe2/pull/1055

Reviewed By: Yangqing

Differential Revision: D5583234

Pulled By: bwasti

fbshipit-source-id: 3a9ce302649fdab9ce897613b94788c1843262e2
2017-08-09 12:10:03 -07:00
bf18d85945 Clean cmake script options, and add USE_METAL to optionally build ios metal code.
Summary: Closes https://github.com/caffe2/caffe2/pull/1063

Differential Revision: D5591620

Pulled By: Yangqing

fbshipit-source-id: 99a674221413568c3301cf4decb5697d0788dd48
2017-08-09 09:23:22 -07:00
a6bf0ca4da Bump travis osx to 8.3.3
Summary:
This is needed for metal build.

Note that for older Xcode (7.3), the iOS build currently fails due to missing Metal headers. We now require Xcode 8.0 onwards.
Closes https://github.com/caffe2/caffe2/pull/1062

Differential Revision: D5591536

Pulled By: Yangqing

fbshipit-source-id: 57fbb9e052629ce6ecc16f1ea5179e3303a10907
2017-08-09 01:21:39 -07:00
1ce95090ca Add support for specifying engine preferences
Reviewed By: Yangqing

Differential Revision: D5460994

fbshipit-source-id: 08a8af699eebec37defc070389a8415b3e81ac16
2017-08-09 00:47:18 -07:00
5e0d434b4b Add build support for opengl and latest nnpack.
Summary:
(1) Changed android-cmake to use Yangqing/android-cmake, which supports NEON fp16.
(2) Added cmake scripts to build opengl.
(3) Updated nnpack to master, and changed the corresponding build files.
Closes https://github.com/caffe2/caffe2/pull/1061

Differential Revision: D5591387

Pulled By: Yangqing

fbshipit-source-id: 1d3f28511d33c09df6ecef5041448ac9a3246601
2017-08-09 00:31:53 -07:00
dba6a32450 Revert #1027
Summary:
TSIA
Closes https://github.com/caffe2/caffe2/pull/1060

Differential Revision: D5590958

Pulled By: Yangqing

fbshipit-source-id: e557eb604e5838255c82c3f59f07f4037cf0a487
2017-08-08 22:50:56 -07:00
218f4506fd Fix CUDA check for gcc > 5.
Summary:
In response to https://github.com/caffe2/caffe2/pull/504, this PR modifies the gcc compiler check for CUDA slightly. All ABIs since [gcc-3](https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html) are compatible with each other. The check from https://github.com/caffe2/caffe2/pull/504 forced the 'regular' CXX / CC compiler to be set to gcc < 6, but this is not required.

According to the documentation for [FindCUDA](https://cmake.org/cmake/help/v3.0/module/FindCUDA.html), `CUDA_HOST_COMPILER` is set to `CMAKE_C_COMPILER` by default. This PR checks if `CMAKE_C_COMPILER` is too new for CUDA 8 and whether `CUDA_HOST_COMPILER` is set to `CMAKE_C_COMPILER`. It also modifies the message slightly.
Closes https://github.com/caffe2/caffe2/pull/525

Differential Revision: D5590749

Pulled By: Yangqing

fbshipit-source-id: 89f9ea7aecc787d6b74bf794da8aea82fc547ec1
2017-08-08 22:35:04 -07:00
1c0d20d58c add in make uninstall for cmake
Summary:
After sudo make install, it is quite cumbersome to remove the installed files manually. This change allows the user to simply type sudo make uninstall to remove all installed files.
Closes https://github.com/caffe2/caffe2/pull/748

Differential Revision: D5590971

Pulled By: Yangqing

fbshipit-source-id: b354640056c88b9975dd0cf195a6a4d8cad8d0ab
2017-08-08 22:10:07 -07:00
595f1a92e0 Conda packaging for caffe2.
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Closes https://github.com/caffe2/caffe2/pull/1019

Differential Revision: D5590909

Pulled By: Yangqing

fbshipit-source-id: 8aa4bb687555a93b693b7c70198c6708db4da441
2017-08-08 21:55:30 -07:00
7efb83ae52 Require C++11 support with CMake functions.
Summary:
This PR replaces PR #464. It requires C++11 support using the
new CMake variables (`CMAKE_CXX_STANDARD`, `CMAKE_CXX_STANDARD_REQUIRED`,
etc.) when CMake is version 3.1 or above. Otherwise, if CMake is older
(e.g. Ubuntu 14.04) it falls back to using the -std=c++11 flag and
issues a warning.

This PR is based on the comment from Yangqing:
https://github.com/caffe2/caffe2/pull/464#issuecomment-305376923

The corresponding line in cmake/MiscCheck.cmake is removed in order to
reduce redundancy. Another option would be to move the C++11 logic to MiscCheck.cmake.
Closes https://github.com/caffe2/caffe2/pull/1027

Differential Revision: D5590646

Pulled By: Yangqing

fbshipit-source-id: 11ac63fbeaab7a1da02115549e214f9c529f1873
2017-08-08 20:48:38 -07:00
5c77cc8182 Exposing num_workers as parameter and enable recycling activations
Summary: as promised, a separate diff for dpm changes I made in experimental code

Reviewed By: pietern

Differential Revision: D5551304

fbshipit-source-id: 9013aeab6c388b1c415ffb2e36fb8dd6b8cf90b0
2017-08-08 19:48:41 -07:00
1199e3d496 changed a small mistake in cross entropy doc (#2292) 2017-08-08 22:04:19 -04:00
c000d15058 Properly use Py_RETURN_True, Py_RETURN_False in back compatibility warnings. (#2345) 2017-08-08 21:54:20 -04:00
9199c954f1 Fix typo in DistributedDataParallel (#2320) 2017-08-08 21:53:42 -04:00
1ac98b1bce Add documentation for apply (#2327) 2017-08-08 21:53:26 -04:00
9357b8fafc new_criterion_tests is redefined so BCELogitsWithLoss tests don't execute. (#2347) 2017-08-08 21:53:15 -04:00
b96c4e714b Fix build failure on MacOS X with clang-800.0.42.1
Summary:
Signed-off-by: Jammy Zhou <jammy.zhou@gmail.com>
Closes https://github.com/caffe2/caffe2/pull/1047

Differential Revision: D5583196

Pulled By: Yangqing

fbshipit-source-id: 7fe782b6caa14074573fbdacd68f50e16fb85e3f
2017-08-08 18:49:27 -07:00
a2204f0b1e Caffe2: Write CUDA version of OneHot operator
Summary: This diff implements CUDA version of OneHot operator.

Reviewed By: bddppq

Differential Revision: D5578543

fbshipit-source-id: 55b70e8ec6ee34b647b9140fecbba31b6968f403
2017-08-08 18:17:39 -07:00
0cf488295d fix Windows build breaks by LengthsTopKOp
Reviewed By: Yangqing

Differential Revision: D5584020

fbshipit-source-id: f351eaad04fb5319230ebdae5c51d60a7161eff6
2017-08-08 18:06:24 -07:00
ef64a4f6b2 Add conv layer and layer tests
Reviewed By: xianjiec

Differential Revision: D5569206

fbshipit-source-id: ed836315f3ee4d7983da94f2633a3085fe99194d
2017-08-08 10:57:43 -07:00
152d2ae3a8 Implement CUDA version of GRU operator
Summary: Add CUDA version of GRU operator

Reviewed By: jamesr66a

Differential Revision: D5571043

fbshipit-source-id: 332aa64fc8a9116cc33382f2b2907080e58c13b3
2017-08-08 10:57:40 -07:00
190a1dda5b fix thread_pool.h
Summary:
TSIA.

accept2ship

Reviewed By: ajtulloch

Differential Revision: D5581086

fbshipit-source-id: 47220312b751aeef13f87faaae5a55bcd4e147eb
2017-08-08 10:32:08 -07:00
679a586d53 Fix metal build after sync
Summary:
While I was trying to make a quick oss cmakefile, I found that some of the
ios source files are out of sync with the most code changes. This diff should
fix the issues.

I manually ran cmake on the oss side with scripts/build_ios.sh to make sure
things pass.

Reviewed By: ajtulloch

Differential Revision: D5582265

fbshipit-source-id: 2636d353d32fcd8fb7087385b9bbed8476e33e74
2017-08-08 10:18:13 -07:00
9fcf676cfa testing for open-source seq2seq
Summary:
Fix multilayer inference in Caffe2 example seq2seq code. (Rely on LSTMWithAttentionDecoder.apply rather than fixed state indices to determine stepwise decoder output.)

Also assorted updates to bring code in line with changes elsewhere in the codebase, and added unit tests which ensure that training and inference networks generate the same loss, which should make these problems much easier to identify in the future.

Reviewed By: jamesr66a

Differential Revision: D5579803

fbshipit-source-id: 6e0f27340d981990ab8d0da58e63793222e7be87
2017-08-08 10:09:41 -07:00
35bbb7bfba THD: add a missing header to fix build failure 2017-08-08 11:08:07 -04:00
4622b33952 Fix typos. 2017-08-08 11:05:38 -04:00
ac3a1328d5 Remove unnecessary .proto files
Reviewed By: Yangqing

Differential Revision: D5577595

fbshipit-source-id: cd234893a1be3807aca3195bb29aab7ecfee2d8a
2017-08-08 07:17:07 -07:00
751198f3b1 move cpp flags earlier (#2325) 2017-08-08 07:22:33 -04:00
e51fec3be0 Update sparse.py (#2336) 2017-08-08 07:16:52 -04:00
5caa42b538 Add ConcatDataset to docs (#2337) 2017-08-08 07:16:04 -04:00
07e745408b revert D5528436
Summary: Users are reporting CUDA illegal access errors happening on some configurations after D5528436 introduced lazy peer connections. Will debug later, but this diff is to revert that change.

Reviewed By: pietern

Differential Revision: D5581673

fbshipit-source-id: ef8e367160a38fc62434d6f5905892db274d9f06
2017-08-07 23:07:50 -07:00
8ad382df3c implement LengthsTopK operator
Summary:
It was reverted previously because of a missing schema for the gradient op. Added it back and resent.

Differences between this diff and the previously reverted diff:
1. added a schema for the gradient operator
2. changed line 95 in kmax_pooling_op.h from CAFFE_ENFORCE to CAFFE_ENFORCE_GE

Reviewed By: xianjiec

Differential Revision: D5568867

fbshipit-source-id: 39813b389a5da803967a561249793afdfce00c58
2017-08-07 18:19:29 -07:00
8af625ede2 Implement gradients for Col2Im and Im2Col operators
Reviewed By: jay-mahadeokar

Differential Revision: D5576385

fbshipit-source-id: a0ca4f704fd861f7cc67079041b1d0772fc66920
2017-08-07 15:51:30 -07:00
ddc1b288bb Improve logic when creating a common world
Summary:
When creating a common world, we would attempt to create
one using an existing common world to save on setup cost. This could cause
unexpected behavior when the backing common world had a shorter
timeout than the world being created. This patch improves this
logic by limiting the usage of a backing world to only ones that
have a long enough timeout.

Reviewed By: andrewwdye

Differential Revision: D5570904

fbshipit-source-id: d3b5073a64381ed068a30dcc461a6ec9ce15ad9c
2017-08-07 15:51:29 -07:00
5ae3865112 Fix build
Summary:
(1) BlobsQueue is causing a gcc error (google search suggested it was a
bug, but we'll put the implementation in a separate cc file).
(2) Preparing for cuda 9: update cub.
(3) Prepare for cudnn 7: update cudnn rnn op.
(4) Fix an MSVC issue

Reviewed By: sf-wind, jerryzh168

Differential Revision: D5574352

fbshipit-source-id: 230820ce3ceaa32bee8323bdc509de352c93fcf2
2017-08-07 15:34:49 -07:00
a11aa0ab35 remove mpscnn-fb folder for the new contrib/ios sync.
Summary:
The mpscnn-fb folder was intended for our earlier sharing of the MPSCNN code.
Now that we have fully migrated the code, one should check contrib/ios instead.

accept2ship

Reviewed By: ajtulloch

Differential Revision: D5577227

fbshipit-source-id: df3706a272f022ea6e529f38d960bce374f79baa
2017-08-07 15:19:06 -07:00
1449c2c821 long -> int64_t in convolution autograd 2017-08-07 18:16:01 -04:00
95d4561d05 Fix canonical_axis_index_ enforce failure when doing memonger shape inference for RN101
Summary: Add TensorInferenceFunction for ReduceBackMean operator

Reviewed By: akyrola

Differential Revision: D5570720

fbshipit-source-id: f51d6cbec8bf32131ee34a32deff216df372e3a9
2017-08-07 14:53:59 -07:00
647f35e742 Fix SyncAllParamsDistributed for Python 3x
Summary:
In Python 3.x, dictionary values aren't a list and can't be concatenated to one;
this diff fixes that.

Reviewed By: andrewwdye

Differential Revision: D5576724

fbshipit-source-id: c60441857ceceb9c4a71122d2db5e9abad6d3fc2
2017-08-07 14:23:32 -07:00
42fb87d0b1 L1Distance Row-wise, instead of cumulative
Summary:
The L1Distance operator used to return a single value denoting the L1 distance over the entire input, instead of one value per input row.

This fixes that (see the sketch after this entry).

Reviewed By: Yangqing

Differential Revision: D5570385

fbshipit-source-id: fbab0e0c9262ccbdb3af27262b8baacdeb2d0fc9
2017-08-07 14:09:25 -07:00
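
A minimal numpy sketch of the row-wise behavior described above; the function name and shapes are illustrative, not the operator's actual signature:

```python
import numpy as np

def l1_distance_rowwise(x, y):
    # One L1 distance per input row, as the fixed operator returns.
    return np.abs(x - y).sum(axis=1)

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[0.0, 0.0], [1.0, 1.0]])
print(l1_distance_rowwise(x, y))  # [3. 5.]
print(np.abs(x - y).sum())        # 8.0 -- the old cumulative behavior
```
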
a1bf14d8e6 Building new randomized sparse nn model
Summary: New hybrid randomized sparse NN, which allows layers of the sparse NN model to be randomized, semi-random, or learnable

Reviewed By: chocjy

Differential Revision: D5416489

fbshipit-source-id: eb8640ddf463865097ba054b9f8d63da7403024d
2017-08-07 12:48:58 -07:00
e7192c3b91 image_input_op_dense_multi_label
Summary:
To train an image model, we can also use a label embedding vector as supervision, as opposed to using SoftmaxLoss/SigmoidCrossEntropyLoss.
In such cases, the label is a dense vector. This diff enables such use cases.

Reviewed By: panshen1

Differential Revision: D5556203

fbshipit-source-id: 52c61495e02fab457dc2d43e3345d7dbd5580ab7
2017-08-07 12:38:16 -07:00
d072701547 Caffe2: Refactor the core logic from data_workers.py into parallel_workers.py
Summary:
data_workers.py provides a really nice, easy way to run background threads for data input. Unfortunately, it's restrictive: the output of the fetcher function has to be a numpy array.

I pulled that nice core thread management out into parallel_workers, and updated the classes in data_workers to extend them. The main change was refactoring most of the queue-handling logic out into QueueManager.

This way parallel_workers can be used to manage background threads without having to use the queue for output.

Reviewed By: akyrola

Differential Revision: D5538626

fbshipit-source-id: f382cc43f800ff90840582a378dc9b86ac05b613
2017-08-07 10:14:08 -07:00
cc2c4d07d6 Always use assertAlmostEqual for floats when crossing python and C boundaries
Summary:
This fixes the Travis numerical issue (see the sketch after this entry).
Closes https://github.com/caffe2/caffe2/pull/1024

Differential Revision: D5571340

Pulled By: Yangqing

fbshipit-source-id: 097e6f91da68cc3eacf21fe109f342e0dddea189
2017-08-06 14:51:11 -07:00
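
A small illustration of why exact equality is fragile when float values cross the Python/C boundary; a plain unittest sketch, not the actual Caffe2 test code:

```python
import unittest

class FloatCompare(unittest.TestCase):
    def test_sum(self):
        x = 0.1 + 0.2  # 0.30000000000000004 under IEEE-754
        # assertEqual(x, 0.3) would fail; assertAlmostEqual tolerates rounding.
        self.assertAlmostEqual(x, 0.3, places=7)

if __name__ == "__main__":
    unittest.main()
```
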
836af7f211 update gloo to master.
Summary:
Per pietern, this should fix the gloo Travis testing error.
Closes https://github.com/caffe2/caffe2/pull/1023

Differential Revision: D5571334

Pulled By: Yangqing

fbshipit-source-id: 9dfe38fd24830510a2f8e4f39d188c186453a864
2017-08-06 14:51:10 -07:00
02e5367bdd Support a build script for Tizen target
Summary:
There is no appropriate build script for the Tizen software platform.
This commit fixes #847.

Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com>
Closes https://github.com/caffe2/caffe2/pull/877

Differential Revision: D5571335

Pulled By: Yangqing

fbshipit-source-id: 12759a3c0cb274ef93d7127b8185341e087f2bfa
2017-08-06 14:51:09 -07:00
42806d6815 kick fb sync
fbshipit-source-id: 9c08c6da71565c0f3e4df0c6f9aa67125afe2330
2017-08-06 14:37:43 -07:00
e97c04118e CUDA 9 support
Summary:
Adds support for the CUDA 9 toolkit.

Includes new fp16 data type fixes, and changes to warp-synchronous programming. Also updates CUB third-party repo for CUDA 9 support.
Closes https://github.com/caffe2/caffe2/pull/853

Differential Revision: D5548507

Pulled By: Yangqing

fbshipit-source-id: c7fd2edb623f2aa8c67b9a1000efc8f71e6832ab
2017-08-06 11:50:17 -07:00
4d8a8c2e1e Implement dot attention
Summary:
Implement dot attention as described in https://arxiv.org/abs/1508.04025
This saves the computation of weighted encoder outputs in `rnn_cell.py`
When the encoder and decoder dimensions are different, we apply an FC, which corresponds to the general case below Figure 2.
Refactored unit tests.

Reviewed By: jhcross

Differential Revision: D5486976

fbshipit-source-id: f9e9aea675b3b072fbe631bc004199b90a9d95cb
2017-08-06 11:50:16 -07:00
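
A minimal numpy sketch of the dot attention described in the entry above (single decoder step; names and shapes are illustrative). When encoder and decoder dimensions differ, an FC would first project decoder_h, per the "general" case in the paper:

```python
import numpy as np

def dot_attention(decoder_h, encoder_outs):
    """decoder_h: (d,); encoder_outs: (T, d). Returns the (d,) context vector."""
    scores = encoder_outs @ decoder_h        # dot-product score per source position
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ encoder_outs            # attention-weighted sum of encoder states
```
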
ac6eee1118 Delete duplicate PoolOp cuDNN implementation.
Reviewed By: sf-wind

Differential Revision: D5566421

fbshipit-source-id: 7ccbbd6b6b4f1cd372e8525b6a753c3ab7113c0f
2017-08-06 11:50:15 -07:00
fac241bcbc Caffe2: add a DB that's wrapped around a BlobsQueue as an adapter for data from non-DB interface
Summary:
Caffe2: add a DB that's wrapped around a BlobsQueue as an adapter for data from non-DB interface.

This is useful for bridging the gap between DB interface data processing ops (TensorProtosDBInput, ImageInputOp etc.) and data that's coming from arbitrary Python or the pretty intricate Hive reader.

Reviewed By: akyrola

Differential Revision: D5554560

fbshipit-source-id: 01bb0056410f9ade205367d5fefc721f91f5b629
2017-08-06 11:50:14 -07:00
4ad1dbc189 Strip unnecessary files in xplat/fbcode
Summary:
Now Caffe2 is replicated in three code bases. Some directories
are only for mobile or only for server, so we need to strip the
unnecessary files at checkout.
Run this command to strip the files checked out for mobile:
    hg sparse --enable-profile fbandroid/xplat/caffe2/.hgsparse-caffe2-xplat
Run this command to strip the files checked out for server:
    hg sparse --enable-profile fbcode/caffe2/.hgsparse-caffe2-dev

Reviewed By: mzlee

Differential Revision: D5557190

fbshipit-source-id: e41c8edab09d3fafcb0c8e40ebe1c6809388dc02
2017-08-06 11:50:14 -07:00
91c9812dd1 Sync of codebases
This is so that we can do per-commit sync between codebases, removing
the current tech debt of manual syncing.

The code is contributed by various folks: @tulloch for ios, @bwasti for
snpe, @fricc33 and @hlu for opengl, among many others.

@feisun (sf-wind) made the original sync.
2017-08-06 11:27:06 -07:00
1654bc9335 add shape to pass-throughs 2017-08-06 10:54:02 -04:00
d2b61e4db9 Merge commit '95f357ffcfe431c544b5fcfa8df402b1507baca3' 2017-08-04 19:51:48 -04:00
2490f3c955 Merge commit '24c496bdda9e5feae813868d901de67d516cf8e8' 2017-08-04 19:51:12 -04:00
87451fd643 move normal variants to TH/THC 2017-08-04 19:50:23 -04:00
95f357ffcf move normal variants to TH/THC 2017-08-04 19:49:49 -04:00
24c496bdda move normal variants to TH/THC 2017-08-04 19:49:30 -04:00
4599c0c7df Update autograd notes (#2295) 2017-08-05 05:18:05 +05:30
8ce4401f09 documentation nit fix for torch.Tensor.random_ (#2297) 2017-08-05 04:31:15 +05:30
03d856977e Update README to link to NCCL2 2017-08-04 09:44:37 -07:00
4a33f66e27 Update README to link to NCCL2 part 3 2017-08-04 09:44:09 -07:00
d66fb63679 Update README to link to NCCL2 #2 2017-08-04 09:43:29 -07:00
80ae43b443 Update README to link to NCCL2 2017-08-04 09:42:25 -07:00
1baae004bf cuda 7.5 fix for gloo 2017-08-04 06:01:54 -04:00
12f25c8106 Revert D5545533: [pairatt] implement kMaxPooling operator
Summary:
This reverts commit 8378caaac528a71c154067168787ed493bfb0d37

bypass-lint

Differential Revision: D5545533

fbshipit-source-id: a8d9db807f5b22461b21b7589886cf54861e3757
2017-08-04 01:33:29 -07:00
4b80ff89e2 Use softsign op for s=0 in arc-cosine feature map
Summary:
The current implementation for s=0 doesn't support the backward pass.
Switching to the pow op instead as a temporary solution.

Reviewed By: jackielxu

Differential Revision: D5551742

fbshipit-source-id: 33db18325b3166d60933284ca1c4e2f88675c3d3
2017-08-03 23:35:11 -07:00
5d721c1c14 Some adjustments for Windows build
Summary:
1. switch the protoc building system from msbuild to cmake
2. set default CMAKE_GENERATE to VS2015
3. set default CMAKE_BUILD_TYPE to Release
4. improve error handling
5. add the generated protobuf include path
6. exclude many optional dependencies from build_windows.bat
Closes https://github.com/caffe2/caffe2/pull/1014

Differential Revision: D5559402

Pulled By: Yangqing

fbshipit-source-id: 019e3a6c3c909154027fa932ce1d6549476b23bb
2017-08-03 17:54:12 -07:00
6648677acf [doc] variable shape error of LSTMCell, GRUCell (#2289) 2017-08-04 06:18:51 +05:30
a95f7aa38b Fixed bug with blob_test
Reviewed By: Yangqing

Differential Revision: D5556434

fbshipit-source-id: 4877872e9b1357a5c5a338ef06c67d6ac409f0a6
2017-08-03 17:22:49 -07:00
977f9644c0 Fix ZeroPad2d backwards with negative pads. 2017-08-04 05:40:31 +05:30
01b6a5a3ea Add boolean type in input2 and input3 for caffe2: Where operator
Summary:
Caffe2's Where operator allows users to specify three inputs:
input1: TensorTypes<bool>
input2: TensorTypes<float, double, int, long, std::string>
input3: TensorTypes<float, double, int, long, std::string>,
which lets users perform the operation: output = input1 ? input2 : input3.
We found a need to add the boolean type to input2 and input3 of the Where operator for customers who want to use boolean tensors for logic (see the sketch after this entry).

Reviewed By: ender-wieczorek

Differential Revision: D5541815

fbshipit-source-id: 55171b242821f5f2c83235f5229a85f8cbe580de
2017-08-03 13:17:06 -07:00
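
A quick sketch of the new boolean payload case using the Caffe2 Python front end; this assumes the standard Where signature (condition, X, Y) described above:

```python
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("cond", np.array([True, False, True]))
workspace.FeedBlob("x", np.array([True, True, True]))     # bool payloads now allowed
workspace.FeedBlob("y", np.array([False, False, False]))
workspace.RunOperatorOnce(core.CreateOperator("Where", ["cond", "x", "y"], ["out"]))
print(workspace.FetchBlob("out"))  # [ True False  True]
```
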
83ba2b1091 Typo correction in CMakelist.txt
Summary: Closes https://github.com/caffe2/caffe2/pull/1010

Differential Revision: D5554930

Pulled By: akyrola

fbshipit-source-id: 7bd93608aeace1baacff00b4c302fc4a5e20a607
2017-08-03 10:54:31 -07:00
d177846dbf Add prefix argument to FileStoreHandler
Summary:
This brings it up to par with how the RedisStoreHandler
works. The store handler configuration does not have to change and
only the run ID parameter changes across runs.

This was inconsistent and came up in https://github.com/caffe2/caffe2/issues/984.

Reviewed By: Yangqing

Differential Revision: D5539299

fbshipit-source-id: 3b5f31c6549b46c24bbd70ebc0bec150eac8b76c
2017-08-03 10:37:26 -07:00
3628cd30f0 initialize peer access only when (might be) needed
Summary:
Currently Caffe2 enables peer access between all 8 GPUs, even if only 1 GPU will be used. This adds several seconds to the startup time and also takes a lot of memory (110 MB per GPU).

This diff makes the peer access initialization "lazy". When GPU X is first used, pairwise peer access is set up between GPUs 0 through X-1 and X. A lookup table is used to ensure no double peer access initialization.

Reviewed By: pietern

Differential Revision: D5528436

fbshipit-source-id: 8f3c2c8154291a7d3a99ee2882e4834ef5e38b66
2017-08-03 09:24:14 -07:00
8e1ecb1cfd async sparse length sum op
Summary:
This diff makes SparseLengthsSum(Gradient) Async. It goes through these logics:

1. Adding INDICES to the Gradient op input so that we can make it async without device-host copies.
2. Registering the new 3-input op as the gradient for the CPU/GPU version of SLS.
3. In order not to break old nets (they are mostly on CPU), I still register the old 2-input op, so the op schema will not complain when it encounters old nets that have the SLSGradient op in them.

wickedfoo  Sorry this diff might bring you extra work of migrating your optimization effort to this new async gradient op. But we think it is worth it. :(

Reviewed By: dzhulgakov

Differential Revision: D5423188

fbshipit-source-id: 62494a6c52a507c4a4688d5a9e1a2bc720d5370d
2017-08-03 03:04:15 -07:00
a4e6ca6956 Added Sinusoidal Position Encoding Op
Summary: Added a caffe2 operator to calculate the sinusoidal position encoding for word embeddings, as described on page 6 of https://arxiv.org/abs/1706.03762 (sketched after this entry).

Reviewed By: jamesr66a

Differential Revision: D5533024

fbshipit-source-id: 1afb35cd7f9d8c71f2635b853e56b2c840f0bc1f
2017-08-03 01:46:46 -07:00
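
A numpy sketch of the encoding from the paper (PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...)); the function name is illustrative, not the operator's API:

```python
import numpy as np

def sinusoid_position_encoding(n_positions, dim):
    pos = np.arange(n_positions)[:, None]  # (n_positions, 1)
    i = np.arange(dim)[None, :]            # (1, dim)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / float(dim))
    # Even columns get sin, odd columns get cos.
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

print(sinusoid_position_encoding(4, 8).shape)  # (4, 8)
```
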
4a8545e3c6 implement kMaxPooling operator
Summary: used by attention model

Differential Revision: D5545533

fbshipit-source-id: 8378caaac528a71c154067168787ed493bfb0d37
2017-08-03 00:48:34 -07:00
adc5510ecb dynamic embedding
Summary: refactor get_categorical_limit

Reviewed By: xianjiec

Differential Revision: D5459389

fbshipit-source-id: 14a7e07394db52fb090c6923e341c34576fcb6d6
2017-08-03 00:33:18 -07:00
a8695178aa Adding parameter sharing API to Dper2
Summary:
To achieve this, I modified the blob name scheme defined in a layer.
Before it was scope/fc_w and scope/fc_w_auto_0 (if there is another fc
    within the same scope).
Now I change it to scope/fc/w and scope/fc_auto_0/w.
That is, we rely on the uniqueness of the scoped layer name to define
names for blobs.

I also overwrote the create_param method in LayerModelHelper to let it
use the resolved name for blobs given the parameter-sharing context.

There are some details such as making the initializer more structured
that I need to finalize.

Reviewed By: kennyhorror

Differential Revision: D5435132

fbshipit-source-id: a0525f5ea0977e255dd5ea765b38913f5951d455
2017-08-03 00:33:18 -07:00
0b50e078d1 add proper build support for perfkernels
Summary: Closes https://github.com/caffe2/caffe2/pull/972

Differential Revision: D5506606

Pulled By: Yangqing

fbshipit-source-id: d9327e08fc1726bf9b20a8668d06a5be179f45d4
2017-08-02 23:17:04 -07:00
38b42e0421 Improve cuDNN weight layout test 2017-08-03 08:22:55 +05:30
d1ab37a65b Make sure deserialized RNN modules have _data_ptrs too 2017-08-03 08:22:55 +05:30
70c95dbe52 fix Conv3d non-contiguous weight bug 2017-08-02 22:47:09 -04:00
74e5328b03 remove limitations on output_padding in Conv* routines 2017-08-02 22:46:24 -04:00
814b65df4f remove limitations on output_padding in Conv* routines 2017-08-02 22:46:04 -04:00
a565b77791 add 2d and 3d dilated full Convolution 2017-08-02 22:44:59 -04:00
6e6dca001c add 2d and 3d dilated full Convolution 2017-08-02 22:44:44 -04:00
60e7966c1f Fix BatchNorm double backwards when training=False. (#2277) 2017-08-03 05:34:12 +05:30
7c04f11d88 search for ldconfig in /sbin for nccl detection (#2276) 2017-08-03 05:32:21 +05:30
f6585e80d7 if RNN's hx is None, requires_grad=False (#2274)
When the initial hidden states of RNN are ``None'', we don't need to compute their gradients.
2017-08-03 05:29:50 +05:30
0b000952c1 Split batchnorm eval test into cpu and cuda functions. (#2273) 2017-08-03 05:25:05 +05:30
42328b70f7 fix another is_same_size call 2017-08-02 19:53:39 -04:00
7df859871e Added functionality that allows users to store huge blobs
Summary: Added functionality that allows users to store huge blobs of any type, not only Tensors. A blob has to be divided into chunks in the same way as a Tensor blob.

Reviewed By: kennyhorror

Differential Revision: D5432762

fbshipit-source-id: c171faacd99d209bfae6f9707ebde7c4e23ba3b9
2017-08-02 16:08:09 -07:00
cb1dd21280 adding operator lp_norm to support calculating l1 norm and l2 norm
Summary: Implement the LpNorm operator, which calculates the Lp norm of a tensor for regularization (p=1 or 2). Currently there is only the L1Distance operator, which calculates the l1 distance of two same-shape tensors; we want an operator that takes a single input and outputs the l1 loss, and likewise for the l2 loss. We also plan to implement the l_{p,q} loss, but have not decided which p and q to take. (A reference sketch follows this entry.)

Reviewed By: xianjiec

Differential Revision: D5460051

fbshipit-source-id: d67a38fbc94afa52de26d4a53e4d2b7df3c50b6a
2017-08-02 15:09:08 -07:00
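
A reference sketch of the regularizer-style norms in numpy; whether the operator applies the square root in the p=2 case is an assumption here, not confirmed by the entry:

```python
import numpy as np

def lp_norm(x, p=2):
    # p=1 -> sum |x_i| (l1 loss); p=2 -> sum x_i^2 (common in regularizers;
    # the actual operator's convention for the square root is assumed, not known).
    x = np.asarray(x).ravel()
    return np.abs(x).sum() if p == 1 else (x ** 2).sum()

print(lp_norm([1.0, -2.0, 3.0], p=1))  # 6.0
print(lp_norm([1.0, -2.0, 3.0], p=2))  # 14.0
```
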
ca98c659df Add tests that gradcheck grad sizes match input size and fix advanced indexing
case that fails check.
2017-08-02 17:49:02 -04:00
2a8379847b add reentrancy checking for gradcheck. 2017-08-02 17:49:02 -04:00
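
Typical gradcheck usage from this era, as a hedged sketch; it compares analytical gradients against a finite-difference estimate, which is why double precision and a small eps matter:

```python
import torch
from torch.autograd import Variable
from torch.autograd.gradcheck import gradcheck

# Double precision keeps the numerical Jacobian estimate tight.
inputs = (Variable(torch.randn(4, 3).double(), requires_grad=True),)
print(gradcheck(lambda x: (x * x).sum(), inputs, eps=1e-6, atol=1e-4))  # True
```
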
eb1ac73184 Remove save_mean/save_var from BatchNorm double backwards, as it's not needed.
These could cause a problem with double backwards because they were std::move'd in
Backward.
2017-08-02 17:49:02 -04:00
677324518d Add InputsCanCrossDevices() to NCCL op schemas
Summary: This silences performance warnings of input blobs crossing devices (by prof_dag).

Reviewed By: prigoyal

Differential Revision: D5548325

fbshipit-source-id: 13aa288f77abdfeab3703664421cb9c32bf31567
2017-08-02 12:50:19 -07:00
ded2a5899e Option to set BN scale and bias initial values
Summary:
Necessary to reproduce setup from 1-hour imagenet paper
Closes https://github.com/caffe2/caffe2/pull/995

Differential Revision: D5547666

Pulled By: akyrola

fbshipit-source-id: cbd4396888b02f32c67e1fe7e53636329de64f1b
2017-08-02 11:38:57 -07:00
ab42a95b6f fast path for CUDNN global average pooling
Summary:
KaimingHe debugged a slow model and found out that global average pooling was hideously slow, even with CUDNN. It turns out the CUDNN pooling op (especially the backward pass) is not optimized for global pooling.

This adds a fast path for global average pooling with NCHW (sketched after this entry). It is about 30x faster than CUDNN with 56 x 56 pooling; compared to an equivalent ReduceBackSum, it is about 3x faster.

I will bootcamp the max pooling.

Reviewed By: asaadaldien

Differential Revision: D5533059

fbshipit-source-id: 2d590693d737fa92184603663031d96f6145f304
2017-08-02 11:10:10 -07:00
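
The NCHW global-average-pooling computation itself is tiny, which is why a dedicated fast path beats a general pooling kernel; a numpy sketch of the semantics:

```python
import numpy as np

def global_avg_pool_nchw(x):
    # x: (N, C, H, W) -> (N, C, 1, 1); one mean per feature map.
    return x.mean(axis=(2, 3), keepdims=True)

x = np.random.rand(8, 64, 56, 56).astype(np.float32)
print(global_avg_pool_nchw(x).shape)  # (8, 64, 1, 1)
```
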
b3ca3da4b6 fix type mismatch 2017-08-02 10:18:03 -04:00
0fc2bf26b4 Option to enforce batch size
Summary: This will throw away a few examples. It is desirable to keep batch size constant for full sync data parallel

Reviewed By: dzhulgakov

Differential Revision: D5531788

fbshipit-source-id: e19385401155e731cfc5b25e8e9ea7c16c19d478
2017-08-01 22:29:55 -07:00
c662480ea6 Return empty Struct when get_field has empty input
Summary:
Currently, `from_column_list` throws an error if the input col_names=[]. To
solve this issue, we fix the get_field function so that it creates an empty
Struct when an empty col_names is given.

Reviewed By: kittipatv

Differential Revision: D5543865

fbshipit-source-id: f6dfa25326e355f8ec24e5542761851a276beeb9
2017-08-01 19:49:47 -07:00
da66f10042 Improve StringJoin operator
Summary:
The StringJoin operator converts input array/matrix elements to strings, then joins them to make a vector of strings.

Changes:
* Support string tensor input
* Support join on 1-axis
* Add unit tests

Differential Revision: D5513705

fbshipit-source-id: 25f96ed3586065c15f845a968c9f8864ca8f5bdf
2017-08-01 19:03:43 -07:00
f484a5fee8 Implement LogSoftmax double backwards (#2270) 2017-08-02 07:17:09 +05:30
0c7ee02c37 Add CUDA implementation of BooleanUnmask and fixed some bugs in the test
Reviewed By: akyrola

Differential Revision: D5405606

fbshipit-source-id: fd755ee2ec3d742597f7f5500f54caa396db4da4
2017-08-01 16:51:40 -07:00
6314c1fc15 Transforms in Python
Summary: Allow the use of apply_transform() in the python API

Reviewed By: bwasti

Differential Revision: D5530483

fbshipit-source-id: 61a6d36fe125c89629fdeea040a717c453d84417
2017-08-01 16:51:38 -07:00
c92559c67f Add ios_base initializer to operator_schema error path
Summary: Hopefully this will stop SIOF warnings. Context https://fb.facebook.com/groups/fbcode/permalink/1437211199649048/

Reviewed By: nbronson

Differential Revision: D5537782

fbshipit-source-id: c32d66d12b69bee65b3084e3bad8e0c6a944bf02
2017-08-01 15:33:13 -07:00
676bedd298 Fixes for Python 3 in caffe2/caffe2/fb/data
Summary: As title

Reviewed By: MisterTea

Differential Revision: D5532387

fbshipit-source-id: 0a51ca40b93cc2eb5371f0b86f2800354cd1939c
2017-08-01 15:22:55 -07:00
60cb55461e Caffe2: Support additional outputs in ImageInputOp
Summary: This allows users to add an arbitrary number of additional outputs to ImageInputOp. These are populated by reading additional TensorProto values from the TensorProtos from the DBReader, and converting them into Tensors. Similar to labels, only ints and floats are supported, and multiple values are supported.

Reviewed By: panshen1

Differential Revision: D5502019

fbshipit-source-id: 5a8b61b3a8549272a112e8e02cd613d8f9a271ba
2017-08-01 14:36:05 -07:00
3a99698734 include numpy's other 32bit int type
Summary: forgot one :)

Reviewed By: akyrola

Differential Revision: D5534905

fbshipit-source-id: a0e58ca3922ec80f526f7586931ff3da8e9bcffc
2017-08-01 13:53:11 -07:00
5d304a3b49 add gradient for SparseToDenseMask operator
Summary: add gradient for SparseToDenseMask operator

Reviewed By: kittipatv

Differential Revision: D5320792

fbshipit-source-id: 8ee7f1c87e8270ad6077ed197ce9512524069b59
2017-08-01 13:05:03 -07:00
d804c848dd Added O2 flag to default compilation path
Summary: Closes https://github.com/caffe2/caffe2/pull/1000

Differential Revision: D5539652

Pulled By: bwasti

fbshipit-source-id: 8ac9c28d7f61a02dce4705df8ce704f00878ae44
2017-08-01 12:54:25 -07:00
5954211ed9 Fix #997
Summary:
cc phg1024
Closes https://github.com/caffe2/caffe2/pull/998

Differential Revision: D5538341

Pulled By: Yangqing

fbshipit-source-id: 2df69e03c8c94c67628ab8051d2a863e93f49692
2017-08-01 11:21:00 -07:00
f961d6da60 Update SoftmaxOp documentation: input not necessarily 2-D
Summary: Update SoftmaxOp documentation: input not necessarily 2-D

Reviewed By: jamesr66a

Differential Revision: D5535238

fbshipit-source-id: 1beed3b35348737c51cd564d95e9f87e9ba0608a
2017-08-01 10:38:12 -07:00
5cca4cc0f2 Fix blob device inference for LearningRate
Summary: LearningRate inputs are always on CPU.

Reviewed By: kennyhorror

Differential Revision: D5531910

fbshipit-source-id: 88b5a50800e46f2cf0f0a82ea0de1adeec8de6ed
2017-08-01 10:17:30 -07:00
1968e03486 net_printer.to_string() accepts NetDef
Summary: Title.

Reviewed By: kennyhorror

Differential Revision: D5531925

fbshipit-source-id: 8f8961e6ab14d49720f74ec01c197ba9cc3e33ce
2017-08-01 10:17:29 -07:00
3324db447f Caffe2: allow nets that don't use all input in net.ClonePartial
Summary: Caffe2: allow nets that don't use all input in net.ClonePartial

Differential Revision: D5535564

fbshipit-source-id: 0ec8fb3ade4d7d6cd4a702c9c265d9c77f27a627
2017-08-01 10:05:46 -07:00
6d8933b939 improve enforces in SquaredDistanceOp
Summary: Change DCHECK to CAFFE_ENFORCE (so that the problems also occur in mode/opt) and use the EQ enforce

Reviewed By: asaadaldien, Yangqing

Differential Revision: D5517647

fbshipit-source-id: 4da6eae54abf71114957133df088ae3623d8beaa
2017-08-01 09:09:55 -07:00
aebec91301 Fix serialization of legacy ClassNLLCriterion with ignore_index. 2017-08-01 14:29:33 +05:30
9c1e9d8a9b Update legacy ClassNLLCriterion to add ignore_index. 2017-08-01 14:29:33 +05:30
61c873cc7d Implement SoftMax and NLLLoss double backwards. 2017-08-01 14:29:33 +05:30
e1ca722988 Add comments for default value (#2248)
Added comments for default value in nn.functional
2017-08-01 14:27:46 +05:30
9a6c72891b Move Transform from Contrib to Core
Summary:
In order to pybind it, we need Transform in core.

It's a basically finished product, with a big test suite. It's safe.

We can begin hooking up observers after this, and I have a diff coming up that pybinds some apply_transform function.

Reviewed By: bwasti

Differential Revision: D5522200

fbshipit-source-id: dea6aa606fc689af84e2533569d1ef348cb5f3f2
2017-07-31 20:38:42 -07:00
d035af1f2c Added general string operator matching a la tensorflow, device option, engine, and argument matching
Summary:
Allows operators to match their string properties using * and |, so that an operator can match multiple types.

Also allows device option, engine, and argument matching.

Reviewed By: bwasti

Differential Revision: D5419697

fbshipit-source-id: fe09c7f83a5a2fefe61d79e09ee1d5b755045313
2017-07-31 20:38:37 -07:00
b4fe71925d fix #983 by remove unsupported archs
Summary:
`resize_op.cu` line 63 leverages the `__ldg` feature of CUDA, which implies `compute_35` as the minimum requirement.
Closes https://github.com/caffe2/caffe2/pull/986

Differential Revision: D5534305

Pulled By: Yangqing

fbshipit-source-id: 1bac789c89178211ce2214007787d459f4228f99
2017-07-31 18:38:59 -07:00
9349dab8a0 Full sync of fbcode to fbobjc/fbandroid
Summary:
running ##xplat/caffe2/fb_sync.sh##.
Also add two new core sources to the BUCK file, and add ##createSharedBuffer## to NNPACKConvOp.

Reviewed By: ajtulloch

Differential Revision: D5373061

fbshipit-source-id: c030b2629d2715e1d2776c98715f57e2650922c9
2017-07-31 17:38:38 -07:00
8a396f8dbe fix #985 compilation error due to type mismatch
Summary:
Fix the error during compilation on Win10+CUDA, not sure if it affects Linux and MacOS.
caffe2/operators/top_k_radix_selection.cuh(359): error : a value of type "caffe2::TIndex *" cannot be used to initialize an entity of type "long *"
Closes https://github.com/caffe2/caffe2/pull/992

Differential Revision: D5532399

Pulled By: Yangqing

fbshipit-source-id: 6958ee4f21053f73a0628cf98936931099211749
2017-07-31 16:18:18 -07:00
3c8018b565 Utils to Print accumulate histogram of blobs
Summary:
1. allow PrintOp to print every N
2. add a util function to accumulate hist and print.

Reviewed By: dzhulgakov

Differential Revision: D5437008

fbshipit-source-id: 7dd8e51b20f9daaec6c0a4e69ff6e082fca671e6
2017-07-31 16:04:25 -07:00
e38015756a shape inference for Squeeze
Summary: Add tensor inference function for squeeze, refactor a bit

Reviewed By: asaadaldien

Differential Revision: D5518880

fbshipit-source-id: 5b8cb9154f5f777d4be3612a96d7ed76a9068c0c
2017-07-31 16:04:24 -07:00
82adbde878 pass layer_parameter shape to ps builder if cannot inferred from initializer
Summary:
The Feed team uses distributed training and wants to also use transfer learning.

Currently, transfer learning is implemented by overwriting the layer parameter
initializer. Therefore, the PS builder can't correctly infer the parameter shape.

To fix this, add a field 'shape' in `layer_parameter` and set the shape if we
overwrite its initializer.

We also enforce a check of the parameter shape between the original initializer
and the loaded blob. (This adds extra cost.)

Differential Revision: D5520541

fbshipit-source-id: 80547dbd328b3f6cbfcea0b2daaf4004703dfe81
2017-07-31 16:04:23 -07:00
410f464dd1 provide more information in Declarations.cwrap 2017-07-31 14:52:20 -07:00
8c65b5ab34 multilayer seq2seq
Summary: Several refinements to seq2seq example code, including support for multilayer LSTM.

Reviewed By: jamesr66a

Differential Revision: D5460372

fbshipit-source-id: d2eabf6aa9a5b5df7bbc341fd99c4e7d8322e717
2017-07-31 12:27:51 -07:00
5f8693cc6f Make Context::FinishDeviceComputation throw instead of FATAL
Summary:
We shouldn't LOG(FATAL) in Caffe2 code under any conditions as it's a library.

The case where it failed was a bug in SparseAdaGrad that failed on empty input while trying to launch a 0-sized CUDA kernel.

Also, the trend for C2 core is to move from bool to exceptions, so I just moved CAFFE_ENFORCE directly into FinishDeviceComputation. Most of the use cases were already doing that or ignoring the output (bad!).

Reviewed By: akyrola

Differential Revision: D5495913

fbshipit-source-id: 66f382369417a262da69d54470f720e7d04a5cdf
2017-07-31 00:05:10 -07:00
8079abbaf1 fix traversal order
Summary: Memonger did not properly track the number of times a blob output has to be produced before an operator can be visited. Actually, I remember fixing this before, but well. This bug manifested in Priya's model (thanks prigoyal), and benz's model verifier nicely caught the wrong output.

Reviewed By: asaadaldien

Differential Revision: D5524912

fbshipit-source-id: 10f4d7056b84aba0274a918af508ea043e6026f9
2017-07-30 21:47:48 -07:00
76bb054f2c Scaffolding for perfkernels dispatch of embedding lookup
Summary:
Based on discussion with Misha we're going to go for code-generation for all possible variants:

AVX2/AVX512 (eventually)
embedding type: float16, float32
index type: int32, int64
reducer: sum, weighted sum, mean (with scaling by lengths)
block size: 32, 64, 128

From some simple testing, full-loop fusion with prefetching (as opposed to TypedAxpy) gives at least a 1.5x performance win, so it is justified.

This just adds scaffolding for perfkernels for the embedding lookup subfunction.

I haven't actually moved the current implementation, because it's more work to refactor the current macros/templates; it's easier and more extensible to do codegen.

Scaffolding is a bit ugly because we don't want to pass templates across translation units, which requires explicit type names in function names. Better suggestions are welcome.

msmelyan - you'd pretty much need to generate appropriate embedding_lookup_avx2.cc

Reviewed By: Yangqing

Differential Revision: D5505887

fbshipit-source-id: ece489d4fd36e7ddbe71efb890f48ab38acaeaec
2017-07-30 12:34:23 -07:00
8262920b72 Add ATen overload to AutoGPU. (#2234)
* Add ATen overload to AutoGPU.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Use new AutoGPU overload.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-30 09:01:24 +05:30
95bcc09812 Resolve some compiler warnings.
Reviewed By: bddppq

Differential Revision: D5519146

fbshipit-source-id: 6a99ea55af3a89c07b17d0ee02088b5258a2c551
2017-07-29 11:09:27 -07:00
0cd149f06f Add comments for default value (#2242) 2017-07-29 14:14:14 +05:30
e3c45206ec Add a method to run a train net multiple times in layer_test_util.py
Summary: This method runs a train net multiple times therefore enables testing layers with iteration-dependent behavior.

Differential Revision: D5493750

fbshipit-source-id: a7fb967a66f799aaf82acfadc4ecf66e0744da20
2017-07-28 19:56:05 -07:00
43c944acbd Remove dead THPP code that has been replaced with ATen objects. (#2235)
THPP usage is now isolated in THD.
2017-07-29 08:07:41 +05:30
84b9d267dc add warnings about slow data input
Summary: One of my workflows was stuck because the everstore/hive data input was experiencing networking issues (No route to host, etc.). But it is hard to know this is happening because the errors were logged to stdout. Anyway, added simple logging to warn if the data workers' enqueue thread is not getting new data for over 10 secs (see the sketch after this entry).

Reviewed By: panshen1

Differential Revision: D5522816

fbshipit-source-id: a036c4afdfbbafea130a4251c1ca02c138d19a83
2017-07-28 18:21:42 -07:00
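
A hypothetical sketch of the idle-warning logic (class and method names are invented for illustration, not the data_workers API):

```python
import logging
import time

class EnqueueWatchdog(object):
    def __init__(self, timeout_secs=10.0):
        self.timeout_secs = timeout_secs
        self.last_batch_time = time.time()

    def batch_arrived(self):
        self.last_batch_time = time.time()  # call whenever a batch is enqueued

    def check(self):
        # Call periodically (e.g. from a monitoring loop) to surface stalls.
        idle = time.time() - self.last_batch_time
        if idle > self.timeout_secs:
            logging.warning(
                "Data input: no new data for %.1f secs; fetchers may be stuck", idle)
```
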
bf26a51f91 fix a bug where an uninitialized at::Tensor was passed to createPyObject (#2239) 2017-07-29 06:28:18 +05:30
6530db49bc improve pair_wise_loss operator to support multiple sessions
Summary: The diff adds support in the rank_loss operator for computing loss for multiple sessions (batch).

Reviewed By: kittipatv

Differential Revision: D5515465

fbshipit-source-id: 55a01cd5ad21eaeae82875ad136c392fed0dbb26
2017-07-28 15:12:47 -07:00
80192d3e8d Add rudimentary support for calling a few sparse tensor functions. 2017-07-28 12:38:23 -07:00
930b6b83c5 Update class comment of Context
Summary:
Fixes https://github.com/caffe2/caffe2/issues/988
Closes https://github.com/caffe2/caffe2/pull/989

Differential Revision: D5518437

Pulled By: Yangqing

fbshipit-source-id: 885e6fed2a32eed57c3b3aeb16fe65925406501c
2017-07-28 11:35:20 -07:00
071127cc07 change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs
Summary: There is no need to disable inexpensive assertions in mode/opt, and doing so makes it incredibly difficult to debug model problems. So I changed a bunch of them to CAFFE_ENFORCEs.

Reviewed By: Yangqing

Differential Revision: D5517902

fbshipit-source-id: 9154d0114db159e8136a482fb6508e92084af97a
2017-07-28 11:35:19 -07:00
f2090debb0 Optimized SparseLengthsSum
Summary:
Optimized SparseLengthsSum (fp32) for now:
1) specialized the reducer
2) created a fast routine with prefetches, loop unrolling, block specialization and register tiling
3) added more variety of block sizes to segment_ops_test.py
(A reference sketch of the op's semantics follows this entry.)

Reviewed By: Yangqing

Differential Revision: D5392472

fbshipit-source-id: 8ed9baf1b12ec05bd391cabb390024e6bc60a6f6
2017-07-28 10:10:25 -07:00
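
A reference sketch of what SparseLengthsSum computes, in plain numpy (the optimized kernel fuses the gather and the sum with prefetching):

```python
import numpy as np

def sparse_lengths_sum(data, indices, lengths):
    # Gather rows of `data` by `indices`, then sum them in groups of `lengths`.
    out, start = [], 0
    for n in lengths:
        out.append(data[indices[start:start + n]].sum(axis=0))
        start += n
    return np.stack(out)

data = np.arange(12.0).reshape(6, 2)
print(sparse_lengths_sum(data, indices=[0, 2, 3], lengths=[2, 1]))
# [[4. 6.]    rows 0 and 2 summed
#  [6. 7.]]   row 3 alone
```
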
c304d04fc6 Replace thpp::Tensor with ATen Tensor in autograd csrc (#2170) 2017-07-28 10:18:37 -04:00
f1fd4ac7ed Added aarch64 support (#2226) 2017-07-28 11:24:19 +05:30
a41cbdec0e float support for square root divide
Summary: to support an operation needed by D5507205

Reviewed By: xianjiec

Differential Revision: D5512522

fbshipit-source-id: a9b3a668c28eff71d1e106dbbb572184df4a7638
2017-07-27 17:40:40 -07:00
0676dfef2b ExtractPredictorNet should strip gpu_id prefix from step_net
Summary:
The renames were only being applied to the main net, if step_net has an
external input that is part of renames, running the model would fail with 'blob
not found in workspace' error.

Differential Revision: D5511953

fbshipit-source-id: ba262a094c3263978dfe173f2cab00301131b57f
2017-07-27 16:06:47 -07:00
13569c9aa0 Fixing semi-random layer model for multi-layer models
Summary:
Updated the semi-random layer model for multi-layer models using semi-random layers.

Notable changes:
- Input and outputs for the semi-random layer is now a Struct with "full" and "random" components
- Flag was added to choose to initialize output schema in Arc Cosine or not (if output schema initialization will happen in Semi Random layer)

Reviewed By: chocjy

Differential Revision: D5496034

fbshipit-source-id: 5245e287a5b1cbffd5e8d2e3da31477c65b41e04
2017-07-27 15:25:19 -07:00
78d1806679 fixed invalid memory access, caused by iterator invalidation
Summary: ASAN caught invalid memory problems in 3 of the tests in PatternNetTransformTests. The cause was pushing elements into a vector: although the vector itself stays valid while in scope, its storage can be relocated when resized, thus invalidating the iterator/pointer.

Reviewed By: bwasti

Differential Revision: D5510112

fbshipit-source-id: affb11dbd221c826e108136789ef11c96c5d9843
2017-07-27 14:28:11 -07:00
26645154bb warn about using test/val model with init_params=True + fixed some cases
Summary: It is a common mistake to create a test/validation model with init_params=True. When its param_init_net is run, it will overwrite the training model's params, and with DPM those won't be synchronized to all GPUs. I don't want to make this an assertion yet, since it might break people's trainers (it is ok to have init_params=True if you never run the param_init_net...).

Reviewed By: asaadaldien

Differential Revision: D5509963

fbshipit-source-id: 63b1a16ec0af96e3790e226850f6e0e64689143f
2017-07-27 13:20:27 -07:00
be7dcccdd9 fix issues where scale gets reported as 0.0000 in output 2017-07-27 11:24:12 -07:00
ac76ab5fca Increase tol. for float tensor qr big test.
test_FloatTensor_qr_big test is still a bit flaky on K80. Increasing tolerance to improve reliability as tests are moved around and results change for this test.
2017-07-27 14:23:06 -04:00
af1e45c1e1 support appending net and converting them
Summary:
As per rushabhmshah99's request: he wants to append a pre-trained model (without training it) to the model.
So I added data_parallel_model.ConvertNetForDevice() to enable that. The unit test shows an example of how to use this with
AppendNet, and I also added a blurb to the function.

Differential Revision: D5503335

fbshipit-source-id: b2a5db5c1739dc97f46dd0d7606ed555d99255b8
2017-07-27 11:07:48 -07:00
d8443b8ffa BatchGatherOp
Summary:
1. added BatchGatherOp and BatchGatherGradientOp
2. unit tests

Reviewed By: xianjiec

Differential Revision: D5443965

fbshipit-source-id: bdcbb7f9f91c55484372a4bdb1727ae6d49e2018
2017-07-27 10:17:42 -07:00
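
A numpy sketch of the BatchGather semantics as I read them from the entry above: gather along axis 1, sharing the same indices across the batch dimension (an assumption, since the entry does not spell out the layout):

```python
import numpy as np

def batch_gather(data, indices):
    return data[:, indices]  # same indices applied to every batch row

data = np.arange(12).reshape(2, 6)
print(batch_gather(data, [1, 4]))
# [[ 1  4]
#  [ 7 10]]
```
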
a53f4b0f9b add dimension check to NHWC2NCHW shape inference
Summary: To prevent assertion from protobuffer when accessing the dims.

Reviewed By: asaadaldien

Differential Revision: D5504362

fbshipit-source-id: d9b55fab3126e2760a3e790615ed30a1af2ddc32
2017-07-27 09:54:44 -07:00
04f31aa034 Improve Variable.retain_grad 2017-07-27 20:36:14 +05:30
ae59e008cd add retain_grad method to Variable, so gradient gets stored during backprop on non-user variables 2017-07-27 20:36:14 +05:30
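
Era-appropriate usage of the new method; without retain_grad(), the gradient of an intermediate variable is freed during backprop:

```python
import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)
y = x * 2          # intermediate ("non-user") variable
y.retain_grad()    # ask autograd to keep y.grad after backward
y.sum().backward()
print(y.grad)      # 2x2 tensor of ones; would be None without retain_grad()
```
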
e25b3d7bc5 replace long long types with size_t (#1267)
Work around bug in msvc compiler in win32 mode
2017-07-27 19:13:56 +05:30
ba1ae2136e strengthen gloo_test by checking for success
Summary: A weakness in gloo_test led to an embarrassing diff review (D5494956): my test "succeeded", although each of the workers failed hard in an assertion. This was not handled because there was no exception to be caught and put into the result queue. So change the logic to put a success token into the queue, signaling successful completion (see the sketch after this entry).

Reviewed By: pietern

Differential Revision: D5503760

fbshipit-source-id: f2415bcc55638595cefa5d64dea811d86e77f24d
2017-07-26 23:37:15 -07:00
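
A minimal sketch of the success-token pattern in plain multiprocessing (not the gloo_test code itself): a worker that dies never reaches the put(), so the parent's timed get() fails loudly instead of silently "passing":

```python
import multiprocessing as mp

def worker(q):
    # ... test body; a hard crash or failed assert never reaches the line below
    q.put("success")

if __name__ == "__main__":
    q = mp.Queue()
    procs = [mp.Process(target=worker, args=(q,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    tokens = [q.get(timeout=10) for _ in procs]  # raises queue.Empty on a dead worker
    assert tokens == ["success"] * len(procs)
```
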
8a156b651b Move cpuid ctor to .cc
Summary: As Dima suggested.

Reviewed By: dzhulgakov

Differential Revision: D5504045

fbshipit-source-id: 3fcf40ebfbcd2aebf05e79078630f04748944799
2017-07-26 23:37:14 -07:00
3363681304 enable CreateCommonWorld to bootstrap from existing common world
Summary: Use romain-intel's ContextFactory to create common worlds from existing common worlds, thus bypassing the KV store completely. Changed data_parallel_model to automatically detect whether there is already a CW we can use. CreateCommonWorldOp takes an optional second parameter, which is the existing CW.

Reviewed By: andrewwdye

Differential Revision: D5494956

fbshipit-source-id: 5f7a840bcd5fe4ea756fafeacc746bc2cf5078b0
2017-07-26 22:31:55 -07:00
346ff7ed18 Implementation for Pattern Net Transforms, which is a transform initialized by a Pattern NetDef and a Replace NetDef.
Summary: Split this into its own file for ease of reviewing. This is a simple interface for someone to create a Transform - by simply providing their own Pattern and Replace NetDefs.

Reviewed By: akyrola

Differential Revision: D5440426

fbshipit-source-id: dc643226f40ffe4ec5c86d56cfea374bd6a4e0e5
2017-07-26 22:08:00 -07:00
de92dbe4bb MKL code move
Summary:
Nothing gets changed - this would allow us to more easily deal with build
systems. Also now everything that is MKL related lives under mkl/.

Reviewed By: dzhulgakov

Differential Revision: D5505157

fbshipit-source-id: ddb2e6ac290a146a7cb495da23bb0e5b5594bd2a
2017-07-26 20:21:55 -07:00
1eecabcfb5 bug in mtml: shared_embedding
Summary:
A bug reported in MTML group: https://fburl.com/lumicchc

The reason is that in MTML, the `task_shared_embedding` was not correctly
initalized in python

Reviewed By: xianjiec

Differential Revision: D5502875

fbshipit-source-id: 3538d917392568ecd37c39059dc86f866bce9543
2017-07-26 20:21:53 -07:00
45ce863151 CMake updates.
Summary: Closes https://github.com/caffe2/caffe2/pull/970

Differential Revision: D5505960

Pulled By: Yangqing

fbshipit-source-id: 1843c83d4ab5f9f3880bf93a9c748717c6af8565
2017-07-26 18:58:20 -07:00
40b783b746 Fix flaky test due to numerical gradient approximation error.
Summary:
Use a smaller step size for GradientChecks and pass a seed to help reproduce the
test from logged inputs.

Reviewed By: Yangqing

Differential Revision: D5505698

fbshipit-source-id: fc308efe72d535695ba628944aee1913ba16b2f1
2017-07-26 18:58:19 -07:00
d187b2f4c9 MKLDNN bugfix
Summary: Some old compilers (e.g. gcc 4.8) do not like lambdas.

Reviewed By: ajtulloch

Differential Revision: D5500500

fbshipit-source-id: fe6bcc7277fd7e9607f54a83be1f0ec146411440
2017-07-26 18:58:18 -07:00
1bd7fd6bc8 fixed nnpack transform to match on cpu not gpu
Summary: [easy] fix the ConvToNNPack transform to transform CPU operators, not GPU

Reviewed By: bwasti

Differential Revision: D5501625

fbshipit-source-id: da69bd4127d29ccea707e91bff0573dc3a4b5e1b
2017-07-26 18:07:35 -07:00
925208af72 Implement BatchNorm double backwards (#2207)
* Implement BatchNorm double backwards as a python function called directly from C++.

This will be converted to C++ code once ATen is integrated with autograd.

* Some performance improvements via inplace ops and reusing calculations.
2017-07-27 06:00:31 +05:30
643f8d12ff [bugfix] in bce_with_logits logsumexp calculation (#2221)
* fix bug in bce_with_logits logsumexp calculation

* flake8 fix
2017-07-27 05:58:56 +05:30
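
For reference, the numerically stable bce-with-logits form that the log-sum-exp trick targets, sketched in numpy (the PyTorch fix itself lives in the Python functional code):

```python
import numpy as np

def bce_with_logits(x, z):
    # max(x, 0) - x*z + log(1 + exp(-|x|)): equivalent to
    # -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x)), but exp() never overflows.
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

x = np.array([100.0, -100.0, 0.0])
z = np.array([1.0, 0.0, 1.0])
print(bce_with_logits(x, z))  # [0. 0. 0.6931...] -- no overflow at |x|=100
```
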
d4bd6c4314 add some asserts to basic.cpp 2017-07-26 16:51:41 -07:00
9bec54bbf1 Modify arc cosine feature map and semi random layers to initialize parameters as global constants
Summary:
The original issue was that the initialized parameters for randomized layers (Arc Cosine and Semi-Random) were not fixed across distributed runs of the layers. Moreover, as the weights are initialized as (constant) parameters, when the layer is added to the preprocessing part, these weights won't be saved after training since they don't exist on the trainer.

I fixed the issue here by building an option to add the randomized parameters to the model global constants so that the same parameter values can be accessed. Also, the parameters can be saved when the training is finished.

In this diff, I've:
- Updated randomized parameters to be added as a global constant across distributed runs of Arc Cosine Feature Map and Semi Random Feature layers
- Updated unit tests
- Ran an end-to-end test, enabling multiple readers to test the fixed issue

Reviewed By: chocjy

Differential Revision: D5483372

fbshipit-source-id: b4617f9ffc1c414d5a381dbded723a31a8be3ccd
2017-07-26 16:37:00 -07:00
3b6d01301f add valgrind to CI 2017-07-26 16:11:26 -07:00
fb8f9de498 fix for ATen API Change 2017-07-26 18:55:56 -04:00
54b171eae5 Caffe2: don't swallow exception stacktrace
Summary:
Caffe2: don't swallow exception stacktrace

{F69325406}

Reviewed By: akyrola

Differential Revision: D5503227

fbshipit-source-id: 4e11d921652a094e20c46af19ba880390be8e997
2017-07-26 15:48:05 -07:00
cb9ad7a892 Opt into Trusty builds. (#2214)
* Opt into Trusty builds.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Bump to 2.7.9.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-27 04:04:57 +05:30
f7de7bab6e Merge commit 'fd97d92479e32e550866adfd1f0465e4cfa5e581' 2017-07-26 18:11:16 -04:00
fd97d92479 allow retain to be specified for unsafeTensorFromTH 2017-07-26 14:58:32 -07:00
f3aa97f169 Deduplicate THPUtils_checkLong/THPUtils_unpackLong (#2218)
There were two implementations of THPUtils_checkLong/THPUtils_unpackLong; one
that was a macro and one that was not, which is hella bad if you accidentally
include the macro before the real definition.  Now we always use the inline
function.

A reasonable follow-up task would be to un-macro-ify the rest of these functions.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-27 03:12:12 +05:30
b0648fc3fc Merge commit 'be9ef9283f297997afd3bf8e21147ec6bf09ebbf' 2017-07-26 17:25:39 -04:00
8f8dccd2ed distance_op_test from hypothesis_test refactored
Summary:
Moved distance_op_test from hypothesis_test to distance_op_test and
refactored

Reviewed By: akyrola, asaadaldien

Differential Revision: D5495104

fbshipit-source-id: 4a90c75eabeb380ae9d150d6258e9b5b0fbfc5ca
2017-07-26 13:37:08 -07:00
be9ef9283f Merge pull request #35 from ezyang/pr/undefined-dim-doc
Note [Undefined-dim versus 0-dim]
2017-07-26 12:42:33 -07:00
9c0d52a32f fix osx build errors related to long/int64_t 2017-07-26 12:36:25 -07:00
54545c2154 Note [Undefined-dim versus 0-dim]
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-26 12:34:13 -07:00
9ec7051442 Remove __func__ hack in auto nn. 2017-07-26 15:28:25 -04:00
e6ffc2acb1 Add get gradient for CTC
Summary:
- Adds GetCTCGradient CTC training, so we can use AddGradientOperators() on "costs". The function just calls CopyOp.
- Modified test to verify inputs_gradient is created in workspace.

Reviewed By: yqwangustc

Differential Revision: D5499271

fbshipit-source-id: 5a6985f90f309303aadaceb7c966d822ad3576b2
2017-07-26 12:09:15 -07:00
2676c6357f Enable Conv groups gradgradchecks. (#2216) 2017-07-27 00:24:12 +05:30
d89632b52c Support (U)INT8, (U)INT16 in data type conversion
Summary:
Data type conversion between Numpy arrays and Caffe2 Tensors currently only supports 3 types: FLOAT, DOUBLE and INT32. Supporting 8-bit and 16-bit data types will help reduce the model size in some circumstances. I used this to reduce the size of a data set from 8GB to 1GB by using INT8 (see the sketch after this entry).
Closes https://github.com/caffe2/caffe2/pull/930

Reviewed By: Yangqing

Differential Revision: D5440929

Pulled By: akyrola

fbshipit-source-id: 3762da1d845e62a13ba384d1c144328b19dd663b
2017-07-26 11:23:53 -07:00
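
With the extra dtypes supported, small integer tensors round-trip through the Python workspace directly; a short sketch (assuming the usual FeedBlob/FetchBlob API):

```python
import numpy as np
from caffe2.python import workspace

x = (np.random.rand(4, 4) * 127).astype(np.int8)  # 8x smaller than float64
workspace.FeedBlob("x", x)
assert workspace.FetchBlob("x").dtype == np.int8
```
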
e6d2941ec0 revert codemod since this code also need to be built on ARM
Summary:
This is causing OSS android build failures such as

https://travis-ci.org/caffe2/caffe2/jobs/257609575

Reviewed By: akyrola

Differential Revision: D5497495

fbshipit-source-id: b3ba0cca135a4a632461851c9b9212f3d75abd5d
2017-07-26 07:57:42 -07:00
b6daf562c4 fix windows build
Summary:
__attribute__((unused)) is not supported on Windows, so we actually need to
substitute it with a macro.

Also changed UNUSED_VARIABLE to CAFFE2_UNUSED because we also use it to mark
functions now.

Reviewed By: ajtulloch

Differential Revision: D5497063

fbshipit-source-id: bcda026e626c41f71c21c36f029a3f871eaea7d4
2017-07-26 03:50:20 -07:00
d20e50a39f fix perfkernels build
Summary: Closes https://github.com/caffe2/caffe2/pull/967

Differential Revision: D5497418

Pulled By: Yangqing

fbshipit-source-id: 171d7a3128ef6e54d409a8186a4f335439a82f68
2017-07-26 00:32:22 -07:00
8cc9dbf357 Added Ninja generator support on Windows
Summary:
I successfully built caffe2 using MSVC 2015 and the Ninja Generator. I use vcpkg to build gflags, glog, lmdb and protobuf. Here is my build procedure:

1. Install vcpkg and set it up according to vcpkg docs
2. Install dependencies
```
$> vcpkg install gflags glog lmdb protobuf eigen3 --triplet x64-windows-static
```
3. Run CMake with this batch file
```Batch
setlocal
if NOT DEFINED VCPKG_DIR ( echo "Please define VCPKG_DIR" && exit /b 1 )
if NOT DEFINED CMAKE_BUILD_TYPE set CMAKE_BUILD_TYPE=Release
if NOT DEFINED BUILD_DIR set BUILD_DIR=build_%CMAKE_BUILD_TYPE%
if NOT DEFINED USE_CUDA set USE_CUDA=OFF

call "%VS140COMNTOOLS%\..\..\VC\vcvarsall.bat" amd64

if NOT EXIST %BUILD_DIR% (mkdir %BUILD_DIR%)
pushd %BUILD_DIR%

set CMAKE_GENERATOR=Ninja
set ZLIB_LIBRARY=%VCPKG_DIR%\installed\x64-windows-static\lib\zlib.lib

cmake -G"%CMAKE_GENERATOR%" ^
      -DBUILD_SHARED_LIBS=OFF ^
      -DCMAKE_VERBOSE_MAKEFILE=1 ^
      -DBUILD_TEST=OFF ^
      -DBUILD_SHARED_LIBS=OFF ^
      -DCMAKE_BUILD_TYPE=%CMAKE_BUILD_TYPE% ^
      -DUSE_CUDA=%USE_CUDA% ^
      -DZLIB_LIBRARY:FILEPATH="%ZLIB_LIBRARY%" ^
      -DVCPKG_TARGET_TRIPLET=x64-windows-static ^
      -DVCPKG_APPLOCAL_DEPS:BOOL=OFF ^
      -DCMAKE_TOOLCHAIN_FILE:FILEPATH=%VCPKG_DIR%\scripts\buildsystems\vcpkg.cmake ^
      -DPROTOBUF_PROTOC_EXECUTABLE:FILEPATH=%VCPKG_DIR%\installed\x64-windows-static\tools\protoc.exe ^
      ..\

ninja
popd

endlocal
```
Closes https://github.com/caffe2/caffe2/pull/880

Differential Revision: D5497384

Pulled By: Yangqing

fbshipit-source-id: e0d81d3dbd3286ab925eddef0e6fbf99eb6375a5
2017-07-26 00:32:20 -07:00
cf1ce29631 Fix GPU SparseAdaGrad with empty tensors
Summary: CUDA doesn't like 0-sized grids :)

Reviewed By: Yangqing

Differential Revision: D5495805

fbshipit-source-id: 6819513024978ee6bb70a39b25d23ced06465750
2017-07-25 23:50:54 -07:00
0458985c1b Fix build with external nnpack installation
Summary:
libpthreadpool is needed during the linking stage and is missing when the user chooses to use an external nnpack installation (from system libraries).

Fixes GitHub issue #459.

Detailed discussion on [this comment](https://github.com/caffe2/caffe2/issues/459#issuecomment-308831547).
Closes https://github.com/caffe2/caffe2/pull/808

Differential Revision: D5430318

Pulled By: Yangqing

fbshipit-source-id: 5e10332fb01e54d8360bb929c1a82b0eef580bbb
2017-07-25 23:03:39 -07:00
0ee5688892 Fix SparseLengthSum undeclared schema
Differential Revision: D5495760

fbshipit-source-id: a2c9e6204021687f6df830ea2bfbe355bfc888be
2017-07-25 18:19:10 -07:00
2f5c96a730 Fix Flatten operator for empty tensors
Reviewed By: xianjiec

Differential Revision: D5487475

fbshipit-source-id: f1321e15352b0bbe039312f544a9c2ed78da8732
2017-07-25 17:51:42 -07:00
240b307d8b Implemented Registry pattern + ConvToNNPack Transform as example of it
Summary: Implemented the registry pattern: now all transforms are instantiated by a string. I then made a simple transform which, given a graph, will change the engine of all Conv operators to be NNPACK, to demonstrate.

Reviewed By: bwasti

Differential Revision: D5447007

fbshipit-source-id: 48065a88fa648ad0e11f7f8ee93b8e732cd515d7
2017-07-25 15:08:02 -07:00
ef3b09fb5f fix a bug where some scalars were getting truncated to integers incorrectly. 2017-07-25 14:27:16 -07:00
133dc2603e Support grouped convolutions in MKL
Reviewed By: Yangqing

Differential Revision: D5487692

fbshipit-source-id: 94fb66b3b104cf16dcad07743def4ea940515689
2017-07-25 14:19:02 -07:00
d86f32ae2e Implement simple graph rewrite functionality.
Reviewed By: Yangqing

Differential Revision: D5487075

fbshipit-source-id: f7c7867c5cbae39cf197cf5e7ed8a64149f33208
2017-07-25 14:19:01 -07:00
1313f70390 MKLSpatialBNOp doesn't support in-place
Reviewed By: Yangqing

Differential Revision: D5487067

fbshipit-source-id: cd0068d9dbed8d55c4c2ed913a80b97113c49653
2017-07-25 14:19:01 -07:00
9e6ea2987f MKLReluOp supports in-place X/Y
Reviewed By: Yangqing

Differential Revision: D5487060

fbshipit-source-id: 35d2d450f46aefc3c9395be45af99e13d1c168ec
2017-07-25 14:19:00 -07:00
af43e2b251 MKLConvOp handles the no bias case
Reviewed By: Yangqing

Differential Revision: D5487050

fbshipit-source-id: 4791943d331a2d7283f0f9b939f3f03e32dbdbed
2017-07-25 14:18:58 -07:00
f028e74fb7 Implement a filler op test
Reviewed By: Yangqing

Differential Revision: D5487042

fbshipit-source-id: 0b03683fd3822769381c14790c0c2e46162d1aaf
2017-07-25 14:18:57 -07:00
ea813c0a91 Fix MKLFallbackOp random seed propagation.
Reviewed By: Yangqing

Differential Revision: D5487038

fbshipit-source-id: 5ec3958a4c73611ff6ce784a4336aeacce95575b
2017-07-25 14:18:56 -07:00
bbf2b578dc Implement MKL CopyTo/CopyFrom ops
Reviewed By: Yangqing

Differential Revision: D5482636

fbshipit-source-id: d044c495837aef985210f0b63d61f88f9acc3db7
2017-07-25 14:18:55 -07:00
71d04fd5cc Implement SumOp for MKL
Reviewed By: Yangqing

Differential Revision: D5482622

fbshipit-source-id: e1e8f8aebce874efc31fab2c870cd274ca0d037c
2017-07-25 14:18:54 -07:00
ad7d7657a4 Add tests to targets
Reviewed By: Yangqing

Differential Revision: D5482614

fbshipit-source-id: 04727e19b7b83b6d0d41ad3227866957480bc1ee
2017-07-25 14:18:54 -07:00
007492e730 Fix MKL spatial pooling test
Summary: tsia

Reviewed By: Yangqing

Differential Revision: D5482603

fbshipit-source-id: e95a8829c71125623066cfee3b76e774c7f3a46b
2017-07-25 14:18:53 -07:00
a7d8f489d9 Improve MKL SpatialBN test
Summary: tsia

Reviewed By: Yangqing

Differential Revision: D5482596

fbshipit-source-id: 2817ceb57154dcefffec3251efc397cba8163097
2017-07-25 14:18:52 -07:00
8b2c6341cc Fallback for MSRAFill
Summary: TSIA

Reviewed By: Yangqing

Differential Revision: D5482575

fbshipit-source-id: 57bcf4b980c42ca4200e8a2fab50fe5152f67501
2017-07-25 14:18:52 -07:00
f194ac1e09 Merge pull request #477 from wickedfoo/feature_lp_pooling
GPU implementation of L_p feature pooling
2017-07-26 02:31:59 +05:30
26a0b9aa43 Merge pull request #1259 from wickedfoo/feature_lp_pooling
CPU implementation of L_p feature pooling
2017-07-26 02:31:50 +05:30
f656e002a7 CosineSimilarity GPU
Reviewed By: asaadaldien, akyrola

Differential Revision: D5476812

fbshipit-source-id: d931a7d8e4a4dfdf22ee18f8b9c755cc21b0e75b
2017-07-25 13:34:01 -07:00
e548580f31 Add missing models to torch vision documentation (#2204) 2017-07-26 01:58:18 +05:30
421607a935 DataParallel device_ids slicing fixes (#2200) 2017-07-26 01:54:38 +05:30
eccddbc204 vectorized typed axpy implementation
Summary:
This adds an example for vectorized typed axpy implementation under
perfkernels.

Reviewed By: dzhulgakov

Differential Revision: D5479258

fbshipit-source-id: 469e6c8aaf2c12cdf0025bc867eb9d4cab84184f
2017-07-25 12:08:27 -07:00
c2f2b5ad51 lengths_reducer_ops refactoring.
Summary:
(1) Wrote up length reducer operators from the original dispatcher
implementation under segment_reduction_op.cc. Note that this does not
change the fp16 version now.

(2) created subfolder perfkernels for potential different backends, with
scaffolding done.
(3) provided the vanilla fp16 implementation, so that the default
implementation supports fp16 (very slowly) for now. This sets up the
fp16 benchmarking capability after D5477844.

Next step is actually to implement the faster versions. The goal of this diff
is mainly so that Misha can plug in his custom implementations more easily.

Reviewed By: dzhulgakov

Differential Revision: D5479056

fbshipit-source-id: bba30dc0d892b8e2cdfc825034fdfb7bd22a1726
2017-07-25 12:08:26 -07:00
0d96933338 Fix assert
Summary: If the last group has length=0, then ##start == end == len_indices##. Implementation is correct, just the assert is not

Reviewed By: wickedfoo

Differential Revision: D5488858

fbshipit-source-id: fcc4ef8162f1390534a7c556de2ae7d2b82eddc9
2017-07-25 10:38:02 -07:00
7be545292d Update cudnn.py 2017-07-25 09:35:44 -04:00
a0e83280ef Update cudnn.py 2017-07-25 09:35:44 -04:00
aa35be2032 search for cudnn in conda 2017-07-25 09:35:44 -04:00
626840aef3 C function wrapper uniqueness (#1912)
* add SharedFunctionMaker to create Function shared in the graph

* Clean shared_ptr usage for only function that will be used in the graph

* make Function binding match Variable one

* remove unnecessary changes

* fix comments

* proper weakref implementation

* add call to clear in dealloc
2017-07-25 13:12:54 +05:30
babb28d2a3 Change DHCECK to CAFFE_ENFORCE in softmax_with_loss_op.cc
Summary:
Based on discussion on the post in the Caffe2 users group. Changing DCHECK, which works only in debug mode, to CAFFE_ENFORCE, which throws an exception and is a better option.

Update: also correct the check for label_data >= 0; it did not check all elements previously. Moved it to the inner loop.

Reviewed By: akyrola

Differential Revision: D5483788

fbshipit-source-id: ccbff09e19e05e7036db772498f71795063c1fed
2017-07-24 21:52:30 -07:00
5449afa855 use model.create_param instead of using param_init_net directly
Summary: When creating parameters for modelhelper, we should use create_param instead of using param_init_net and model.params directly. The diff rewrites some of these cases in rnn_cell.py in order to make model._parameter_info and model.params consistent.

Reviewed By: kittipatv

Differential Revision: D5477724

fbshipit-source-id: 28c4aaf8f98d9d89125af6a42ad328008f0079e1
2017-07-24 21:17:24 -07:00
eae6400d59 Updated summary for the FC layer in caffe2
Summary: Fixed incorrect description of the input tensor X, and auto-formatted the file.

Reviewed By: jamesr66a

Differential Revision: D5467876

fbshipit-source-id: 1936cf5eb65824c8aeaf2c7924d5b850ab36b593
2017-07-24 20:32:29 -07:00
bcea678e7b Update rebased functions to call apply. 2017-07-25 07:37:25 +05:30
1a52ca02ef Always return indices from MaxPool autograd functions to simplify implementation;
The callers (in functional.py) will filter out the return instead.
2017-07-25 07:37:25 +05:30
84314859af Implement double backwards for MaxPool2d. 2017-07-25 07:37:25 +05:30
9c2beb33c5 Implement double backwards for MaxPool1d. 2017-07-25 07:37:25 +05:30
7deba74969 Implement MaxPool{1d,2d,3d}Backwards (non-differentiable) functions. 2017-07-25 07:37:25 +05:30
48bb07a4db Implement double backwards for AvgPool3d. 2017-07-25 07:37:25 +05:30
bb86ed7b97 Implement double backward for AvgPool1d, AvgPool2d, LPPool2d. 2017-07-25 07:37:25 +05:30
291369ff1b Convert pooling functions to new-style, once_differentiable functions. 2017-07-25 07:37:25 +05:30
2118400e18 Fix lint. 2017-07-25 07:37:25 +05:30
39934da8b3 Address review comments. 2017-07-25 07:37:25 +05:30
c12b494329 Implement double backwards for ELU. 2017-07-25 07:37:25 +05:30
506d52dc33 Add check_gradgrad=False for new NLLLoss2d test. 2017-07-25 07:37:25 +05:30
7687c2677a Fix double backwards advanced indexing derivative wrt grad_output.
Also small legacy nn test issue and unrelated syntax issue.
2017-07-25 07:37:25 +05:30
97d21e243b Implement L1Cost double backwards. 2017-07-25 07:37:25 +05:30
0bda56956e Implement double backwards for auto-generated HardTanh. 2017-07-25 07:37:25 +05:30
40af93bb57 Optimize PReLU double backwards via a PReLUBackwards autograd function. 2017-07-25 07:37:25 +05:30
9608e37969 Implement double backwards for PReLU. 2017-07-25 07:37:25 +05:30
ec7c510557 Implement Softsign double backwards. 2017-07-25 07:37:25 +05:30
8636be3880 Ensure gradients wrt grad_outputs are checked in gradgradcheck. 2017-07-25 07:37:25 +05:30
fb2284f3a0 Add gradgrad checks for NN module and criterion tests. 2017-07-25 07:37:25 +05:30
9ec9dee27d Implement NN Criterion functions as potentially double backwards functions. 2017-07-25 07:37:25 +05:30
7b6aab9079 Unify implementation of _Loss and _WeightedLoss autograd functions. 2017-07-25 07:37:25 +05:30
852dd5f011 Convert _WeightedLoss functions to new style autograd functions. 2017-07-25 07:37:25 +05:30
085abee444 Rebase kl_div changes. 2017-07-25 07:37:25 +05:30
48b85fe012 Implement THNN non-criterion Functions as new style with backward/backward. 2017-07-25 07:37:25 +05:30
45ce4df74c Convert auto nn Functions (non-criterion) to new style. 2017-07-25 07:37:25 +05:30
5695cbf986 Add comments in loss.py and distance.py (#2189)
* Add examples in CrossEntropyLoss

1. Added examples in CrossEntropyLoss
2. Make consistent style of example for PyTorch docs
3. Delete unnecessary character '

* Change comments in distance.py

1. Delete x1, x2 from arguments and add eps in PairwiseDistance
2. For the shape, added input1 and input2 for readability (PairwiseDistance and CosineSimilarity).

* Add examples

Added the word 'examples' for PyTorch docs
2017-07-25 07:36:28 +05:30
03df5debe3 Gloo fixes for Linux + old cmake (2.8.0) + old glibc (CentOS6) 2017-07-24 21:59:58 -04:00
2ebdef0154 Add 'torch/lib/gloo/' from commit '1978bba3e421eceab6181bcbc838553091cedecc'
git-subtree-dir: torch/lib/gloo
git-subtree-mainline: ceb4f84d12304d03a6a46693e54390869c0c208e
git-subtree-split: 1978bba3e421eceab6181bcbc838553091cedecc
2017-07-24 21:59:49 -04:00
8930c095c1 Add support for int32 indices in SparseLengthSum and friends
Summary:
Need it for some reference comparison for c2isl.

Also there's an argument that it might be faster on GPU with int32. Doesn't seem to be the case now, but haven't tested with Jeff's changes yet.

Reviewed By: kennyhorror

Differential Revision: D5405482

fbshipit-source-id: dc1a983dce5f06f1111c5634ec475647c94848cc
2017-07-24 17:50:00 -07:00
ceb4f84d12 Improve memory usage of cuDNN RNN modules (#2179) 2017-07-25 04:00:17 +05:30
10667a914e Add linter for enforcing caffe operator documentation
Summary: Add check that every time we register a caffe operator to CPU or GPU that documentation is added for the particular operator.

Reviewed By: dzhulgakov

Differential Revision: D5443110

fbshipit-source-id: 3793c3d29bea1228078cb30bdf8243ac0ab90664
2017-07-24 15:27:47 -07:00
112728cbe9 reformulate bce_with_logits to not use abs (#2195)
* reformulate bce_with_logits to not use abs

* flake8 fixes
2017-07-25 03:46:27 +05:30
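For context: bce_with_logits is usually stabilized as max(x,0) - x*z + log(1 + exp(-|x|)). Below is a minimal numpy sketch of one abs-free formulation with the same stability; it is an assumption about the approach, not the PR's actual code, and `bce_with_logits` here is a hypothetical helper.
```py
import numpy as np

def bce_with_logits(x, z):
    # loss = (1 - z) * x + log(1 + exp(-x)), computed without abs():
    # shifting by m = max(-x, 0) keeps both exponents <= 0.
    m = np.maximum(-x, 0)
    log1pexp = m + np.log(np.exp(-m) + np.exp(-x - m))
    return (1 - z) * x + log1pexp

x = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
z = np.array([0.0, 1.0, 0.5, 0.0, 1.0])
print(bce_with_logits(x, z))  # finite everywhere, no overflow
```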
dc17fb68e4 Fix minor bug in parallel_apply (#2193) 2017-07-25 03:45:00 +05:30
0eda7955bd use internal cell for DropoutCell output prep methods
Summary:
In order to get dimensions right, correctly identify gradients, etc., DropoutCell should call the _prepare_output and _prepare_output_sequence methods of its internal cell for its own such methods.

This bug was identified by NVIDIA intern Syed Tousif Ahmed.

Reviewed By: akyrola

Differential Revision: D5483082

fbshipit-source-id: f6df5b4a0502ed0771056638aab219fb5cc7d964
2017-07-24 14:53:11 -07:00
0deee2194f Add a quick SparseLengthsSum benchmark.
Summary: TSIA - this makes it a bit easier to benchmark sparse lengths sum.

Reviewed By: dzhulgakov

Differential Revision: D5477844

fbshipit-source-id: 89e25c5e0dbf3538877ba1a9abc75a10abfa2757
2017-07-24 13:17:47 -07:00
4195858614 factored out DBExists function
Summary: DBExists function was factored out of the DBExistsOp.

Reviewed By: azzolini

Differential Revision: D5472587

fbshipit-source-id: 2a53375ffcccfb88e8f0af2ab55dad4c6a9586e3
2017-07-24 11:21:27 -07:00
7b2b817b9c improve error for non-existing/vs. sparse or dense gradient
Summary: I have hated the "gradient of X is either not provided or sparse" message. It is better to say which one is the problem.

Reviewed By: dzhulgakov

Differential Revision: D5468923

fbshipit-source-id: b63cde293fe252e5136d225ce4c762b4981f6fc8
2017-07-24 08:56:02 -07:00
9c3e59d484 updated research proposal link
Summary: Closes https://github.com/caffe2/caffe2/pull/957

Differential Revision: D5480191

Pulled By: aaronmarkham

fbshipit-source-id: 445a4955795a2b16d53238029e9140533f7888e5
2017-07-24 08:56:02 -07:00
f6afa6adbd Add proper cpuid support.
Summary:
This is needed for us to do more fine-grained dispatch based on CPU arch, so
I figured we should just add it. It can help Dima and Misha with optimization,
I think.

Reviewed By: dzhulgakov

Differential Revision: D5477444

fbshipit-source-id: 48aaf8bd799e9755493cd51c793ceec080a8846c
2017-07-23 17:21:50 -07:00
3c1c3c10e7 Apply OperatorDef shared pointer memory saving feature to DAG nets
Summary: SimpleNet and DAGNetBase are the only two direct subclasses of NetBase. This feature has already been applied to SimpleNet before; with this diff, all nets should be covered.

Reviewed By: dzhulgakov

Differential Revision: D5475498

fbshipit-source-id: 339edac31d008ec1e4630d93d2e27d0f518f4ebb
2017-07-23 16:21:58 -07:00
99e79a616b attention with encoder_lengths
Summary:
For RNN attention, we should not include the invalid parts of the encoder output (based on encoder_lengths) in the computation. This diff accomplishes that by forcing logits for those positions to be negative infinity.

Note that this step can be bypassed by passing encoder_lengths=None, which is what we do for beam search, thus incurring no extra overhead for inference.

Reviewed By: jamesr66a

Differential Revision: D5402547

fbshipit-source-id: 1863d6050b5129e4df829c6357f0aa9ded0715dc
2017-07-23 10:06:01 -07:00
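The masking trick above, as a small numpy sketch (shapes and names are illustrative; the actual diff operates on Caffe2 blobs):
```py
import numpy as np

def masked_attention_weights(logits, encoder_lengths):
    # logits: (batch, max_len) raw attention scores
    # encoder_lengths: (batch,) number of valid encoder positions
    batch, max_len = logits.shape
    invalid = np.arange(max_len)[None, :] >= np.asarray(encoder_lengths)[:, None]
    masked = np.where(invalid, -np.inf, logits)
    # softmax: the -inf logits get exactly zero weight
    e = np.exp(masked - masked.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

w = masked_attention_weights(np.random.randn(2, 5), [3, 5])
print(w[0, 3:])  # [0. 0.]: padding positions receive no attention
```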
4a4d8841e6 Delete unused import 2017-07-23 12:48:11 -04:00
b51e0ec0c2 quick fix inplace blob bug
Summary: fixing the case where the init net initializes the same blob twice. I made an exception by allowing in-place blobs among ops as long as the blob stays on the same device. This should fix the problem in a generalized way, as most of our training is CPU-only now.

Reviewed By: dzhulgakov

Differential Revision: D5450564

fbshipit-source-id: 525c4c9a2e5216a70dbd1229da2d9f8a58b89e47
2017-07-23 02:18:16 -07:00
920c553ac0 saving/loading CPU/GPU nets
Summary: Saving two nets during offline training and loading the correct net the user wants. Setting keep_device=false lets us load GPU blobs into CPU memory.

Reviewed By: dzhulgakov

Differential Revision: D5396689

fbshipit-source-id: ff26bf3759856b07f3a1bbefac4a1e613a8a02e1
2017-07-23 02:18:15 -07:00
4a256dfc97 save/load/run nets and params with device info correctly
Summary:
===Update log 7/10===

We are currently blocked by a connection problem. Will post an update if it is not fixed in two hours.

===Update 7/6===

Luke is experimenting with the convergence of this diff. Hopefully he can present results next week.

Right now this is not affecting our original CPU training pipeline, because the loading op is still correct in the CPU-only case.

I will need a final test to make sure, but that is currently blocked by the log device issue t19952135.

I will handle saving CPU/GPU nets in a separate diff.

====Update before 7.4====
It's actually working! Include local run screenshot
{F67959016}

dogscience

Reviewed By: dzhulgakov

Differential Revision: D5307058

fbshipit-source-id: cad5d9324c239419530f4b120392ec2ccbb72280
2017-07-23 02:18:15 -07:00
3c275fe7a0 Increase flaky test tolerance (#2185) 2017-07-22 11:37:34 -04:00
6892b03499 bindings
Reviewed By: Yangqing

Differential Revision: D5458167

fbshipit-source-id: 74b52df567e4b44977685c5b396795a0ff056682
2017-07-21 19:03:43 -07:00
58039aa25b Improve PoolOp NCHW
Reviewed By: asaadaldien

Differential Revision: D5459633

fbshipit-source-id: cd09c1a6cfaab76e04baeed289b002c9f12bb80d
2017-07-21 18:22:06 -07:00
24ece087c7 Replace ReduceDimsOps math::Gemv with CUDA reduction kernel. 5.6x speed up.
Summary: This reduces runtime from 1.54757 ms/iter -> 0.273687 ms/iter for 100 parallel reductions, each of size 100000.

Reviewed By: akyrola

Differential Revision: D5471324

fbshipit-source-id: 626cabb8249fb4655275648fae2738cb739e1a72
2017-07-21 17:36:22 -07:00
804ebf7c41 Populate learning rate blob name into data_parallel_model and fix resnet50_trainer example.
Reviewed By: akyrola

Differential Revision: D5463772

fbshipit-source-id: 10b8963af778503a3de6edbabb869747bd1e986d
2017-07-21 16:24:10 -07:00
34be12353b comment out unused parameters
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
2017-07-21 15:14:43 -07:00
1978bba3e4 comment out unused parameters
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
2017-07-21 14:57:12 -07:00
b16c911667 Implementation for Graph Transforms
Summary: The Implementation of Graph Transformations, with the PatternMatch and ReplaceMatch rules.

Reviewed By: akyrola

Differential Revision: D5404144

fbshipit-source-id: 2bab68e6bff2e841ea9fb64df5d92ea945e704af
2017-07-21 13:51:12 -07:00
8e80ef7e6d s/CopyGPUToGPU/Copy
Summary: CopyGPUToGPU does not exist. Copy seems to do the trick. Didn't go into details of how copy works, not sure if it ends up triggering UVA.

Reviewed By: akyrola

Differential Revision: D5471014

fbshipit-source-id: d8bc1aed9b19070c92f3ffc76f5617bdd0054563
2017-07-21 13:51:11 -07:00
35757af6f7 Add broadcasting of weights to bce/bce_with_logits (#2161)
* added tests + removed explicit expand of weight in bce with logits

* add auto broadcasting of weight to BCELoss

* remove the need for _BCELoss

* formatting of warning

* remove TODO

* move across assert from _functions/thnn/loss.py

* flake8 fixes
2017-07-21 16:02:07 -04:00
8ab3d214d5 Fixes for DistributedDataParallel (#2168) 2017-07-21 16:00:46 -04:00
ec2def803b Merge commit '2efac3ed83a29f57f914e9044fdddd2ce7ecd6b7' 2017-07-21 15:58:23 -04:00
71ce3448d9 Fix torch.inverse when magma is not available
Fixes #2156
2017-07-21 15:57:43 -04:00
2efac3ed83 Fix torch.inverse when magma is not available
Fixes #2156
2017-07-21 15:57:25 -04:00
66bbe5d75a .creator -> .grad_fn in the code example (#2171) 2017-07-21 14:43:16 -04:00
efe2d01a3e Fix some bugs in CPU version of BooleanMask and add GPU version
Reviewed By: akyrola

Differential Revision: D5397208

fbshipit-source-id: 0314cc181e315f3b6cda846292b2e2ea73bb015b
2017-07-21 11:38:49 -07:00
ea607afd06 Add comments in nn.Upsample (#2175) 2017-07-21 14:34:58 -04:00
d94c68ecff Remove net_def_ from NetBase
Summary: Constructor should extract everything needed from NetDef instead of keeping it for usage after construction.

Reviewed By: akyrola

Differential Revision: D5469095

fbshipit-source-id: 288ea3243d85061ba9c018d2aef3b4d97485dd00
2017-07-21 11:22:34 -07:00
4f035f14de Add a support matrix for distributed backends 2017-07-21 14:19:46 -04:00
72e9e7abf7 Warning squash.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-21 14:13:11 -04:00
7f28a891f3 added sincos function to caffe2/utils/math
Summary: In situations where it is necessary to compute both sin & cos, the joint SinCos function is faster than computing them individually. Both MKL and CUDA support this function, so exposing it here.

Reviewed By: kmatzen

Differential Revision: D5465588

fbshipit-source-id: 7686498e4f2d4b5862d83a1ecf14fcc88ea53640
2017-07-21 09:55:21 -07:00
4d45ce7d11 Added UpSampling module and associated tests. 2017-07-21 12:25:50 +01:00
cbb85545ec warn about orphan StopGradient output
Summary: A quite common point of confusion is how to use StopGradient, and a typical bug is forgetting to specify input=output. This adds a sanity check to the gradient builder that checks whether some StopGradient outputs are orphaned.

Reviewed By: dzhulgakov

Differential Revision: D5458341

fbshipit-source-id: 056fef4f0ee53eb10e66e9be0ecb55b55f9cc3d7
2017-07-20 21:41:41 -07:00
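The usage pattern the check is aimed at, sketched with the Caffe2 net builder (the FC op and blob names are just illustrative scaffolding):
```py
from caffe2.python import core

# Wrong: 'x_stopped' is never consumed, so downstream ops still read
# 'x' and gradients keep flowing through it; this output is orphaned.
bad = core.Net("bad")
bad.StopGradient("x", "x_stopped")
bad.FC(["x", "w", "b"], "y")

# Right: in-place usage (input == output), so every consumer of 'x'
# sees the gradient-stopped blob.
good = core.Net("good")
good.StopGradient("x", "x")
good.FC(["x", "w", "b"], "y")
```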
bcce1bd04a Fix optimizer_context OSS test
Summary:
This fixes the test by querying how many instances of the optimizer have already been created,
because OSS tests don't run in isolation, so the number of previously created optimizer instances can be nonzero.

Reviewed By: akyrola

Differential Revision:
D5462433

Tags: easy

fbshipit-source-id: 7a9ab4fe5345f5d5138abb461ba7a990d9ace840
2017-07-20 12:21:09 -07:00
290acab2c7 implement drelu and unittest
Summary:
In this revision, I mainly implemented the DRelu activation. See https://arxiv.org/pdf/1706.06978v1.pdf for details.
To sum up, unlike standard relu and prelu, which divide the domain into two parts with the boundary at zero, DRelu calculates another value p to divide the activation into two parts. P is the softmax value of the output of Batch Normalization. For the f(x)=x part of relu, you can find a similar pattern in f(x)=px, and for the f(x)=0 part of relu, a similar pattern in f(x)=a(1-p)x, in which a is a parameter to tune. The DRelu activation result is the sum of these two parts: f(x) = a(1-p)x + px.

To implement DRelu, I take BatchNormalization as the superclass and then use the above formula for computation. In order to allow users to choose activation methods, which usually happens when calling the add_mlp function in processor_util.py, I pass the parameter via model_option from the UI down to the implementation, just as dropout does. Currently, I place it in extra_option, but can modify it if the AML team needs to redesign the UI.

I also add unit tests for DRelu. We check the shape of the output and also do numeric unit tests.
For the unit tests, I first check the numeric value of BatchNormalization, since there was no similar test before. I then compute the DRelu outputs and compare the results with the current DRelu layer.

Reviewed By: chocjy

Differential Revision: D5341464

fbshipit-source-id: 896b4dcc49cfd5493d97a8b448401b19e9c80630
2017-07-20 11:50:08 -07:00
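A rough numpy sketch of the formula as described in the summary (the real layer subclasses the batch-normalization layer; treating p as an elementwise sigmoid of the normalized input is an assumption made for this sketch):
```py
import numpy as np

def drelu(x, a=0.1, eps=1e-5):
    # Batch-normalize (no learned scale/shift here), then squash to get p.
    x_norm = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    p = 1.0 / (1.0 + np.exp(-x_norm))
    # f(x) = a * (1 - p) * x + p * x, the sum of the two parts above.
    return a * (1.0 - p) * x + p * x

x = np.random.randn(8, 4)
print(drelu(x).shape)  # (8, 4)
```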
eed323c344 avoid warning 2017-07-20 10:59:56 -07:00
ea6f9a26b8 fix version number 2017-07-20 13:30:53 -04:00
3719b4247a return a sentinel value when THTensor has undefined dimensions. 2017-07-20 10:25:30 -07:00
bf1fc250d1 get conda root dir automatically, trick from Dockerfile 2017-07-20 11:02:30 -04:00
47942307b5 Comment that data of THStorage may be NULL.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-20 10:55:35 -04:00
6b69723d4f Document how Numpy memory management works.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-20 10:55:35 -04:00
5254846bb2 fix typo of error msg of cmul in THSTensorMath (#2158) 2017-07-20 02:58:54 -04:00
f3f478960e Convert Embedding to new style. (#1916)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-20 02:35:21 -04:00
e537023147 add functional embedding (#1987) 2017-07-20 01:53:37 -04:00
09abaa2189 make keepdim backcompat warnings emit in autograd as well (#2157) 2017-07-20 01:48:05 -04:00
575a4a98e0 Remove assertions with side effects 2017-07-20 01:45:57 -04:00
02e23f4f6b Unify argument names in tensor and Variable methods 2017-07-20 01:45:57 -04:00
8946502348 Accept all kinds of arguments in Variable.expand 2017-07-20 01:45:57 -04:00
e708de37cc Allow keyword args in long_arg options 2017-07-20 01:45:57 -04:00
4af40e3471 Let parallel_apply accept arbitrary inputs 2017-07-20 01:45:57 -04:00
f417cb062b Fix repeat backward to handle unsqueezed dims 2017-07-20 01:45:57 -04:00
29cb541ec6 Fix CMakelists.txt for caffe2/contrib
Differential Revision: D5460222

fbshipit-source-id: 765c94e48f9831b176244060c3e126097f5bb924
2017-07-19 21:35:34 -07:00
8d35b11af2 Graphs for Graph Transforms
Summary: The Graph Interface and Implementation, for the Graph Transformation Framework. The last diff was too long and unapproachable - let's try this instead :)

Reviewed By: akyrola

Differential Revision: D5403985

fbshipit-source-id: 89f9361841088db8ebf45a9a4f8d2357eae3fb76
2017-07-19 14:06:03 -07:00
44790697c7 Nuke arg_helper() in OperatorBase
Reviewed By: akyrola

Differential Revision: D5449624

fbshipit-source-id: 20ff6568fe3482af94d1d266e9b47a1709b5004e
2017-07-19 13:52:39 -07:00
11f3ccf98f Add missing Modules to nn.functional (#1801)
* add dropout2d and dropout3d to functional

added some loss functions to functional

added tests

using dropout from backend

added docs

fixes

* edited loss modules to call functional
2017-07-19 15:55:21 -04:00
31894cafdd add support for advanced indexing with less than ndim indexers, ellipsis (#2144) 2017-07-19 15:51:03 -04:00
95ccbf8b0b better error message in load_state_dict when there are inconsistent tensor sizes (#2151) 2017-07-19 15:50:29 -04:00
a5422d14c8 Merge commit 'bd6263c338c717de880cddfed660b5aa06ee108b' 2017-07-19 15:48:54 -04:00
82143487b3 Add CUDA support for arange
Also enables CUDA for range
2017-07-19 15:48:20 -04:00
bd6263c338 Add CUDA support for arange
Also enables CUDA for range
2017-07-19 15:43:00 -04:00
f4a565ded9 Merge commit '1c6a08c1c2a50a7048ae9e6e11290740d24a8374' 2017-07-19 15:42:20 -04:00
1c6a08c1c2 fix lint 2017-07-19 12:41:17 -07:00
a5c2546c0f version bump 2017-07-19 12:34:43 -07:00
65e675e3e1 Fix net construct bench
Summary: Net construct bench was using old version of data_parallel_model API.

Reviewed By: bddppq

Differential Revision:
D5453281

Tags: easy

fbshipit-source-id: 93e1ba58511c7b25235ee50d9862fd0614b344c9
2017-07-19 11:23:39 -07:00
13e84e460b Use unaligned store intrinsic to enable vectorized reductions on unaligned buffers
Summary: When performing reductions on fp16 buffers, gloo assumed that both buffers were either aligned to 32 bytes or misaligned by the same offset. This may not hold in intermediate steps of halving-doubling allreduce, when the reduction is performed on some offset within the receive buffer. The fix is to use intrinsic instructions that work with unaligned pointers.

Reviewed By: akyrola

Differential Revision: D5450103

fbshipit-source-id: 9a1c8f8c34d2e62223f6d5c21573ea1cfad6537f
2017-07-19 11:06:32 -07:00
4d5d9de541 Merge commit '768b7c0dee34b614ab1cd8f89c69ec7d86c19c88' 2017-07-19 12:22:36 -04:00
9da882e396 Merge commit 'ae3a8d5d2eaa1b15d825b86ce706b046e68733b8' 2017-07-19 12:21:52 -04:00
15bece50d1 Merge commit 'cfcf2af95f91a88ec61cbcac8b30a718e7332aa5' 2017-07-19 12:20:54 -04:00
8144f7c95d Merge commit '58334a0c4b3c386931293f7fbee3d2cf066221a5' 2017-07-19 12:20:20 -04:00
b660303a16 Static linking against libstdc++ in Binary Build mode 2017-07-19 12:19:36 -04:00
768b7c0dee Static linking against libstdc++ in Binary Build mode 2017-07-19 11:23:31 -04:00
ae3a8d5d2e Static linking against libstdc++ in Binary Build mode 2017-07-19 11:23:21 -04:00
58334a0c4b static MKL detection and linkage fixes 2017-07-19 11:22:46 -04:00
cfcf2af95f add explicit BLAS linkage to THC when linked against magma (in binary build) 2017-07-19 11:22:23 -04:00
f3df24269d Merge commit '975550512200cfa1ae18e21400e7efa3924a3d46' 2017-07-19 11:05:51 -04:00
c4120f34bf move to model with cuda indexing tensors for cuda tensor adv indexing 2017-07-19 11:05:10 -04:00
9755505122 move to model with cuda indexing tensors for cuda tensor adv indexing 2017-07-19 11:04:49 -04:00
8b42308f71 Bug in line 381 (sparse) (#2130)
The function iterates over columns and sets a "sparsity" fraction of entries in each column to 0. The number of zeros in a column (num_zeros) is then ceil(rows*sparsity).
2017-07-18 22:55:06 -04:00
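A simplified stand-in for the initializer being fixed (not the library code; `sparse_` here is a hypothetical re-implementation following the description above):
```py
import math
import torch

def sparse_(tensor, sparsity=0.1, std=0.01):
    # Set a 'sparsity' fraction of the entries in each column to zero.
    rows, cols = tensor.shape
    tensor.normal_(0, std)
    num_zeros = int(math.ceil(rows * sparsity))  # the corrected count
    for col in range(cols):
        zero_rows = torch.randperm(rows)[:num_zeros]
        tensor[zero_rows, col] = 0
    return tensor

w = sparse_(torch.empty(10, 4))
print((w == 0).sum(dim=0))  # one zero per column at sparsity=0.1
```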
685ae4813e Squash "macro expansion producing 'defined' has undefined behavior" warnings.
Fixes #2141.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-18 22:24:55 -04:00
c1384ef99e Fix NDPooling gradient non-symmetric padding check.
Reviewed By: dutran

Differential Revision: D5436817

fbshipit-source-id: 7fc325589bcd92b7964067493f3342430476126b
2017-07-18 19:08:49 -07:00
9b9df3fbeb Sync mobile codebase changes back to fbcode
Summary: Rather chunky sync of changes made exclusively to mobile codebases back to fbcode.

Reviewed By: ajtulloch

Differential Revision: D5314405

fbshipit-source-id: c4d0a7244468f953eb63288306bc9bc78eb9e1be
2017-07-18 17:54:41 -07:00
a0fef9dd22 Merge commit '703429d49eb397102ba20e6d4c0dd7714be001a5' 2017-07-18 20:17:26 -04:00
703429d49e Make clang shut up about class/struct mismatch.
Makes us -Werror clean again, I think.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-18 20:16:20 -04:00
4a81b0f24a make SparseLookup support None pooling
Summary: Adding None as a pooling option; with it, SparseLookup will gather the embedding for each id.

Reviewed By: kittipatv

Differential Revision: D5421667

fbshipit-source-id: 1e8e2b550893ff3869dab12f8eb1fe24a063c3d5
2017-07-18 16:39:55 -07:00
11c4647447 Allow CPU device scope in data_parallel_model and data_parallel_rendevous device scope checks
Summary: Allowing CPU device scope instead of enforcing no device scope in data_parallel_model and data_parallel_rendevous.

Reviewed By: akyrola

Differential Revision: D5440492

fbshipit-source-id: bcd4344d64c710ea50ec8a65e3e9d102e35c66ea
2017-07-18 15:47:41 -07:00
567d95fa09 Merge pull request #25 from killeent/nullable-tensors
add support for Null Tensors to functions
2017-07-18 17:35:02 -04:00
7914d67ce3 Merge pull request #20 from killeent/type-equality
operator== for type
2017-07-18 14:32:45 -07:00
8451468d8b still generate multiple versions 2017-07-18 14:31:35 -07:00
3cc03568da Fixing error message for layer model helper
Summary: - Minor fix for error message in layer model helper file

Reviewed By: chocjy

Differential Revision: D5440768

fbshipit-source-id: df47bfe68a0caa750f0d3c8def28a5585e465ee0
2017-07-18 09:52:45 -07:00
138b216686 add support for Null Tensors to functions 2017-07-18 07:51:51 -07:00
4e019dbb6f Rename def() to debug_def()
Summary: Also eliminated non-debug uses of debug_def

Reviewed By: akyrola

Differential Revision: D5441534

fbshipit-source-id: 9dab5fb74e25b4da504fa893ec1f3478e282d3f3
2017-07-17 23:50:01 -07:00
5881aa0a78 Use shared_ptr to share OperatorDef across threads
Reviewed By: akyrola

Differential Revision: D5434291

fbshipit-source-id: 89f470d1e2dcde36c3273d86565b1952d7682808
2017-07-17 23:49:59 -07:00
e5a7891038 dot product using matmul
Summary:
1. PairwiseDotProduct in layers
2. add_axis argument in Concat and Split (just for backward propagation)

Reviewed By: xianjiec

Differential Revision: D5383208

fbshipit-source-id: 8e18ce371fff2da2da77b1a728142d69cd48e9c3
2017-07-17 23:20:37 -07:00
6f6d70ffed Merge commit 'dc5854477951765f5edbac34b0c228449de1b56b' 2017-07-18 01:34:54 -04:00
dc58544779 fix baddbmm for expanded tensors 2017-07-18 01:33:59 -04:00
427cc68ba2 added TensorInferenceFunction for ExpandDims operator; deleted Reshape layer.
Summary: The diff added TensorInferenceFunction for ExpandDims operator, so that ExpandDims layer is no longer needed (it can be handled by functional layer)

Reviewed By: kittipatv

Differential Revision: D5430889

fbshipit-source-id: 4f895f2751663c45db4cc4f87e5114c63cda9fbb
2017-07-17 21:03:00 -07:00
8a6d348bb8 Improve QPS metric. Better reporting to the UI.
Summary: As desc.

Differential Revision: D5440067

fbshipit-source-id: e4c08d650f1ae9008b1e910e136ba973cc5e0d49
2017-07-17 21:02:59 -07:00
e13704c467 fix shadowed variable name
Summary: When compiling with -Werror=shadow-compatible-local, you cannot reuse a variable name. This passed our tests, but some people compile with stronger settings.

Differential Revision: D5440805

fbshipit-source-id: a246af748717fb7e0e7a321e1ac4ddfef68ae524
2017-07-17 19:10:30 -07:00
cddb73899c fix strip prefix bug in SaveOp
Summary:
If strip_prefix_ is not found in the blob name, strip_prefix_.size() characters of the blob name are stripped anyway.
Closes https://github.com/caffe2/caffe2/pull/924

Differential Revision: D5440941

Pulled By: akyrola

fbshipit-source-id: 1db772fac4c74f2ce05105eec4bc7742a9067ebc
2017-07-17 19:08:23 -07:00
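The bug pattern in plain Python (a hypothetical reconstruction of the logic, not the C++ from SaveOp): slicing off len(prefix) characters unconditionally mangles names that never contained the prefix, so the lookup has to be checked first.
```py
def strip_prefix(name, prefix):
    # Buggy version: return name[len(prefix):]  (strips unconditionally)
    # Fixed version: only strip when the prefix actually matches.
    return name[len(prefix):] if name.startswith(prefix) else name

print(strip_prefix("gpu_0/fc_w", "gpu_0/"))  # fc_w
print(strip_prefix("fc_w", "gpu_0/"))        # fc_w (left untouched)
```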
95291f0f74 Revert D5348078: Add linter for enforcing caffe operator documentation
Summary: This reverts commit c3fa22fc7ca8066d5fc8fa780b23d7867fd3380e

Differential Revision: D5348078

fbshipit-source-id: f536e647cbd221b26ccbc105a5f5f8bdbcc119ab
2017-07-17 18:36:38 -07:00
b6722a07cd remove compilation warning on segment_reduction_ops.cu
Summary: Remove this compilation warning: P57645594. Been there a while.

Reviewed By: harouwu

Differential Revision: D5436753

fbshipit-source-id: 630be22f097fdcae7fe0372eed49f20c065146ba
2017-07-17 17:46:21 -07:00
e9dd8e0e3b Use one key for all pairs per node
Summary: To reduce round trips with store handlers, it is better to store all addresses in one key instead of one address per pair. This is what this implements.

Reviewed By: andrewwdye

Differential Revision: D5435893

fbshipit-source-id: 2d3ea3a2822c3b934ff2578d44a262e7bfbde6d0
2017-07-17 17:35:19 -07:00
78c4c4f885 handle RecurrentNetwork operator when clone net
Summary: added support for passing remap_funcs to clone_and_bind_net so that it can forward them to the clone method. Added other utils to ensure the RecurrentNetwork operator is correctly cloned based on the remap_blob. The reason the RecurrentNetwork operator needs special treatment is that its arguments contain protos and blobs.

Reviewed By: kittipatv

Differential Revision: D5421532

fbshipit-source-id: 5de68365ce97df2de483f02ad260d78c8d35eead
2017-07-17 17:33:21 -07:00
981c84f7b2 remove unused parameters in math_cpu.cc
Summary:
This removes/comments out/silences one or more unused parameters in the files.
We are going to enable `-Wunused-parameter` in fbcode and this fixes a case that automated tooling can't handle.

This diff is automatically generated.
Reviewers are added heuristically.

Reviewed By: dzhulgakov

Differential Revision: D5436791

fbshipit-source-id: 164b080c1bc0f6aad146087ddeded255fe9a3d22
2017-07-17 16:09:35 -07:00
f7a92145d4 comment out unused parameter in pybind_state.cc
Summary:
This removes/comments out/silences one or more unused parameters in the files.
We are going to enable `-Wunused-parameter` in fbcode and this fixes a case that automated tooling can't handle.

This diff is automatically generated.
Reviewers are added heuristically.

Reviewed By: dzhulgakov

Differential Revision: D5437217

fbshipit-source-id: c2fc5ed30e7ee47b8c40248f89a9f4304ce7c098
2017-07-17 15:57:49 -07:00
0d833590c1 Change Allocator interface to return deleter
Summary:
This is in preparation for adding huge pages. There we want to remember for the pointer how we got it - via mmap() or alloc(). One option is to store gigantic map of void* -> destructor, but luckily usages of Context::New are all inside Tensor which already uses shared_ptr with custom deleter.

This diff could have used unique_ptr as the return type, but then it's easy to accidentally call release() and lose the deleter. Thus going with std::pair<void*, MemoryDeleter> to be explicit.

Also, now CPUAllocator can be effectively changed to std::function. Haven't done it yet, but can do if necessary.

Let me know whether it's a bad idea to proceed like this.

Reviewed By: Yangqing

Differential Revision: D5429830

fbshipit-source-id: 8382ab7b81592d51272056c05c122894bb203827
2017-07-17 15:26:27 -07:00
baef769035 add code comments to memonger
Summary: Add some comments to dag-memonger to help asaadaldien with his C++ port.

Reviewed By: asaadaldien

Differential Revision: D5435459

fbshipit-source-id: dd5d482efb017418d22f42ee79fbd4668bd31bdd
2017-07-17 13:07:33 -07:00
746ddb7364 Fixed error when compiling with clang
Summary:
recurrent_network_blob_fetcher_op_gpu.cc was failing when compiled with clang

(Note: this ignores all push blocking failures!)

Reviewed By: wesolwsk

Differential Revision: D5436161

fbshipit-source-id: f4ea31066fe5abc108c6d6c15ee92bf828a2ff96
2017-07-17 12:52:39 -07:00
a3c9054245 Add comments in loss.py (#2128) 2017-07-17 13:56:19 -04:00
2dc8851206 RNN Workspace Blob Extraction
Summary:
Added operator RecurrentNetworkBlobFetcherOp that takes as input a scratch workspace name and prefix, and copies over all blobs in the scratch workspace into the global workspace. This essentially extracts all intermediate recurrent network computation for each timestep.

Added a wrapper in recurrent.py - retrieve_step_blobs(net, prefix='rnn') - which, when called after an rnn is run, will return a list of all blobs extracted from the net.

Reviewed By: akyrola

Differential Revision: D5421926

fbshipit-source-id: 0f35b466d77d3c719fb0e32de7dbcafc6c0d5225
2017-07-17 10:24:18 -07:00
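Usage as described in the summary (a sketch: it assumes `net` contains a RecurrentNetwork op and has already been run in the current workspace):
```py
from caffe2.python import recurrent, workspace

# After a forward pass of `net`, pull every per-timestep scratch blob
# into the global workspace and list what was captured:
step_blobs = recurrent.retrieve_step_blobs(net, prefix="rnn")
for name in step_blobs:
    print(name, workspace.FetchBlob(name).shape)
```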
32b13d6243 Add linter for enforcing caffe operator documentation
Summary: Add lint rule to check that every time we register a caffe operator to CPU or GPU that documentation is added for the particular operator.

Reviewed By: dzhulgakov

Differential Revision: D5348078

fbshipit-source-id: c3fa22fc7ca8066d5fc8fa780b23d7867fd3380e
2017-07-17 08:17:23 -07:00
e233875498 CodeMod: Prefer ADD_FAILURE() over EXPECT_TRUE(false), et cetera
Summary:
CodeMod: Prefer `ADD_FAILURE()` over `EXPECT_TRUE(false)`, et cetera.

The tautologically-conditioned and tautologically-contradicted boolean expectations/assertions have better alternatives: unconditional passes and failures.

Reviewed By: Orvid

Differential Revision:
D5432398

Tags: codemod, codemod-opensource

fbshipit-source-id: d16b447e8696a6feaa94b41199f5052226ef6914
2017-07-16 21:40:12 -07:00
c7b624651e CodeMod: Prefer ADD_FAILURE() over EXPECT_TRUE(false), et cetera
Summary:
CodeMod: Prefer `ADD_FAILURE()` over `EXPECT_TRUE(false)`, et cetera.

The tautologically-conditioned and tautologically-contradicted boolean expectations/assertions have better alternatives: unconditional passes and failures.

Reviewed By: Orvid

Differential Revision:
D5432398

Tags: codemod, codemod-opensource

fbshipit-source-id: d16b447e8696a6feaa94b41199f5052226ef6914
2017-07-16 21:24:13 -07:00
ba544aa0ad Add comments in nn.ELU (#2111) 2017-07-16 23:04:11 -04:00
0eeb57a5a2 Detailed per-operator tracking for all nets
Summary:
Implements TEST_benchmark style of tracking for all nets created in the workspace.

I had to do some tricks to invoke stuff in destructors in a non-intrusive way. Let me know if it's too hacky.

There are 2 levels of reporting:
- `--caffe2_logging_print_net_summary=1` - prints per-type aggregated stats
- `--caffe2_logging_print_net_summary=2` - prints also individual operator breakdown (might be spammy)

Reviewed By: salexspb

Differential Revision: D5414708

fbshipit-source-id: 40bac2cdf7e3809ab0086150433c376bb5fc7e64
2017-07-16 14:48:09 -07:00
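The two reporting levels can be switched on through the usual Caffe2 global flags (a sketch; the flag name is taken from the summary above):
```py
from caffe2.python import workspace

# Level 1 prints per-operator-type aggregated stats when nets are torn
# down; level 2 adds the per-operator breakdown.
workspace.GlobalInit([
    "caffe2",
    "--caffe2_logging_print_net_summary=1",
])
```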
849fb1f7e3 Fix when running with python -O (#2120) 2017-07-16 13:51:14 -04:00
9e2c74cc58 Use scope name for dataset cursor
Summary: Currently the dataset cursor blob uses a fixed name. When we read from multiple input tables, the dataset cursor of each table uses the same blob. This messed up the split queue and crashed the reader pipelines (see the errors and failures in https://fb.quip.com/uzbIA7K0PgVe)

Reviewed By: dragonxlwang, rayleichen

Differential Revision: D5419863

fbshipit-source-id: 5983a3d8d2e286dc47c2ec38ed1dbbe30c7c9b49
2017-07-15 19:22:32 -07:00
16dd997239 Spelling tweaks for documentation (#2114) 2017-07-15 13:16:32 -07:00
1c0135b6f2 CreateCommonWorld: pass timeout for storehandler
Summary: Use the CreateCommonWorld timeout for the storehandler as well, not just the device connect.

Reviewed By: andrewwdye

Differential Revision: D5425923

fbshipit-source-id: 936d2129e2db3bfed8759ca097b75843d3931d5f
2017-07-14 19:20:11 -07:00
b6691277f5 binary size util
Summary: This would allow us to inspect the binary size of the builds more easily.

Reviewed By: jonmorton

Differential Revision: D4553515

fbshipit-source-id: 95371bf67e66490a8653b874e1ff79cc987805e6
2017-07-14 17:49:24 -07:00
feecb09517 Added sensible default root location for MKL on Windows
Summary:
MKL on windows works with this change. Tested with MKL 2017 Update 3 (https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-2017-release-notes).

Should fix #544

With MKL 2017 Update 3 #514 should not happen too.

Note: I used Anaconda which ships with its own MKL, so I had to make sure that the MKL 2017 Update 3 version was loaded by replacing the .dll in the `%AnacondaPrefix%\Library\bin` folder. Otherwise, numpy would load its own version and I would have all sorts of missing-procedure errors. Now that the same version is available through `conda` this is easily fixed with `conda install mkl==2017.0.3`
Closes https://github.com/caffe2/caffe2/pull/929

Differential Revision: D5429664

Pulled By: Yangqing

fbshipit-source-id: eaa150bab563ee4ce8348faee1624ac4af477513
2017-07-14 17:20:36 -07:00
b68adec7bb adding model loss logic
Summary: Add the API model.add_loss(), which allows adding losses, e.g. for optimization and regularization. See the change in sparse_nn.py, in which 'model.loss = loss' is changed to 'model.add_loss(loss)'.

Reviewed By: xianjiec

Differential Revision: D5399056

fbshipit-source-id: 13b2ced4b75d129a5ee4a9b0e989606c04d2ca8b
2017-07-14 16:25:23 -07:00
27488b4950 Fix Sumop schema
Summary:
See  D5379274#inline-915415925264163

This causes a perf warning in prof_dag device validation.

Reviewed By: harouwu

Differential Revision: D5425968

fbshipit-source-id: 5fdbb0cb692580cf3e5509fe3a52ef3f9556ee4f
2017-07-14 15:12:19 -07:00
bd29260f47 hyposesis_test grad_reference bug fixes
Summary:
1. it was easy to pass a grad_reference that was silently ignored due to a missing output_to_grad
2. threshold was not passed to the gradient-checking logic

Reviewed By: dzhulgakov

Differential Revision: D5425226

fbshipit-source-id: 2eb41f2601d5e356f7872e57724d08ab2e742329
2017-07-14 14:41:23 -07:00
a7d82b935f Merge commit '9851ef4979bad0c8618e586e711c1bfd8648fd52' 2017-07-14 17:31:21 -04:00
af7aea9f17 Merge commit 'f805a8388be8dc55af0e3aa165b13cd0fce484d3' 2017-07-14 17:29:50 -04:00
366299f9f3 Wrap unbiased flag in var, std, varall, stdall 2017-07-14 17:29:06 -04:00
9851ef4979 Wrap unbiased flag in var, std, varall, stdall 2017-07-14 17:28:14 -04:00
f805a8388b Wrap unbiased flag in var, std, varall, stdall 2017-07-14 17:25:25 -04:00
2f7b6db429 Merge commit 'd2874c560ebd197297ef737a084b6f7ee3f03dc6' 2017-07-14 17:21:16 -04:00
16203f3325 fix test 2017-07-14 17:04:21 -04:00
80d067e70f retain_variables -> retain_graph (#2107)
Closes #1928
2017-07-14 16:45:25 -04:00
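The renamed keyword in use, sketched in current PyTorch syntax (this era of the codebase still wrapped tensors in Variable, so treat the snippet as illustrative):
```py
import torch

x = torch.ones(3, requires_grad=True)
y = (x * x).sum()
y.backward(retain_graph=True)  # was: retain_variables=True
y.backward()                   # works because the graph was kept
print(x.grad)                  # tensor([4., 4., 4.]), gradients accumulate
```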
d2874c560e lint fixes 2017-07-14 16:32:15 -04:00
2aa8fc7e8d Implementing Semi-Random Features Layer
Summary:
- (Split diff from Arc Cosine)
- Implemented [[ https://arxiv.org/pdf/1702.08882.pdf | Semi-Random Features ]] Layer
- Created a buck unit test for SRF Layer

Reviewed By: chocjy

Differential Revision: D5374803

fbshipit-source-id: 0293fd91ed5bc19614d418c2fce9c1cfdd1128ae
2017-07-14 13:15:50 -07:00
83596bdcb1 produce a Declarations.yaml file that describes Functions/Type/Tensor methods that framework produced. 2017-07-14 12:34:03 -07:00
f3f8ce44bd Merge pull request #18 from soumith/master
Fix handling of if_true/if_false in ATen
2017-07-14 15:16:07 -04:00
33ac9cdc10 add ATen tensor support to pytorch tuple_parser (#2102) 2017-07-14 13:56:02 -04:00
38ba935547 operator== for type 2017-07-14 10:39:40 -07:00
128e02d792 allow type inference to work on TensorList 2017-07-14 10:27:05 -07:00
7ee7542fc8 Fix handling of if_true/if_false in ATen 2017-07-14 11:58:03 -04:00
52a9367fa7 Fix minor typo (#2100)
Fixed minor typo in Autograd mechanics docs.
2017-07-14 10:20:13 -04:00
08bb3b7cc8 Merge commit '7e498d2219c8dbeb801fc4cefa36b147bbf76ff4' 2017-07-14 02:55:55 -04:00
43eaa28b9f fix empty Tensor mmap 2017-07-14 02:55:05 -04:00
7e498d2219 fix empty Tensor mmap 2017-07-14 02:54:39 -04:00
a305ce3ece Fix broken seq2seq example
Reviewed By: harouwu

Differential Revision: D5423060

fbshipit-source-id: 4537b020546503a1f9cb237257ab3c42665ae07f
2017-07-13 23:31:54 -07:00
d6bc2642e7 Add ignore_index to NLLLoss2d 2017-07-13 23:22:48 -04:00
7d3511f5f2 Half fixes for ATen and CUDA 9.0 2017-07-13 22:52:39 -04:00
a5a8ab10b0 fix Hardtanh argument names to be consistent between functional and Module 2017-07-13 22:46:51 -04:00
25b591eb05 lint fixes 2017-07-13 22:41:01 -04:00
06f94a7d59 better error message when thread_local is not supported (#2092) 2017-07-13 22:32:10 -04:00
f44991b398 add timeout argument to DequeueBlobs; use 10 min timeout for data workers
Summary: As title. This helps with (quite common) cases where data input is stuck for reason or another, and the net execution never proceeds and is stuck forever.

Reviewed By: andrewwdye

Differential Revision: D5409885

fbshipit-source-id: 840261fd5964408f788fc0f50ece0d74193694ac
2017-07-13 18:52:03 -07:00
34f7acbedf Report bugs in BatchNormalization, the dimension is wrong for second order
Summary: The input dimension index for NHWC should be the last dimension, C. Since the batch size is omitted, it should be 2 instead of 3.

Reviewed By: chocjy

Differential Revision: D5418538

fbshipit-source-id: a6939a863817b7566198ea2a665a1d236a2cf63d
2017-07-13 18:31:18 -07:00
13980d2bb5 Set device to the default device(CPU) when DeviceContext is None.
Summary:
Fix case when optimizer isn't called within a device scope context.
Fix OptimizerContext lr blob names

Reviewed By: volkhin

Differential Revision: D5421046

fbshipit-source-id: 186a0d05f40d4442c5ba5736084626da73a0c0f1
2017-07-13 17:54:36 -07:00
027264cd64 Merge commit '9e720f15477d2d7a388c5b5ec7d397fa5706d64f' 2017-07-13 19:59:07 -04:00
80b620960c Make QPSMetric count from the first example
Summary:
This fixes the super annoying problem of QPS reporting in sparse_nn_benchmarks where QPS "warms up" gradually. The problem is that we create the metrics in init_net and start counting from there, whereas there can be a big delay before real processing begins.

Thus I propose to just start counting from the first example seen. It's slightly imprecise too, as we miss the first batch, but who cares :)

Reviewed By: harouwu

Differential Revision: D5414672

fbshipit-source-id: 94fcf2e486416f186fed563002864f73c5f1c908
2017-07-13 16:54:29 -07:00
7c14c377df Merge commit 'd8fee1ebe675b9d31894ac79145f2b2629e322e4' 2017-07-13 19:25:56 -04:00
c674923bcc Merge commit 'ed6f5d7038f0e3873c2ed6add2ede7c9ab38e1ea' 2017-07-13 19:24:22 -04:00
113ff22e65 remove unused parameters in logging_is_google_glog.h and operator.h
Summary: This manually fixes a few violations of `-Wunused-parameter` where automated tooling couldn't help.

Reviewed By: meyering

Differential Revision: D5416336

fbshipit-source-id: c089f02dfdf33351406ebad2f52ad9f8c676360b
2017-07-13 16:24:15 -07:00
d8fee1ebe6 add launch_bounds to greedy kernels 2017-07-13 19:23:29 -04:00
ed6f5d7038 add launch_bounds to greedy kernels 2017-07-13 19:23:24 -04:00
9e720f1547 fix bug in method declarations 2017-07-13 16:22:52 -07:00
ab26fa01e6 install vision in devel dockerfile, minor fixes to dockerfile (#2090) 2017-07-13 19:06:41 -04:00
f4ae64a6c7 add isCUDA() on Type 2017-07-13 15:13:20 -07:00
07fcd977bb add cudnn data type processing for ATen tensor (#2087) 2017-07-13 16:37:53 -04:00
54cabb8bf3 Correct negative dim behavior in torch.stack (#2084)
Fixes #1950
2017-07-13 16:29:31 -04:00
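The corrected behavior, sketched: a negative dim counts from the end of the result's shape, which has one more axis than the inputs.
```py
import torch

a, b = torch.zeros(2, 3), torch.ones(2, 3)
out = torch.stack([a, b], dim=-1)
print(out.shape)  # torch.Size([2, 3, 2]): dim=-1 is the new last axis
```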
42485d87c2 Set the current device in each engine's thread (#2081)
Fixes #2017
2017-07-13 16:24:38 -04:00
ab0d631d6d Adding AllCompare-like function to data_parallel_model
Summary: Added function _RunComparison to data_parallel_model that checks if all shards in a given rendevous have the same value for a given blob_name

Reviewed By: wesolwsk

Differential Revision: D5394164

fbshipit-source-id: c2b07d0f8d5846fa9887d53b0be091a8c057f106
2017-07-13 13:03:57 -07:00
007d6ad816 write generated_cpp. to a file rather than as output to make error reporting clearer. 2017-07-13 11:04:52 -07:00
abd433fa07 Merge commit '6db960fbcff7ae194c6827c73113c222391f2c3e' 2017-07-13 13:49:26 -04:00
6db960fbcf dont clobber gen.py error, fix for old versions of python 2017-07-13 10:45:14 -07:00
384f03f1be Merge commit '48b797a785c1fc6ea34398985c49b2c7c55d28ae' 2017-07-13 10:40:58 -04:00
c011d4f3d6 resolves #1991 (#2073) 2017-07-13 09:57:33 -04:00
f98c384973 Raise error when call from_numpy on 0-dim array (#2075)
* Raise error when call from_numpy on 0-dim array

Fixes: #2055

* reword error message
2017-07-13 09:56:12 -04:00
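What the check guards against, sketched (note this raising behavior is specific to this era; later PyTorch versions accept zero-dimensional arrays once scalar tensors exist):
```py
import numpy as np
import torch

print(torch.from_numpy(np.array([5.0])))  # 1-dim, one element: fine
scalar = np.array(5.0)                    # 0-dim ndarray, scalar.ndim == 0
# At the time of this commit, the call below raised an error instead of
# silently producing a malformed tensor:
# torch.from_numpy(scalar)
```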
59c0bb9e5a fix for duplicate input case
Summary: Fix a bug reported by dzhulgakov that occurs when input blobs is used twice in a same op --> it was released to the recycled blobs pool twice.

Reviewed By: dzhulgakov, volkhin

Differential Revision: D5414023

fbshipit-source-id: 861bb46fe901023cb9a496401736e6ecb77d5fae
2017-07-13 01:51:30 -07:00
48b797a785 fix lint 2017-07-13 03:22:31 -04:00
043640c3eb Return top K classes
Reviewed By: kittipatv

Differential Revision: D5363481

fbshipit-source-id: 27ce37878434917c1a7c5f325ed77c989a1448af
2017-07-13 00:20:00 -07:00
8983bf13f4 fix max and min docs 2017-07-13 03:03:27 -04:00
20ce45b0c3 fix EmbeddingSum offsets initialization 2017-07-13 02:57:25 -04:00
1e98155711 long ->size_t 2017-07-13 02:40:44 -04:00
1c14178c65 fix osx compilation 2017-07-13 02:38:56 -04:00
37183e91de add normalize docs to sphinx 2017-07-13 02:31:57 -04:00
14337693d0 Merge commit 'b900a49308cb0363d00add7e123b824fda3eab37' 2017-07-13 01:01:38 -04:00
58e4caf80f add missing docs 2017-07-13 01:01:04 -04:00
3faca65adf Add a unit-test to validate sharing learning rate between
Reviewed By: kennyhorror

Differential Revision: D5413387

fbshipit-source-id: ff4022375183394ca9cee6faea5ac46e56079b86
2017-07-12 21:53:25 -07:00
b900a49308 Merge pull request #11 from soumith/master
Fix ATen build for debug python
2017-07-12 21:51:36 -07:00
c888857461 Conv double backward groups (#1993)
* add support for groups in double backward

* add tests for group in double backward

* fix lint

* separate some tests to reduce number of test cases

* remove redundant testing for different number of output channels
2017-07-13 00:41:14 -04:00
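One way to exercise the new support (a sketch; gradgradcheck wants double precision and small inputs):
```py
import torch
import torch.nn.functional as F
from torch.autograd import gradgradcheck

x = torch.randn(2, 4, 5, 5, dtype=torch.double, requires_grad=True)
w = torch.randn(6, 2, 3, 3, dtype=torch.double, requires_grad=True)

# groups=2 splits the 4 input channels into two groups of 2.
assert gradgradcheck(lambda i, k: F.conv2d(i, k, groups=2), (x, w))
```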
7053b84c0e Merge commit '41abcd4b41308b3453cce6731d896d094b23c62a' 2017-07-13 00:39:35 -04:00
8304dc4d68 Merge commit '703ccbb8cbe1c4ce3eeb62548ce51f71181883d6' 2017-07-13 00:39:03 -04:00
c48d50a2e2 Advanced Indexing: Calculate linear offsets directly on the GPU when working with CUDA Tensors 2017-07-13 00:38:23 -04:00
41abcd4b41 Advanced Indexing: Calculate linear offsets directly on the GPU when working with CUDA Tensors 2017-07-13 00:37:20 -04:00
703ccbb8cb Advanced Indexing: Calculate linear offsets directly on the GPU when working with CUDA Tensors 2017-07-13 00:37:13 -04:00
27da4eafc2 Remove more advanced indexing duplicate tests (#2071) 2017-07-13 00:30:52 -04:00
459cb697b5 Merge commit 'ce96b84ccbdfbbee7f744942b1bb9fdc5924e442' 2017-07-13 00:26:06 -04:00
ce96b84ccb Check for shared_mem size in multinomial single-sample implementation
Handle limited shared memory on function torch.multinomial

Update THCTensorRandom.cu
2017-07-13 00:25:13 -04:00
82e318cf8b Optimizer: one LR op per (device, optimizer)
Summary:
Try running this script through `nvprof`:
```py
import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import brew, core, optimizer, workspace
from caffe2.python.model_helper import ModelHelper

do = core.DeviceOption(caffe2_pb2.CUDA, 0)
with core.DeviceScope(do):
    model = ModelHelper(arg_scope={'order': 'NCHW'})
    conv1 = brew.conv(model, 'data', 'conv1', 1, 20, 5)
    pool1 = brew.max_pool(model, conv1, 'pool1', kernel=2, stride=2)
    conv2 = brew.conv(model, pool1, 'conv2', 20, 50, 5)
    pool2 = brew.max_pool(model, conv2, 'pool2', kernel=2, stride=2)
    fc3 = brew.fc(model, pool2, 'fc3', 50 * 4 * 4, 500)
    fc3 = brew.relu(model, fc3, fc3)
    pred = brew.fc(model, fc3, 'pred', 500, 10)
    softmax, loss = model.SoftmaxWithLoss([pred, 'label'], ['softmax', 'loss'])
    model.AddGradientOperators([loss])
    optimizer.build_sgd(model, 0.01,
                        policy='step', stepsize=1, gamma=0.999,
                        momentum=0.9, nesterov=False)
    workspace.FeedBlob('data', np.zeros((1, 1, 28, 28), dtype=np.float32))
    workspace.FeedBlob('label', np.zeros((1, 1), dtype=np.int32))

workspace.RunNetOnce(model.param_init_net)
workspace.CreateNet(model.net)

for _ in range(100):
    workspace.RunNet(model.net)
```
Before this change:
```
                    1.55%  1.4185ms       837  1.6940us  1.6630us  2.4000us  [CUDA memcpy HtoD]
                    0.72%  656.03us       200  3.2800us  3.1350us  3.5840us  [CUDA memcpy DtoD]
                    0.39%  7.1574ms      1034  6.9220us  3.8300us  18.677us  cudaMemcpyAsync
                    0.00%  34.180us         3  11.393us  9.0960us  12.910us  cudaMemcpy
```
And after it (look at the third column):
```
                    0.73%  657.15us       200  3.2850us  3.1040us  3.6160us  [CUDA memcpy DtoD]
                    0.26%  235.07us       137  1.7150us  1.6640us  2.3680us  [CUDA memcpy HtoD]
                    0.20%  3.4493ms       334  10.327us  6.4220us  16.958us  cudaMemcpyAsync
                    0.00%  37.376us         3  12.458us  9.4120us  15.412us  cudaMemcpy
```
That makes a pretty big difference in performance. Is there any particular reason you decided to have a separate `LearningRate` op for every parameter in 1317e3498c?
Closes https://github.com/caffe2/caffe2/pull/893

Reviewed By: kennyhorror

Differential Revision: D5372541

Pulled By: asaadaldien

fbshipit-source-id: 57357e1be2d58ce294058e9422fb3b1eddfca24d
2017-07-12 21:17:49 -07:00
d6f5452240 Allow to import subclasses of layers
Summary:
We want to be able to register subclasses of layers that
are not direct children of ModelLayer.
This requires us to find subclasses of ModelLayer recursively.

Reviewed By: kittipatv, kennyhorror

Differential Revision: D5397120

fbshipit-source-id: cb1e03d72e3bedb960b1b865877a76e413218a71
2017-07-12 20:19:47 -07:00
feddb03d58 LP pooling kernels 2017-07-12 19:31:06 -07:00
150ce4c1d8 decode only required # of frames
Summary: Instead of decoding all frames for X-ray video training, decode only sampled frames

Differential Revision: D5365079

fbshipit-source-id: e00dceadaacd9cdd42d83cf0d0e38338dc1f76ef
2017-07-12 19:08:41 -07:00
fe3802d724 match PyTorch syntax 2017-07-12 16:58:57 -07:00
294b0eb901 Remove outside access to OperatorBase::def()
Summary: As part 1 of reducing the size of operator objects, this removes the outside access to def() and moves debug uses under a new debug_def() function. The next phase, by jbai, will remove all access from subclasses to def().

Reviewed By: Yangqing

Differential Revision: D5393893

fbshipit-source-id: 7301cff4138dce620b49f6c4db315df85fee7266
2017-07-12 15:50:00 -07:00
b8d0c7fc0d checked cast does it all 2017-07-12 14:41:04 -07:00
ea563c1df1 Make weight norm pickleable (#2066) 2017-07-12 17:21:22 -04:00
2520459617 cpu lp pooling 2017-07-12 14:21:17 -07:00
841173c530 Use NamedTemporaryFile to avoid filename collisions (#2069) 2017-07-12 17:14:42 -04:00
f4c502e8a8 basic cat implementation in ATen 2017-07-12 12:04:24 -07:00
593c5e12e1 Merge commit 'be18499e852d8b292491e27d87dadebe68931fc3' 2017-07-12 14:55:21 -04:00
dc2ed7fd33 Fix ATen build for debug python 2017-07-12 14:52:03 -04:00
81fd2bf2d0 fix some language / typos 2017-07-12 14:47:36 -04:00
8915e2710c Refactor scatter/gather and add distributed docs 2017-07-12 14:47:36 -04:00
ebd5c085dc Fix a memory leak in DataChannelTCP 2017-07-12 14:47:36 -04:00
a9759ef401 Fix undefined symbol errors in THD 2017-07-12 14:47:36 -04:00
02aa5ad9fb make functional layer return scalar if only one output
Summary: This diff makes the functional layer return a scalar if there is only one output. This diff also corrects all other corresponding implementations.

Reviewed By: kittipatv

Differential Revision: D5386853

fbshipit-source-id: 1f00582f6ec23384b2a6db94e19952836755ef42
2017-07-12 11:34:31 -07:00
f899eafe85 Merge commit '5894864a1c5c9596da0ae88b477ee421e3a5065b' 2017-07-12 14:33:47 -04:00
169ca67a4e Adding Spatial Transformers w/CuDNN support 2017-07-12 14:32:06 -04:00
5894864a1c Adding Spatial Transformers w/CuDNN support 2017-07-12 14:31:14 -04:00
9e4d060348 Exposing TreeWalker & TreeIterator in header file
Summary: These are useful constructs for operators dealing with sparse representation.

Reviewed By: sunnieshang

Differential Revision: D5332077

fbshipit-source-id: 16aa8c4516e6d80f3c44ff348848f0a4a8061f22
2017-07-12 11:06:57 -07:00
a68bb5e3f9 Added device scope checks to data_parallel_model and data_parallel_rendevous
Summary:
Added device scope checks to data_parallel_model and data_parallel_rendevous

Added test to check that checks are working correctly to data_parallel_model_test

Fixed device_scope error in test_synchronization_barrier

Reviewed By: akyrola

Differential Revision: D5403936

fbshipit-source-id: 849c1cd7452692efbc5ef74d2d60ede090c9c017
2017-07-12 10:47:28 -07:00
41c8fee3e7 Merge commit '7c10f1b932fbebdf0e9105f2848229ea22109747' 2017-07-12 12:57:52 -04:00
bb891758bf Merge commit 'a20729244b43f7072797cc5e93898df795455e5b' 2017-07-12 12:57:12 -04:00
7c10f1b932 Avoid two unnecessary copies in addmm backward
The `r_` and `t` tensors become different objects, even though they
point to the same data. Avoid the copy whenever beta=0.
2017-07-12 12:56:17 -04:00
a20729244b Avoid two unnecessary copies in addmm backward
The `r_` and `t` tensors become different objects, even though they
point to the same data. Avoid the copy whenever beta=0.
2017-07-12 12:56:08 -04:00
74fd4bf9e4 quick fix for model_helper __init__
Summary: the init method should also make _parameters_info shared between self and param_model, since params is shared. Otherwise it can cause an inconsistency between _parameters_info and params. Examples of using param_model can be found in rnn_cell.py.

Reviewed By: kennyhorror

Differential Revision: D5405327

fbshipit-source-id: ca8079058e898f529906452163cda234cb30a7df
2017-07-12 08:49:48 -07:00
b9e64ecef1 allow param_info to set optimizer
Summary: this diff adds optimizer into param_info, and the associated implementations for modelhelper and brew to set optimizer for each individual parameter.

Reviewed By: kennyhorror

Differential Revision: D5385432

fbshipit-source-id: 5d682f9d1ab077e04a5d76a24d71470f4e64fc92
2017-07-12 08:49:48 -07:00
a74fb22b9a fix inplace division for python3 (#2063) 2017-07-12 11:37:55 -04:00
0d91048639 add dummy tensor.data property, to provide interpretable error message to users (#2058) 2017-07-12 10:22:08 -04:00
54e8ef14fb add flag caffe2_serialize_fp16_as_bytes
Reviewed By: kennyhorror

Differential Revision: D5403218

fbshipit-source-id: 755e7a709880f54096a6e5e661554614fc2cc585
2017-07-11 22:20:36 -07:00
823869ba79 Adding tanh to brew
Summary: Added tanh to brew.

Reviewed By: harouwu

Differential Revision: D5395358

fbshipit-source-id: 8eb5303f503e10aec4c59b42055933198d67e9b3
2017-07-11 18:17:52 -07:00
3d1af15a35 logging for all operator calls
Reviewed By: dzhulgakov

Differential Revision: D5332005

fbshipit-source-id: 4a406ee1eb3a8333de10d09b592fe5ecfb3a0f5b
2017-07-11 17:09:39 -07:00
67d2f45e2f Fix net_printer.py
Summary: Fix the unprintable characters fix :)

Reviewed By: akyrola

Differential Revision: D5398914

fbshipit-source-id: 2c607c497f15e324e863ff1dae7bb16199d4074e
2017-07-11 15:26:52 -07:00
10e23943b3 Fix missing _forward_pre_hooks in serialized modules (#2057) 2017-07-11 18:23:35 -04:00
be18499e85 Fix a few C++ warnings
1) Type needs a virtual dtor
2) Tensor move ctor should be noexcept
3) Make constructors from Context* and Type* explicit
2017-07-11 15:18:15 -07:00
1037f30e41 add some documentation to Tensor 2017-07-11 11:00:45 -07:00
192e0546bf fix for back-and-forth models, pass reference instead of copy
Summary:
akirillov again presented me with a memonger bug: his model, which has a kind of 'back-and-forth' structure where blobs are passed left and right in a ladder-like pattern, revealed that the set of free blobs must be passed as a reference, not a copy, so that recyclings are properly accounted for. Hard to explain.

Since we have the graph verifier, we can be more confident with these changes.

I also added some helpful debug to the graph verifier.

Differential Revision: D5396925

fbshipit-source-id: 0bffb3a0bf8532afcd6b5bc9331c779768a8c5c5
2017-07-11 10:52:14 -07:00
78ecc2d3b1 Alias multinomial sampling in Cuda (#784)
* Support Multinomial Alias sampling in cuda

Moving benchmark file

* Review changes
2017-07-11 13:23:35 -04:00
f483679425 Implementation of Alias Multinomial for faster Multinomial sampling (#1046) 2017-07-11 13:22:36 -04:00
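For context, a minimal numpy sketch of the alias method these two commits implement (Walker/Vose: O(n) table construction, O(1) per draw), assuming a normalized probability vector:

```python
import numpy as np

def build_alias_table(probs):
    # split the scaled probabilities into under- and over-full columns,
    # then pair them up so every column holds at most two outcomes
    n = len(probs)
    scaled = np.asarray(probs, dtype=np.float64) * n
    accept = np.zeros(n)
    alias = np.zeros(n, dtype=np.int64)
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        accept[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:
        accept[i] = 1.0  # leftovers are exactly full up to rounding
    return accept, alias

def alias_draw(accept, alias, rng=np.random):
    # O(1): pick a column uniformly, then accept it or take its alias
    i = rng.randint(len(accept))
    return i if rng.random_sample() < accept[i] else alias[i]
```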
c8afdb6f4b Caffe2: Add Open method to DBReader which takes DB pointer
Summary: Currently the DBReader always creates the DB instance itself when Open is called.  Add an Open method that takes in a DB pointer and takes ownership of it, so the DB can be initialized outside the DBReader.

Reviewed By: panshen1

Differential Revision: D5392458

fbshipit-source-id: d8660ab41d349f32030e4934b47bd17256a440df
2017-07-11 09:08:54 -07:00
dfd5d8d0fe Avoid two unnecessary copies in addmm backward (#1971)
The `r_` and `t` tensors become different objects, even though they
point to the same data. Avoid the copy whenever beta=0.
2017-07-11 11:55:22 -04:00
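For reference, a short sketch of why the copy is avoidable: addmm computes out = beta * t + alpha * (mat1 @ mat2), so when beta is 0 the contents of t never reach the output and the initializing copy of t is pure overhead. Illustrated with the current torch API:

```python
import torch

mat1, mat2 = torch.randn(4, 5), torch.randn(5, 6)
t = torch.empty(4, 6)  # contents are irrelevant when beta=0
out = torch.addmm(t, mat1, mat2, beta=0)  # no copy of t is ever needed
assert torch.allclose(out, mat1 @ mat2)
```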
158c7e86dd add basic gitignore, thpp -> at doc fix 2017-07-11 08:32:58 -07:00
73128f7b08 fix minor typos (#2051)
* Update extending.rst

fix typo

* Update cuda.rst

fix typo
2017-07-11 11:01:41 -04:00
f536c662bf fix op in docs (#2048) 2017-07-11 10:36:19 -04:00
2ecb18881c add DynamicType variants for ATen functions. 2017-07-11 10:35:03 -04:00
9d8cff9bc1 initialize aten and pytorch to share the same THCState 2017-07-11 10:35:03 -04:00
ab3d85c410 add build commands for ATen 2017-07-11 10:35:03 -04:00
e58e27cf16 Add 'torch/lib/ATen/' from commit '9d0c674cb7bcfae989d69f988363c1688c22fa89'
git-subtree-dir: torch/lib/ATen
git-subtree-mainline: 3314d51dcc1535dc2d00d357be889807d1bb8c57
git-subtree-split: 9d0c674cb7bcfae989d69f988363c1688c22fa89
2017-07-11 10:33:24 -04:00
3314d51dcc Add __repr__ to Avgpool and maxunpool layers (#2047) 2017-07-11 10:13:22 -04:00
e89e71c595 Simplifying Random Fourier Features and layer test
Summary:
- Condensed operators in RFF layer
- Adjusted RFF layer test; made test code more concise

Reviewed By: chocjy

Differential Revision: D5391436

fbshipit-source-id: 08748861cd6fb4a9e4cc9c8762996371492020a1
2017-07-11 00:40:53 -07:00
1ef1dd9cad Add comments for readability (#2005) 2017-07-10 23:02:56 -07:00
3a073c591b improve SumOp error message
Summary: When Sum was called with a type other than float or int, it just returned false without any helpful error.

Reviewed By: asaadaldien

Differential Revision: D5394070

fbshipit-source-id: 0f3c543a39f89163bccb9f55ea394e1d53561b62
2017-07-10 19:33:50 -07:00
97193478c7 Implemented GRUCell
Summary: Implemented python logic and tests to create an RNNCell for GRU.  Uses the preexisting GRU Unit Op code.

Reviewed By: salexspb

Differential Revision: D5364893

fbshipit-source-id: 2451d7ec8c2eacb8d8c9b7c893bfd21b65fb9d18
2017-07-10 17:52:25 -07:00
2409c2e359 GRUUnit Op Backwards Pass
Summary:
Just an implementation of the backward pass of the GRU Unit Op, not the full RNNCell.
Functions were created to mimic the LSTM implementation as closely as possible.
Backwards pass implementations are defined in GRU_unit_op.{h, cc}.
An assertGradientChecks call was added to gru_cell_test.py.

Reviewed By: salexspb

Differential Revision: D5364856

fbshipit-source-id: 09cff4478091827763b40cc331e4e0abf0ec258f
2017-07-10 17:52:24 -07:00
279f3f095e Implemented Gated Recurrent Unit (GRU) c++ operator forward pass
Summary:
Just an implementation of the forward pass of the GRU Unit Op, not the full RNNCell.
Functions were created to mimic LSTM implementation as closely as possible.
Implementation defined in GRU_unit_op.{h, cc}; tests put in gru_cell_test.py, which imports rnn_cell_test_util.py for the sigmoid, tanh, and _prepare_rnn functions.

Reviewed By: jamesr66a

Differential Revision: D5363697

fbshipit-source-id: f9ba9fe0be01ffc868dd22027be8be4975b84998
2017-07-10 17:52:23 -07:00
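For readers unfamiliar with the op, a minimal numpy sketch of the textbook GRU update (Cho et al.) that the unit op computes; this is a sketch of the math only, not the exact gate layout of the Caffe2 operator:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wr, Ur, Wz, Uz, Wc, Uc):
    r = sigmoid(x @ Wr + h @ Ur)        # reset gate
    z = sigmoid(x @ Wz + h @ Uz)        # update gate
    c = np.tanh(x @ Wc + (r * h) @ Uc)  # candidate hidden state
    return (1.0 - z) * h + z * c        # blend old state with candidate
```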
48bd102b95 Moved sigmoid, tanh, and _prepare_lstm (renamed) to a util file.
Summary:
Moved sigmoid, tanh, and _prepare_lstm (renamed) to a util file.
Also renamed _prepare_lstm to _prepare_rnn since it is used for setting up both LSTM and GRU models.

The reason for this commit is to allow the creation of the GRU op and its testing code without copying and pasting the code for sigmoid, tanh, and RNN unit-op model setup.

Reviewed By: jamesr66a

Differential Revision: D5363675

fbshipit-source-id: 352bd70378031f1d81606c9267e625c6728b18fd
2017-07-10 17:52:22 -07:00
4b1ebd2f65 Fast path for serializing large floating-point tensors to protobuf
Summary: Our existing serialization routines spend a significant amount of time on large numpy arrays, verifying the type of each element and converting each element to a canonical type. For large floating-point tensors, such as model parameters, this checking and converting dominates. Adding a fast-track path for float32 arrays only, as this is the most common use case to worry about (see the sketch below).

Reviewed By: akyrola

Differential Revision: D5389953

fbshipit-source-id: 26f44cb2426ea3efb849e7707b27d5485f69956c
2017-07-10 17:52:22 -07:00
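A hypothetical sketch of the idea; the field and function names here are illustrative, not the actual Caffe2 code:

```python
import numpy as np

def serialize_floats(arr, proto):
    # fast path: a float32 array needs no per-element verification or
    # conversion, so bulk-extend the repeated field in one call
    if arr.dtype == np.float32:
        proto.float_data.extend(arr.ravel())
    else:
        # slow generic path: check/convert each element individually
        for x in arr.ravel():
            proto.float_data.append(float(x))
```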
c096c188c3 minor leaky relu bug fixes
Summary:
numpy.random.rand generates samples from [0, 1), and therefore the leaky relu test cases weren't testing negative inputs. Tests still pass after the change.

Leaky relu can be used in-place, but the gradient took X rather than Y. Technically the result is no different, since the value is only used for a sign test in the gradient, but it was updated to take Y to reduce confusion.

Differential Revision: D5390126

fbshipit-source-id: d0c428abbb2797eb33902a7d2a2f59d5e85daaa6
2017-07-10 16:04:45 -07:00
98206c326e Fix ref counting in wrapped tuple functions (#2042)
Fixes #1963
2017-07-10 18:46:06 -04:00
9d0c674cb7 always use a custom default float 2017-07-10 15:37:18 -07:00
bff762c3ff python style fixes 2017-07-10 15:37:07 -07:00
10a8ccf27f only test gets for advanced indexing with duplicates (#2041) 2017-07-10 16:05:55 -04:00
0a9e8a23ef add atan2 function to autograd (#2040) 2017-07-10 16:04:35 -04:00
720db19fa2 make GetComputedParams work like GetParams
Summary: GetComputedParams tests namescopes with equality while GetParams tests with a prefix.  Switching GetComputedParams to also use a prefix so that both functions have similar usages.

Reviewed By: akyrola

Differential Revision: D5389816

fbshipit-source-id: 0e43e4b491fccbad3b855b6b735dc2b91d7626c9
2017-07-10 12:30:44 -07:00
d9daad509d Serialize float16 tensors as bytes to get rid of 50% overhead
Summary: When we use the int32_data field for float16 tensor serialization, it's possible to end up with a representation up to 50% larger than can be achieved using byte_data. The reason is varints (https://developers.google.com/protocol-buffers/docs/encoding#varints). In the worst case (when the highest bit, the sign bit, is set) a varint uses three 8-bit blocks, i.e. 24 bits for each 16-bit number. Saving in the byte field removes this overhead (see the size sketch below).

Reviewed By: Yangqing

Differential Revision: D5375267

fbshipit-source-id: 0068daed25cd0157ea80a768b6e3899ea2bd8caf
2017-07-10 11:19:09 -07:00
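A minimal sketch of the size math behind this: protobuf varints carry 7 payload bits per byte plus a continuation bit, so a 16-bit value with its top bit set needs ceil(16/7) = 3 bytes in int32_data, versus a flat 2 bytes in byte_data:

```python
def varint_len(value):
    # bytes protobuf needs to varint-encode an unsigned integer
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length

# a float16 bit pattern with the sign bit set, e.g. -0.0 == 0x8000
assert varint_len(0x8000) == 3  # 3 bytes via varint-encoded int32_data
# versus exactly 2 raw bytes per value when packed into byte_data
```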
d88cb87300 add dilated convolution guard to nnpack op
Summary:
dilated convolution semantics were added after the nnpack op, so the feature
check macro was not there originally.

accept2ship

Reviewed By: ajtulloch

Differential Revision: D5387287

fbshipit-source-id: 139ca8c6ad4211ceec8f24982f1f060144524401
2017-07-10 10:46:27 -07:00
ff3996acb9 Add NormalizeL1Op for doing L1 normalization along a given axis
Reviewed By: salexspb

Differential Revision: D5380220

fbshipit-source-id: 38fc56a1013c25b0c8b0fc161ca54fea412fb8b2
2017-07-10 10:10:36 -07:00
6ea71155c1 Implementing Arc Cosine Layer
Summary:
- Implemented the [[ http://cseweb.ucsd.edu/~saul/papers/nips09_kernel.pdf | Arc Cosine ]] layer
  - Developed buck unit test for Arc Cosine

Reviewed By: chocjy

Differential Revision: D5367604

fbshipit-source-id: ffd3ee081bc055b06c075c34aa6ce329b62ce2e0
2017-07-10 10:10:36 -07:00
8b003565ec remove inaccessible median variant (#2015)
With the addition of medianall() this variant can no longer be accessed, because both it and  medianall take no arguments.
2017-07-10 10:42:45 -04:00
53ac2d46c6 Fix typos in docstrings. (#2034) 2017-07-10 10:35:46 -04:00
318ea29a86 Merge commit 'ab3a9e177ee5eb7d39de2d385ba1e141858e8329' 2017-07-10 10:30:24 -04:00
ab3a9e177e Fix sdot_ bug for runtime F2C symbol conflicts by using cblas where available 2017-07-10 10:29:26 -04:00
46a868dab7 [Ready] Limit docs line length (#1900)
* some docs are ready

* docs

* docs

* fix some more

* fix some more
2017-07-10 10:24:54 -04:00
581921f696 support unsafe functions for getting/constructor tensors from TH objects for backward compat. 2017-07-09 21:25:38 -07:00
0025e1c776 Fix typos in the docstrings of Conv3d, AvgPool3d and MaxPool3d (#2030)
* Fix a typo of the docstring of Conv3d

* Fix typos in docstrings of 3D operations.
2017-07-09 23:20:07 -04:00
9cba97a833 Pairwise-exchange benchmark with bandwidth measurement
Summary: A simple benchmark to determine network bandwidth for pairwise communication.

Reviewed By: plapukhov

Differential Revision: D5159607

fbshipit-source-id: d16c3ed3a0c2ae182138df91bdae821f5508c6ac
2017-07-09 15:55:20 -07:00
c6d7e1e6bf added input size checks to batchnorm (#2020) 2017-07-09 15:31:24 -04:00
3598bdd044 Modify samplingTrain layer to take more general inputs
Summary: As desc.

Reviewed By: kittipatv

Differential Revision: D5363486

fbshipit-source-id: cb8fa65d750e80d2bf3e9909ca9b2d83a5548099
2017-07-08 22:19:55 -07:00
35ad7be55f Add Sum operator to mobile
Summary: Moving the Sum operator into its own file (elementwise_sum_op.cc)

Reviewed By: oyvindkinsey

Differential Revision: D5379274

fbshipit-source-id: c504d91c9fb5e95b369f2aa7e7b5be31fd8e0d4b
2017-07-08 11:06:05 -07:00
dc13345eb3 Read pretrained weights using binary mode in caffe_translator.py
Summary:
Binary mode must be explicitly specified when reading binary files under windows.
Closes https://github.com/caffe2/caffe2/pull/883

Differential Revision: D5373073

Pulled By: Yangqing

fbshipit-source-id: afedebdc74c954dbb6d24c0bccc192c8712c4c88
2017-07-08 10:17:57 -07:00
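The portability rule behind the fix, for reference: on Windows, text-mode reads translate line endings and treat the 0x1A byte as end-of-file, which corrupts binary data, so binary files must always be opened with 'rb' (the path below is illustrative):

```python
# 'rb' is required on Windows; text mode mangles binary protobuf data
with open('pretrained.caffemodel', 'rb') as f:
    raw = f.read()
```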
49f679d0e9 Acknowledge the existence of cpu HalfTensor (#2018) 2017-07-08 10:03:36 -04:00
5f63f5697a IndexHash
Summary:
1. IndexHashOp
2. Helper class SparseFeatureHash
3. FeatureSpec changes to add desired_hash_size

Reviewed By: kennyhorror

Differential Revision: D5361370

fbshipit-source-id: bf02e3ca12b3654f1d291f77c8af9248b6c4ac55
2017-07-07 23:06:11 -07:00
f0788afb0c lazily initialize cuda so that we behave similar to PyTorch 2017-07-07 22:21:31 -07:00
86b6a6e2f8 Added PiecewiseLinearTransform CUDA Op
Summary: Added a CUDA implementation of the PiecewiseLinearTransformOp.

Differential Revision: D5378537

fbshipit-source-id: 38857f59f5cc52e16e1ecc97983a0b0b82a46c74
2017-07-07 15:20:00 -07:00
cb7f17ab64 added gradients for ResizeNearest (CPU + CUDA) and ref
Summary:
- Added the gradients of the operation for both CPU and CUDA kernels.
- Unified variable names across all ops.
- Added a reference implementation in numpy.
- The gradient check needs a larger step size to succeed; is that normal?

Reviewed By: akyrola

Differential Revision: D5313682

fbshipit-source-id: aceb92649e01c5caeba8774e678f9095502d396c
2017-07-07 14:19:42 -07:00
febae7b20b fix a bug in the report function of Data_Parallel
Summary: replace params with sp, otherwise it will report an empty list

Reviewed By: akyrola

Differential Revision: D5382716

fbshipit-source-id: 34d8e6ee00cbe1718702e3d1f23ea12f8d65063e
2017-07-07 13:03:46 -07:00
6bff82eb6a Revert threadpool minWorkSize change on iOS
Reviewed By: sf-wind

Differential Revision: D5380298

fbshipit-source-id: fdf98bdda30e8cd6689c59fcc0357bca129d409b
2017-07-07 12:41:52 -07:00
a4dc7dcd04 osx build issues and clang warnings 2017-07-07 11:50:02 -07:00
5dd05ed8ee remove Sparse from dispatch for now, will add dispatch variants later 2017-07-07 11:40:08 -07:00
0a34f05d5b Always include THNN in the build, don't check for CUDA twice
As a result, the project builds on MacOS with gcc-6 (without CUDA).
2017-07-07 14:14:02 -04:00
4fda678a85 fix build issue when cuda does not exist 2017-07-07 10:54:17 -07:00
8cedf35d55 Adding Random Fourier Features to SparseNN Model and Flow
Summary:
- Integrated RFF into the preprocessing workflow for dense features
- Developed Flow interface to input RFF parameters
- Created unit test for using RFF with sparseNN

Reviewed By: chocjy

Differential Revision: D5367534

fbshipit-source-id: 07307259c501a614d9ee68a731f0cc8ecd17db68
2017-07-07 09:39:32 -07:00
ebdec9a837 Skip distributed tests if not supported (#2004) 2017-07-07 11:06:56 -04:00
c3c7845572 added asserts that grad_output + input are contiguous (#2000) 2017-07-07 09:14:02 -04:00
f8089c789c One more proto_utils.h fix
Reviewed By: ajtulloch

Differential Revision: D5380322

fbshipit-source-id: b1aa445984bf87feb81dcf08f782f48777d359c5
2017-07-07 02:47:50 -07:00
90d0762d14 Use torch.arange instead of torch.range in test_torch.py (#1996) 2017-07-07 00:06:31 -04:00
ad62e82179 fast simple-net memonger for C++
Summary:
To be used with the predictor "online": a C++ version of memonger for simple nets. Very simple greedy algorithm. Works well at least on the ResNet-50 inference graph: only 3 shared blobs are used.

Next I will integrate this with predictor and run canary (separate diff).

Reviewed By: asaadaldien

Differential Revision: D5375392

fbshipit-source-id: d36e419e39a32e568e105657c27fb00c85a2535d
2017-07-06 15:17:07 -07:00
e8689dda8f Python 3 compatible integer division
Summary:
As the title says.
Closes https://github.com/caffe2/caffe2/pull/879

Differential Revision: D5372787

Pulled By: akyrola

fbshipit-source-id: 0ff469c0d227f1b2252c1a0c4f6f8bebaac5580f
2017-07-06 11:47:12 -07:00
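The fix pattern, for reference: under Python 2, `/` on two ints floors the result, while the __future__ import gives Python 3 semantics everywhere, with `//` for explicit floor division:

```python
from __future__ import division  # no-op on Python 3

print(7 / 2)   # 3.5 on both Python 2 (with the import) and Python 3
print(7 // 2)  # 3: explicit floor division on both
```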
31f394f8b3 Add synchronization barrier API to data parallel model
Summary: Add synchronization barrier API with configurable timeout. Users can call Synchronize() to join variable length execution before resuming multi-machine communication steps, i.e., resuming distributed training iterations after validation on a single machine.

Reviewed By: akyrola

Differential Revision: D5348387

fbshipit-source-id: 5826da10e6a60c50394c36c7cf47624f10191d11
2017-07-06 09:21:19 -07:00
43c46cc883 Reduce default ThreadPool min work size (~25% speedup for segmentation on S7).
Summary:
I noticed this when experimenting with the compute-bound convolutions
for the ULP HWGQ binary conv/gemm.

It's an ugly heuristic that Maratyszcza and co. are improving this half, but I think this will be a net win for C2, especially if segmentation/Mask R-CNN are critical.

Differential Revision: D5375976

fbshipit-source-id: 863f76d434f133bf5a00e7ced1cfadfcf92e3c84
2017-07-06 08:32:32 -07:00
21ba0ff560 small fix to when input blob is input to multiple ops
Summary: Memonger had a bug that it crashes if an input blob was input to multiple ops. This fixes that and adds a test.

Reviewed By: asaadaldien

Differential Revision: D5374860

fbshipit-source-id: 1d5044001eacdbe6db43f69727da9297558f5c5c
2017-07-05 22:37:26 -07:00
2d133d4627 increase concurrency default
Summary: Huge improvement in my tests, and it does not really hurt either.

Reviewed By: wesolwsk

Differential Revision: D5374925

fbshipit-source-id: c96a4ed2ca653120a82233c0037cbfded8a2d2a1
2017-07-05 21:46:31 -07:00
78a4fd1044 Add Caffe2 op for Gloo barrier
Summary: Add Barrier to Caffe2 communicator ops. Add implementation using Gloo's BarrierAllToOne collective.

Reviewed By: akyrola

Differential Revision: D5348268

fbshipit-source-id: a21f2c98e946541e108644d150684fcd12312a0f
2017-07-05 17:37:22 -07:00
73fead9f8f add shape alias (#1983) 2017-07-05 19:12:37 -04:00
be7725b0ba Tests: fix dpm test when only 1 GPU present
Summary:
b33894e95d removed this line:
```py
unittest.skipIf(workspace.NumCudaDevices() < 2, "Need at least 2 GPUs.")
```
but forgot to add it back later.
```
_________________________________ DataParallelModelTest.test_equiv __________________________________
...
            if p2p_access_pattern is not None and not p2p_access_pattern[
>               devices[0], peer
            ]:
E           IndexError: index 1 is out of bounds for axis 1 with size 1
...
WARNING:data_parallel_model:** Only 1 GPUs available, GPUs [0, 1] requested
```

/cc akyrola
Closes https://github.com/caffe2/caffe2/pull/888

Reviewed By: akyrola

Differential Revision: D5341310

Pulled By: harouwu

fbshipit-source-id: 8d7f06913c7b5a42009a4033dbb6a48a8e812822
2017-07-05 14:32:12 -07:00
87730360d1 Small improvements to CreateOperatorDef
Summary:
- allow initializer lists directly with `vector<string>{}`, in part thanks to default initialization
- reduce the number of instances

Reviewed By: nicolasvasilache

Differential Revision: D5370056

fbshipit-source-id: b8fae3b12144257644e098b284df7369d5bdb377
2017-07-05 11:50:01 -07:00
4fddc04054 Use the same schema of switching to device reduce sum for SumSqrElements
Summary: Based on benchmark script located at `caffe2/experiments/python/device_reduce_sum_bench.py`, device reduce sum is slower for N <= 10000, so we only switch to use device reduce for large N in SumElements. This diff applies the same schema for SumSqrElements.

Reviewed By: jamesr66a

Differential Revision: D5369868

fbshipit-source-id: ae13a611aff9d3464d1c4950ee155c740a2da339
2017-07-05 10:52:17 -07:00
60e4607106 brew API in convnet benchmark
Summary: upgrade convnet_benchmarks to brew api

Reviewed By: salexspb

Differential Revision: D5341829

fbshipit-source-id: f34c6dd4aae5f0c8db51e7600eb1f0e1cdc72ea3
2017-07-05 10:34:48 -07:00
e60bc2df85 TravisCI: run Python tests
Summary:
Run Python tests (with pytest) on TravisCI.
Closes https://github.com/caffe2/caffe2/pull/817

Reviewed By: bwasti

Differential Revision: D5332944

Pulled By: harouwu

fbshipit-source-id: 29dfa3f19bf100ba9a04b048489f6a63e426416d
2017-07-05 10:10:04 -07:00
3748b6d3eb Data parallel fix for https://github.com/pytorch/pytorch/issues/1857 (#1880)
* Data parallel fix for https://github.com/pytorch/pytorch/issues/1857
searches recursively for variable in input

* parallel_apply.py lint
2017-07-05 11:46:00 -04:00
25bd5dda27 Implementing random fourier features layer
Summary:
- Created the random fourier features layer
- Generated a unit test to test the random fourier features layer is built correctly
- Inspired by the paper [[ https://people.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf |   Random Features for Large-Scale Kernel Machines]]

Reviewed By: chocjy

Differential Revision: D5318105

fbshipit-source-id: c3885cb5ad1358853d4fc13c780fec3141609176
2017-07-04 23:48:42 -07:00
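For context, a minimal numpy sketch of the random Fourier feature map from the cited Rahimi & Recht paper, assuming an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2):

```python
import numpy as np

def random_fourier_features(X, D, gamma=1.0, rng=np.random):
    # z(x) = sqrt(2/D) * cos(x W + b) with W ~ N(0, 2*gamma),
    # b ~ U[0, 2*pi]; then z(x) . z(y) approximates k(x, y)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```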
b3589b04fd Fix exceptions not being caught (#1948)
Adding -fexceptions to both torch and pytorch C/C++ builds fixes tests
not passing.

Closes #1297
2017-07-05 00:25:39 -04:00
03aec5ae53 Add Tensor::FreeMemory
Summary: Similarly to Blob::Reset but keeps dimension information.

Reviewed By: rayleichen

Differential Revision: D5365655

fbshipit-source-id: c0f99c020bbabe93ff2a2ee5d519a5f467fda5ba
2017-07-04 19:31:17 -07:00
5964394a4c return empty iter when tensor is empty 2017-07-04 17:29:27 -04:00
1aaa24d99b add medianall prototype to docs 2017-07-04 16:52:36 -04:00
295ed7e264 Merge commit 'ab7d4e2bcea5cae8f05873fb0bbb31985cc58d47' 2017-07-04 16:47:48 -04:00
ab7d4e2bce add missing definition 2017-07-04 16:46:04 -04:00
ae65236490 Fix typo 2017-07-04 15:19:05 -04:00
c2069a15e0 Merge commit '56df97ce939985a30dcfefb1136bf45faf64413c' 2017-07-04 15:18:14 -04:00
56df97ce93 remove unnecessary contiguous assertion 2017-07-04 15:17:15 -04:00
89c682dfb9 Merge commit '0dbf871d9ec424f1a7897af77bf93219d3be23bf' 2017-07-04 14:56:53 -04:00
ae839f4b2e Merge commit 'f425c5216b7fe35dd03e0161a3440ec968c63636' 2017-07-04 14:56:22 -04:00
05c2bafc9d Have median reduce over all dims and return just the value when dim is not provided 2017-07-04 14:55:37 -04:00
0dbf871d9e Have median reduce over all dims and return just the value when dim is not provided 2017-07-04 14:55:30 -04:00
f425c5216b Have median reduce over all dims and return just the value when dim is not provided 2017-07-04 14:55:19 -04:00
635bb5ec9d corrects typo 2017-07-04 11:09:40 -04:00
00e5afea6a Adding dedup aggregator options to sgd optimizer
Summary: As desc.

Reviewed By: xianjiec

Differential Revision: D5324671

fbshipit-source-id: 27f3a58f618cd5ea11c2ea2e756df3f73635c2c8
2017-07-04 02:10:18 -07:00
2ac9ff5c96 Cos, Sin, and Abs operators
Summary: add Cos, Sin, and Abs operators

Reviewed By: akyrola

Differential Revision: D5307632

fbshipit-source-id: 743c9d289e4d3fd439e4b5385841cdff87d9247a
2017-07-03 22:18:32 -07:00
a7f6b0ab4f Merge commit 'e5bac2dd2d69772938482c1431db1fc1efb64c6f' 2017-07-03 20:41:28 -04:00
e5bac2dd2d Add critical section to BLAS gemm.
This is needed because of possible races in SpatialConvolutionMM (and others that use gemm)
if the BLAS library is not thread-safe.

In terms of performance, there's not much benefit to running two gemms in parallel, because the BLAS libraries have their own all-occupying gemms anyway.
2017-07-03 20:40:21 -04:00
090506ac87 Add NCCLBroadcast to correct net
Summary:
Otherwise was always added to main net instead of param_init_net when
desired (i.e. initial param sync)
Closes https://github.com/caffe2/caffe2/pull/894

Differential Revision: D5367451

Pulled By: akyrola

fbshipit-source-id: 3d82be6da687c736bd15f4852dbd272266eb4811
2017-07-03 16:54:44 -07:00
ec8da55a7d bind THS THCS, leaving all operators unimplemented. This is required because THPP can represent Sparse tensors even though the wrapper doesn't implement any operators. 2017-07-03 16:52:41 -07:00
b4414c0dc3 Handle None in modules list.
It's often useful to add None to an nn.ModuleList to keep the indexing
of the module list to match some other property.
2017-07-03 18:53:21 -04:00
39edc378fb Fix lint. 2017-07-03 18:51:22 -04:00
f6578c1b24 Implement double backwards for Dropout and FeatureDropout. 2017-07-03 18:51:22 -04:00
daa84e7663 Implement bilinear double backward. 2017-07-03 18:51:22 -04:00
1aa145dbac Implement ConstantPad2d double backwards. 2017-07-03 18:51:22 -04:00
d4b8834131 Improve non-contiguous testing in TestAutograd: (#1933)
* Improve non-contiguous testing in TestAutograd:
1) Test gradcheck and gradgradcheck with non-contiguous inputs
2) Test gradgradcheck with non-contiguous gradoutputs (gradcheck would take more work)
3) Fix discovered issue in Prod backwards.

* Simplify non-contiguous setting wrt View.
2017-07-03 18:49:52 -04:00
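For reference, a minimal sketch of what testing with a non-contiguous input looks like, using the current torch.autograd.gradcheck API (the commit wires this through the test harness's own input generation):

```python
import torch
from torch.autograd import gradcheck

x = torch.randn(4, 6, dtype=torch.double, requires_grad=True)
x_nc = x.t()                    # a transposed view is non-contiguous
assert not x_nc.is_contiguous()
assert gradcheck(lambda t: (t * 2.0).sum(), (x_nc,))
```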
699d1ec7fb Address flaky Norm test issues:
1) Add a correction for 1.5 norms to ensure input can't be zero.
2) Increase test tolerance.
2017-07-03 18:48:22 -04:00
05062a1439 Better handle random seeds in tests.
Previously, there were 2 issues with test_autograd randomness:
1) Many random operations (e.g. random selection in prod_zeros) happened
   before the torch random seed was set (because it was set in run_tests
   at the end of the file).
2) The random seed was not set consistently: run_tests would set it to the
   proper value, but each call to setUp would set it to 0 (because SEED wasn't
   global in run_tests), which made setting the seed mostly worthless.
2017-07-03 18:48:22 -04:00
e187ba7a9f Decrease likelihood that Fmod/Remainder tests fail due to numerical jacobian check.
Previously, these tests added 5e-2 to the denominator tensor (the same as the div
tests), which only avoids divide by 0, but not issues with computing the numerical
jacobian due to non-linearity of fmod/remainder, when input / divisor is close to an
integer.  These tests now add 1.5 to the denominator, which is the same as the non-tensor
version of the tests; Note that we can still hit the above condition but it will be much
less likely.
2017-07-03 18:48:22 -04:00
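A quick numeric illustration of the failure mode: fmod is discontinuous wherever input / divisor crosses an integer, so a central difference straddling the jump yields a wildly wrong "derivative":

```python
import numpy as np

x, divisor, eps = 2.0, 1.0, 1e-3   # x / divisor is exactly an integer
num_grad = (np.fmod(x + eps, divisor) - np.fmod(x - eps, divisor)) / (2 * eps)
print(num_grad)  # ~ -499.0, nowhere near the true local slope of 1.0
```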
35ed224d04 Merge commit '8a24f2b4d8646de10b497c2eca2f1edc525a1e09' 2017-07-03 00:49:59 -04:00
72b292d45c Merge commit '733a7c6d9a22dfc9be1b11d47384991208658bfb' 2017-07-03 00:49:52 -04:00
5b4cd9bb49 Merge commit 'c691fc6dc711814a06107d4a9b763f34bff5afca' 2017-07-03 00:49:34 -04:00
c691fc6dc7 Add a nonContigDim reduction kernel to improve latency for small tensors. (#768) 2017-07-03 00:39:40 -04:00
42cf68b402 Make reduction functors accept only constant arguments (#753)
(similar to MaxValuePair and MinValuePair above).
2017-07-03 00:35:39 -04:00
8a65ef1098 cc 2.0 -> 3.0 in docs. 2017-07-02 22:08:42 -04:00
b6c1c0ac4e Fix communication_schema decoding
Summary: Allows to override the input/output record as long as the field blobs are the same.

Reviewed By: yangyangyyy

Differential Revision: D5362132

fbshipit-source-id: 3ac2ac22802902b7eed5c226b00a7e1971ad264c
2017-07-02 13:04:20 -07:00
406040f6a9 fix torch.is_tensor not recognizing HalfTensor (#1934) 2017-07-02 10:13:44 -04:00
e26139b7f7 fixed shapes in GRU and LSTM docs. 2017-07-01 23:15:10 -04:00
457587088a Fix broadcasting issues in binary_cross_entropy_with_logits (#1944)
* done re-seed cuda device if in bad fork

* avoid broadcasting in binary_cross_entropy_with_logits

* assert input sizes for BCEWithLogitLoss

* added check that BCEWithLogitsLoss == Sigmoid + BCELoss

* fix flake8 issues

* rename test_bce_with_logits_gives_same_result_as_bce_and_sigmoid -> test_bce_with_logits_gives_same_result_as_sigmooid_and_bce_loss

* add warning in BCELoss about input shapes

* fix lint
2017-07-01 23:06:36 -04:00
d43b42fb37 allow querying tensor device + tool to validate that all ops have tensors from correct devices (GPUs)
Summary:
A quite common, hard-to-debug performance bug in multi-GPU training has been operators being passed tensors that reside on a different GPU than the one the op runs on. Since we have peer access enabled, this works, but it is just much slower. With data_parallel_model this problem rarely arises, since it does static analysis of the operators, but it can happen if someone bypasses DPM or uses FeedBlob with incorrect device options.

To make debugging easier, I added a device field to the tensor that records which device allocated the memory. In addition, I added a function that goes through operator inputs and outputs and compares each tensor's device to the operator's device (see the sketch below). This check is run only after the first iteration, and only with prof_dag.

Also renamed ShapeCall to TensorInfoFun, as it now returns much more info than just the shape.

I think this is a pretty safe diff, but do you find it problematic to add a new field to tensor?

Reviewed By: dzhulgakov

Differential Revision: D5335505

fbshipit-source-id: 511b6c122dff9a205f43951984868ffd40f7ac30
2017-07-01 09:16:37 -07:00
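A hypothetical sketch of the check described above; the function and its bookkeeping are illustrative (the real implementation lives in the C++ operator code and uses the device recorded at allocation time), though `device_option` is the real proto field:

```python
def check_net_devices(net_def):
    # track which GPU produced each blob, then flag ops whose inputs
    # live on a different GPU than the one the op runs on
    blob_gpu = {}
    for op in net_def.op:
        op_gpu = op.device_option.cuda_gpu_id
        for name in op.input:
            if name in blob_gpu and blob_gpu[name] != op_gpu:
                print("op %s on GPU %d reads blob %s from GPU %d"
                      % (op.type, op_gpu, name, blob_gpu[name]))
        for name in op.output:
            blob_gpu[name] = op_gpu
```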
c0cebc3578 Added flags to lstm, convnet and sparse_nn_benchmarks to print out operators
Summary: pass flags directly to C2

Reviewed By: salexspb

Differential Revision: D5345869

fbshipit-source-id: 22b0e791526c7b0caf1e6a13dd29900df0db8fe8
2017-06-30 23:47:04 -07:00
ab0fe0a5f4 add debug information when there is blob version mismatch
Summary:
It is a quite common question when users get some variant of "blob has version 2 but gradient expects version 1" in their backward pass. The error message is completely unhelpful.
To remedy this, I added proper debug information which tells the user how the version number of a blob was incremented over time, i.e. which ops caused the version to go up. This should help in understanding the issue.

Reviewed By: dzhulgakov

Differential Revision: D5358227

fbshipit-source-id: bc09d048ac33200c35d56460e44e86c2f2888f3f
2017-06-30 16:22:46 -07:00
f3a59aedff Use cub::DeviceReduce for faster math::Sum CUDA version
Summary: Port SumElements and softmax_ops.cu to use device reduce sum

Reviewed By: akyrola

Differential Revision: D5351881

fbshipit-source-id: ca9604186c261ffcb1480da2a17baab8a4809372
2017-06-30 15:04:06 -07:00
da0fad8a7a Use torch.matmul in nn.Linear (#1935)
This takes advantage of the broadcasting behavior of torch.matmul to
support inputs with more than two dimensions. The extra dimensions are
treated like part of the batch dimension, much like nn.Bottle in Lua
Torch.

There are a few related small performance changes:

 * Addmm computes the gradient in column-major for inputs in
   column-major format
 * Variable.mm calls Addmm in-place with the desired output buffer
2017-06-30 16:53:26 -04:00
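For reference, the resulting behavior: extra leading dimensions are treated as part of the batch, so a 3D input to nn.Linear now works out of the box:

```python
import torch
import torch.nn as nn

layer = nn.Linear(10, 20)
x = torch.randn(5, 3, 10)   # (batch, seq, features)
print(layer(x).shape)       # torch.Size([5, 3, 20])
```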
2c038f2074 Add weight normalization implementation (#1945)
* Add weight normalization implementation

This adds forward "pre-hooks" which get called before the module's
forward() method. Weight norm is implemented as a hook which calculates
the weight variable from the weight_g and weight_v every iteration.

Based on @rtqichen implementation.

* Specify return type
2017-06-30 15:41:40 -04:00
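For reference, the API this introduced: the weight is reparameterized as w = g * v / ||v||, and the forward pre-hook recomputes `weight` from `weight_g` and `weight_v` before every forward call:

```python
import torch.nn as nn
from torch.nn.utils import weight_norm

m = weight_norm(nn.Linear(20, 40), name='weight')
print(m.weight_g.shape)  # torch.Size([40, 1]): per-row magnitude g
print(m.weight_v.shape)  # torch.Size([40, 20]): direction v
```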
b3e500c522 fix docs generation warnings 2017-06-30 14:39:21 -04:00
b3f6ff1b3d Fix unused linker argument warnings. (#1958)
* Fix unused linker argument warnings.

This patch began when I noticed the following clang warning:

clang: warning: -Wl,-rpath,RIGIN: 'linker' input unused
clang: warning: argument unused during compilation:
'-L/home/ezyang/local/pytorch/torch/lib/tmp_install/lib'

The warning is minor, but I was a bit worried our rpath wasn't
setup correctly.  Actually, it was, and there wasn't a problem,
but I had to spend some time figuring out exactly what as going
on, and by the end of it, I might as well fix the warning.  In the end, I ended
up filing two upstream tickets for ccache and cmake:

- https://github.com/ccache/ccache/issues/189
- https://gitlab.kitware.com/cmake/cmake/issues/17025

We can remove the warning by using CMAKE_EXE_LINKER_FLAGS and
CMAKE_SHARED_LINKER_FLAGS, which have sane macro expansion rules
(although still slightly insane: the first level of escaping gets removed.)
To ensure that the rpath was being set correctly, I ran
objdump -x torch/lib/build/TH/libTH.so | grep RPATH and verified that ORIGIN
was setup correctly.

I also considered using CMAKE_INSTALL_RPATH, but the rpath here doesn't
seem to get set until you actually install, which is a change in behavior,
and I wasn't sure if anyone was relying on rpaths being setup in the build
directory.

There is a SLIGHT behavior change, in that if we happened to need these
LDFLAGS passed to the static linker, they won't get passed. I don't
think we ever build static libraries today so this shouldn't be aproblem.

P.S. Because of the ccache bug, you may continue to see these warnings
after this patch.  If you apply https://github.com/ccache/ccache/pull/190
and clear your cache, it will solve the problem.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Remove unnecessary -Qunused-arguments

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-30 14:15:31 -04:00
5aa147f273 added PackRNNSequence and UnpackRNNSequence operators
Summary: Added two operators that can be used to transfer data into the input format of an RNN and back.

Reviewed By: kittipatv

Differential Revision: D5329886

fbshipit-source-id: 07eac29416427b08c49989d4eeed50a6f18493a1
2017-06-30 09:53:31 -07:00
8c74c36626 fix reducing device option
Summary: This was broken in a previous diff, fixing it to use model device type.

Reviewed By: asaadaldien

Differential Revision: D5356005

fbshipit-source-id: a4fcc932bae772076b57625a5fcc0d38eb702cc9
2017-06-30 09:19:57 -07:00
326e314695 Add optional timeout to Gloo ops
Summary: Add an optional timeout parameter to CreateCommonWorldOp, to be honored on dependent collective operations.

Reviewed By: akyrola, romain-intel

Differential Revision: D5348099

fbshipit-source-id: cf5131450c389c7e40b1dabf8334c486e02e0011
2017-06-29 17:18:30 -07:00
5355634dac Dict fixes/improvements and unittest targets for Python 3 in caffe2 core
Summary: As title

Reviewed By: salexspb

Differential Revision: D5316104

fbshipit-source-id: aee43819d817842e5ce6ba3d045a55b1a2491c30
2017-06-29 17:05:41 -07:00
a6dee1da32 Make args.fixed_shape in lstm_benchmark work in a library mode
Summary:
This works as a standalone Python script because args are global. When used from Flow for monitoring purposes it doesn't work. This diff fixes that.

Reviewed By: zem7

Differential Revision: D5349996

fbshipit-source-id: f73842901d975b783e09e9db0565eb81880bbea1
2017-06-29 14:55:26 -07:00
dd6e170b8d fix LSTM benchmark reporting
Summary:
A couple of fixes for the broken reporting of lstm_benchmark:
- last_time must be recorded after warm-up
- the entry count was incorrectly removed

Reviewed By: salexspb

Differential Revision: D5349890

fbshipit-source-id: 5dd5bdf46594c520b61bc3b57b153f90a6a17903
2017-06-29 13:53:17 -07:00
6c67a753c7 Fix test_pair_wise_loss_predictions
Summary: Increase absolute error tolerance.

Reviewed By: tomdz

Differential Revision: D5349604

fbshipit-source-id: 8e04001b0b6a6e83083f341e265ab3c0d2b06918
2017-06-29 12:48:04 -07:00
912ee4e40a Fix test_sparse_to_dense precision failures
Summary: ..

Reviewed By: tomdz

Differential Revision: D5349561

fbshipit-source-id: 4c510905515eb03a64abc36f33d59a1d998c2ab1
2017-06-29 12:48:03 -07:00
83765906c6 Add min_satisfying_examples
Summary:
Eliminates failures from overloaded machines from only
running a few examples before being timed out.

Reviewed By: tomdz

Differential Revision: D5349555

fbshipit-source-id: 89d1db063f58c72656b37157225a586c9e3f24bc
2017-06-29 12:48:01 -07:00
6df23b418d mark tools as excluded in find_packages (#1915) 2017-06-29 13:49:56 -04:00
a4cc9f2fbf Per-workspace mutex for shared im2col buffer
Summary:
Shared im2col buffer needs a mutex only to protect it from ops within a
workspace (since the shared buffer is created per workspace). The current
implementation has a global mutex which affects perf when running multiple nets
in parallel.

I don't feel great about adding a mutex for this in workspace, let me know if
anyone has better suggestions.

Reviewed By: akyrola

Differential Revision: D5341476

fbshipit-source-id: 1c9a92ef488ffb0c0013a7656bcb3d530bc7208b
2017-06-29 10:19:37 -07:00
e5b5154768 Make cudnn warnings clean. (#1940)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-29 10:58:04 -04:00
a0bfda2390 Fix issues to access Scuba and move out scuba logging from opensource to fb-internal codebase.
Summary: As title

Reviewed By: akyrola

Differential Revision: D5335465

fbshipit-source-id: c005210b9adfe553aee88da546da451ae29a52a6
2017-06-29 01:52:54 -07:00
bfaddc0a19 Warp intrinsic fixes (#785) 2017-06-29 00:14:07 -04:00
4d5075add2 Add ignore_index to nnl_loss and cross_entropy (#1937) 2017-06-29 00:10:13 -04:00
86305ddd49 Deprecate CNNModelHelper in python/seq2seq/seq2seq_model_helper.py
Summary: Also added some simple tests for Seq2SeqModelHelper.

Reviewed By: jamesr66a

Differential Revision: D5291733

fbshipit-source-id: 15866dccb89acd82c08e0348f14834cd9c201422
2017-06-28 20:18:12 -07:00
fb4c0a664b brew API in lstm benchmark
Summary: I deprecated CNNModelHelper in the LSTM benchmark

Reviewed By: salexspb

Differential Revision: D5342734

fbshipit-source-id: 81a552194bcb0cc3071604340fce6873230964f2
2017-06-28 20:18:12 -07:00
a60f90a3a3 Only notify on DAGNet condition variable if the condition is actually true
Summary: This is splitting out one change from D5273337. This makes it so that we only notify the DAGNet condition variable if the condition it's signalling is actually true, namely remaining_ops_==0 || !success_.

Reviewed By: akyrola

Differential Revision: D5341962

fbshipit-source-id: a4d76cc95aebac27dc18da2bf8dc1837db69e6ae
2017-06-28 20:02:59 -07:00
e128245e8c Move memonger graph equality into memonger
Summary: Let's try this again. Verify graphs every time memonger is run. Will definitely check the timing, though.

Reviewed By: akyrola

Differential Revision: D5308188

fbshipit-source-id: 512a76c759b670d31c49d1d492dd8ee1eaf3bafd
2017-06-28 17:36:40 -07:00
0a95613cef Improve error message when accessing attributes that don't exist (#1936)
New:
   >>> torch.autograd.Variable(torch.randn(3, 3)).foobar
   AttributeError: 'Variable' object has no attribute 'foobar'

Old:
   >>> torch.autograd.Variable(torch.randn(3, 3)).foobar
   AttributeError: foobar
2017-06-28 20:13:15 -04:00
cb04548577 Clean up nvcc compiler warnings in utility_ops.cu
Summary: These `template` qualifiers are unnecessary and throw warnings

Reviewed By: akyrola, wickedfoo

Differential Revision: D5327333

fbshipit-source-id: 35bccbf410beb9311f3776747267c06f5ed7b620
2017-06-28 17:03:42 -07:00
8a4eb50ed1 Speed up torch.matmul for 3D+ x 2D/1D tensors (#1931)
If the left tensor is 3D+ and the right tensor is at most 2D, we can
fold the batch into the matrix dimension and use torch.mm instead of
torch.bmm. In practice, this is faster especially if the right tensor is
column major.
2017-06-28 17:43:21 -04:00
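The folding trick, as a quick sketch: for a 3D-by-2D product the batch dimensions can be flattened into the matrix dimension, turning a bmm into a single mm:

```python
import torch

x = torch.randn(5, 3, 10)   # 3D left operand
w = torch.randn(10, 20)     # 2D right operand
out = torch.matmul(x, w)
# equivalent single mm after folding the batch dims into rows:
folded = x.reshape(-1, 10).mm(w).view(5, 3, 20)
assert torch.allclose(out, folded)
```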
fe9b0bfd27 Fix some typos
Summary: Closes https://github.com/caffe2/caffe2/pull/882

Differential Revision: D5341277

Pulled By: harouwu

fbshipit-source-id: bb5595c65c05ca7ea1a1d060d61d14fbfe008241
2017-06-28 13:50:48 -07:00
ea659b8f2e broadcast to global parameters when using warmup
Reviewed By: asaadaldien, jay-mahadeokar

Differential Revision: D5340692

fbshipit-source-id: 80879847ff71c8d620de502ef95a9ffb4bdf595d
2017-06-28 13:35:27 -07:00
f3c15091c9 don't try to attach observer if net creation fails + unit
Summary:
As title. Not sure how the unit test bug got through; we should have a push-blocking test guarding it. Looks like Sandcastle thought that it was already broken.

Reviewed By: jamesr66a

Differential Revision: D5340741

fbshipit-source-id: 76b2287fc2f746d85dd732b669ff89808bcbd497
2017-06-28 13:35:26 -07:00
87ab40c617 Tests: fixing observer_test
Summary:
Fixes bug reported at 3559ff93f9 (commitcomment-22813961). Builds have been broken since that commit was merged.

/cc salexspb
Closes https://github.com/caffe2/caffe2/pull/886

Differential Revision: D5340752

Pulled By: salexspb

fbshipit-source-id: 0d3f4cd0a66580ba173378a879a44bb1dbaf7e39
2017-06-28 13:17:13 -07:00
fbe2526343 Allow concurrent execution of GLOO broadcast collectives in
Summary:
This adds a CollectivesConcurrencyControl class to manage creating common contexts and cyclic controls for executing Gloo collectives, and refactors AllReduce and _AddDistributedParameterSync to use it.

Reviewed By: akyrola

Differential Revision: D5335795

fbshipit-source-id: 5084e0a65cdb989cd949be3868b77a680561022d
2017-06-28 12:49:12 -07:00
e2bd3cfc8b Add __sub__ function for schema.Struct
Summary:
This is for the ease of removing the common fields of a struct from another.
For example,
  s1 = Struct(
      ('a', Scalar()),
      ('b', Scalar()),
  )
  s2 = Struct(('a', Scalar()))
  s1 - s2 == Struct(('b', Scalar()))

More examples are provided in the code comments.

Differential Revision: D5299277

fbshipit-source-id: 7008586ffdc8e24e1eccc8757da70330c4d90370
2017-06-28 11:24:01 -07:00
b5e1df046e fixed typo in formula of GRU in doc (#1921) 2017-06-28 11:02:06 -04:00
08648061f7 Advanced Indexing 2A - Colons + Adjacent Adv Indexers (#1890) 2017-06-28 10:01:45 -04:00
8260002941 Partial eval layers
Summary:
In some cases we don't want to compute the full FC during eval. These layers allow us to compute the dot product between X and W[idx,:], where idx is an input, e.g., the label (see the sketch below).

Reviewed By: kittipatv

Differential Revision: D5305364

fbshipit-source-id: 0b6a1b61cc8fcb26c8def8bcd037a4a35d223078
2017-06-28 00:36:40 -07:00
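A minimal numpy sketch of the computation these layers perform (names illustrative): instead of the full X @ W.T, take only the row of W selected by each example's idx:

```python
import numpy as np

X = np.random.randn(8, 16)        # batch of activations
W = np.random.randn(1000, 16)     # full FC weight matrix
idx = np.array([3, 10, 42, 7, 0, 5, 9, 1])   # one index (e.g. label) per row
partial = np.einsum('bd,bd->b', X, W[idx])   # X[i] . W[idx[i], :]
assert np.allclose(partial, (X @ W.T)[np.arange(8), idx])
```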
a1fcbb8be1 offline_all_gpu_experiment
Summary:
Similar to the sparse_nn all-GPU run, this is our first step towards an offline full-GPU experiment.

**Compare Run**
cat(128, 32)512-512 :
GPU 21138598 https://fburl.com/jpeod1pi
CPU 21138787 https://fburl.com/vma7225l

Reviewed By: dzhulgakov

Differential Revision: D5308789

fbshipit-source-id: 413819bf9c5fff125d6967ed48faa5c7b3d6fa85
2017-06-27 23:09:54 -07:00
1fce3eac4e single trainer hybrid device
Summary:
First try at single-trainer hybrid-device training for sparsenn

Comparison results with CPU training:
https://our.intern.facebook.com/intern/fblearner/run/compare/?compare_to[0]=20016969&compare_to[1]=19660293&baseline_run=19660293&all_runs[0]=20016969&all_runs[1]=19660293

Reviewed By: dzhulgakov

Differential Revision: D5205723

fbshipit-source-id: 4a024324ac2efc3248dd470d4c533cf2ecec2e92
2017-06-27 22:06:30 -07:00
9a14c013c3 Refactor data_parallel_model to take advantage of Gloo broadcast op in broadcasting across machines and GPUs in one operation
Summary: Combine _AddDistributedParameterSync() and _SyncParams() into a single function to broadcast across distributes machines and all local GPU simultaneously. This is similar to how calls to Allreduce has already optimized using the functionalities of Gloo. All the refactoring work is contained in data_parallel_model.py.

Reviewed By: akyrola, andrewwdye

Differential Revision: D5329277

fbshipit-source-id: 4407b88980cf396f2e0f994d796294fa79fd39ed
2017-06-27 19:35:24 -07:00
c3b4d277bf Tests: fix test_convolution_sync()
Summary:
This bug in the test was exposed by https://github.com/caffe2/caffe2/pull/861 (previously, the test was always using the cuDNN engine, regardless of the value of `engine`). This bug is now blocking https://github.com/caffe2/caffe2/pull/817.
```
____________________ TestConvolution.test_convolution_sync _____________________
...
            if use_cudnn and requested_engine != 'CUDNN':
                raise ValueError(
>                   'When use_cudnn=True, the only engine you can specify is '
E                   ValueError: When use_cudnn=True, the only engine you can specify is "CUDNN"
```
https://travis-ci.org/caffe2/caffe2/jobs/247605579
Closes https://github.com/caffe2/caffe2/pull/881

Differential Revision: D5332619

Pulled By: akyrola

fbshipit-source-id: 63737768a155359ddbbef1da424fcbb94f86bd4e
2017-06-27 18:07:04 -07:00
75fc49833f An observer for every created net and op
Reviewed By: akyrola

Differential Revision: D5319289

fbshipit-source-id: 1140caef6d608ab3e37d22311e5c8a7e489470d5
2017-06-27 18:07:03 -07:00
08cfc72dee Increase threshold for test_unroll_attention
Summary: To 0.000001.

Reviewed By: salexspb

Differential Revision: D5323697

fbshipit-source-id: 5a06c8f5e719b5252e4229704205be37777a8bab
2017-06-27 17:17:32 -07:00
07ba98b4b2 Allow specification of SliceOp dimensions via argument rather than via tensor
Summary: This should make it so we no longer have super hacky DAG chains just to generate vectors of indices that could be specified at model creation time (see the sketch below).

Reviewed By: akyrola

Differential Revision: D5316707

fbshipit-source-id: 97bb3868b69e0c5a7f465c95f2e16ae0485dcc56
2017-06-27 17:17:32 -07:00
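For reference, a hedged sketch of what this enables at net-construction time, assuming the standard `starts`/`ends` arguments of the Slice op:

```python
from caffe2.python import core

net = core.Net("slice_example")
# slice bounds given as op arguments at model-creation time, rather
# than computed into separate index tensors by a chain of ops
net.Slice(["X"], ["Y"], starts=[0, 1], ends=[-1, 3])
```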
4c35c630ec Enable norm gradgradchecks by lowering precision requirements. 2017-06-27 18:44:14 -04:00
3744efeaf8 Fix double backwards for prod. 2017-06-27 18:44:14 -04:00
bc032be13e Implement negative dimensions and double backwards cumprod. 2017-06-27 18:44:14 -04:00
ee1f21a53e fix perf bug in TransposeOp for CUDA
Summary:
It was allocating TensorCPU always, so causing mutex to be acquired in PinnedCPUAllocator.
Not much impact as everyone should use the CUDNN transpose, but good to fix anyway.

Reviewed By: jamesr66a

Differential Revision: D5332858

fbshipit-source-id: 287643df623b7cd59ab1028ed8b2ed1d3c1da44e
2017-06-27 15:27:28 -07:00
81f539a283 Implement SliceGradientOp for CUDA
Summary: Implement the gradient for the Slice op on GPU

Reviewed By: akyrola

Differential Revision: D5313442

fbshipit-source-id: 722ad0bdf65e014d3236e17d15c83d40d7c975d2
2017-06-27 14:49:22 -07:00
4d16578284 fix + verification for inplace blobs
Summary:
Fixes a memonger bug where it could recycle a blob that was released by the same op being processed.
Added a verification step to ensure in-place assignments are not changed.

Reviewed By: asaadaldien

Differential Revision: D5331495

fbshipit-source-id: 20b08f6de5b973e8c9868aa048c142cac1eb6c58
2017-06-27 13:51:03 -07:00
6057fa9193 Avoid compiler warning in operator_schema.h
Summary:
The previous attempt involved terminating the program, which is not good. Here I am using the [[noreturn]] trick.

Reviewed By: jamesr66a

Differential Revision: D5313159

fbshipit-source-id: 8889efcf793d44d472502309992e6f5b0a31f0e6
2017-06-27 13:31:35 -07:00
f814a892cf done re-seed cuda device if in bad fork (#1923) 2017-06-27 13:24:52 -04:00
dfd745a4d1 Conv frontend: checking engine and use_cudnn
Summary:
*Fixes https://github.com/caffe2/caffe2/issues/860*

Raise an exception when the user specifies conflicting values for `engine` and `use_cudnn` in the conv frontend.
Closes https://github.com/caffe2/caffe2/pull/861

Differential Revision: D5329587

Pulled By: akyrola

fbshipit-source-id: 0f1ced9a88c9c6c5a7cb30a070e5bf60129082f0
2017-06-27 09:47:48 -07:00
d45f722e43 data_parallel_model: NCCLBroadcast root fix
Summary:
The root is the root _rank_ and not the root _device_. Thus we always
use root=0, regardless of the devices used.

https://github.com/NVIDIA/nccl/blob/v1.3.0-1/src/broadcast.cu#L75

/cc slayton58
Closes https://github.com/caffe2/caffe2/pull/872

Differential Revision: D5329564

Pulled By: akyrola

fbshipit-source-id: 5a34be30c1a0046a74f28437cb08333c1fb46098
2017-06-27 09:47:48 -07:00
ca2bf16009 Tests: handle missing python-lmdb gracefully
Summary:
Fix issue mentioned here: 875a9850c1 (commitcomment-22773221)

Unblocks https://github.com/caffe2/caffe2/pull/817

/cc tomdz
Closes https://github.com/caffe2/caffe2/pull/871

Differential Revision: D5329573

Pulled By: akyrola

fbshipit-source-id: 855294f76bce82dce6d4bd489244922799848076
2017-06-27 09:47:46 -07:00
d592e188f7 port of ConcatDataset (#1902) 2017-06-27 12:31:56 -04:00
c0445c4426 support_multi_label
Summary: Extend image_input_op to support multi-label binary label vector

Reviewed By: panshen1

Differential Revision: D5318119

fbshipit-source-id: da6757ed9a562f1ab58e3ae5642b7a70d6d499c1
2017-06-27 08:47:59 -07:00
ae61f3ff42 adds poisson NLL loss (#1779) 2017-06-27 10:04:54 -04:00
1f391a42f7 fix warnings for docs generation 2017-06-27 00:18:32 -04:00
24e30534ea Implement SliceGradientOp for CPU
Summary: Implement slice gradient for CPU. Will soon port this over to GPU so NMT can use it

Reviewed By: akyrola

Differential Revision: D5309305

fbshipit-source-id: 8fb5f4e665f236ecce9227c5c0c302f5076b01ad
2017-06-26 21:18:05 -07:00
b933423495 support more than 8 gpus (#774) 2017-06-26 16:49:14 -04:00
ee1b7b50b3 fix docs for broadcast warning 2017-06-26 14:50:57 -04:00
cb5af39c69 Vectorize CPU ClipOp implementation (and add test)
Summary: Noticed this wasn't vectorized, could be handy.

Reviewed By: kennyhorror

Differential Revision: D5308593

fbshipit-source-id: c2b35ece34831f0546f010a1ebe0b89f1a7d9446
2017-06-26 11:33:13 -07:00
7cdd018db4 Fix assertEquals for lists and tuples (#1913)
zip finishes once the first iterator is exhausted, so we were erroneously allowing things like assertEquals([1, 2], [1]) to pass.
2017-06-26 14:13:21 -04:00
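The pitfall in one snippet: zip stops at the shortest iterable, so a pairwise-equality loop silently skips trailing elements unless the lengths are checked first:

```python
a, b = [1, 2], [1]
# buggy check: zip never visits the trailing 2, so nothing fires
for x, y in zip(a, b):
    assert x == y
# the fix: compare lengths first, so the mismatch is caught
assert len(a) == len(b), "length mismatch"
```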
4862c0f47f Memonger in O(blobs)
Summary:
Made them faster.

This should be equivalent to the algorithm akyrola suggested, just with a list (of parents) as an intermediate representation instead of a string.

Reviewed By: akyrola

Differential Revision: D5308133

fbshipit-source-id: c976a513d10e79c157ea803afb99b147e9ea3357
2017-06-26 11:04:13 -07:00
82c38daa85 added research award info
Summary: Closes https://github.com/caffe2/caffe2/pull/863

Differential Revision: D5320787

Pulled By: akyrola

fbshipit-source-id: f59e874fafeba8879b1cf638be31fc54aa967cbb
2017-06-26 10:18:19 -07:00
87275817a4 fix a rare race condition by initializing scratch blobs beforehand
Summary: The data workers test times out randomly (very seldom), and it looks like the reason is that we call FeedBlob in a thread (the enqueue thread), and the first time that is called, it will call workspace.CreateBlob(), which is not thread safe. Fix this by initializing the scratch blobs explicitly.

Reviewed By: panshen1

Differential Revision: D5292426

fbshipit-source-id: d7dad68f3ccc636c60bd82b2527f00f20da298b5
2017-06-26 10:18:18 -07:00
553e4ec20d Refactor conv_test - no cuDNN+dilation+NHWC
Summary:
Place all the cuDNN version checks in a helper function. Easier to use
in future tests and update for newer versions of cuDNN in one place.

Fixes this error in `test_convolution_gradients`:
```
RuntimeError: [enforce fail at conv_op_cudnn.cc:519] status == CUDNN_STATUS_SUCCESS. 9 vs 0. , Error at: /data/caffe2/caffe2/operators/conv_op_cudnn.cc:519: CUDNN_STATUS_NOT_SUPPORTED Error from operator:
input: "X" input: "w" output: "Y" name: "" type: "Conv" arg { name: "stride" i: 1 } arg { name: "pad" i: 0 } arg { name: "order" s: "NHWC" } arg { name: "dilation" i: 2 } arg { name: "kernel" i: 1 } device_option { device_type: 1 } engine: "CUDNN"
```
Closes https://github.com/caffe2/caffe2/pull/839

Reviewed By: salexspb

Differential Revision: D5292123

Pulled By: akyrola

fbshipit-source-id: 513cc742be73c29ffe24e9e964845a217405a73d
2017-06-26 09:20:07 -07:00
7806a09f03 Fp16 fixes for CUDA 9 (#783) 2017-06-26 11:38:18 -04:00
7523c49f03 add missing INCREF 2017-06-26 11:33:16 -04:00
733a7c6d9a Fix segfault in SpatialDepthWiseConvolution w/o bias 2017-06-26 16:33:45 +02:00
c8410859d9 Operator python stacktraces, attempt 2
Summary:
Last time I used a uuid filled into OperatorDef, and operator_tracebacks was populated using traceback.extract_stack. There were several issues with this approach:

1. A random field in OperatorDef breaks workflows relying on memoization, i.e. when computation is skipped based on a previously computed result.
2. Adding one more field revealed that RNNs are not forward compatible with respect to new fields there. The prototxt format seems not to allow forward compatibility (thanks jamesr66a for the investigation!). For RNNs we need to switch to a more resilient approach. azzolini's proposed change to OperatorDef / NetDef would allow that by nesting NetDef directly inside OperatorDef without the need for extra serialization.
3. traceback.extract_stack is very slow when the executable is on a remote filesystem. It does one or more os.stat calls for each frame on the stack. In some cases it ended up adding up to 15 extra minutes to model construction.

In this diff I use a different approach, which should fix all the problems above.

1 and 2 are solved by not adding a new field at all. Instead I report the operator index relative to the net it runs in. Thanks akyrola and dzhulgakov for the idea. The downside is that operator-list manipulation breaks the logic, and separately created ops are not covered at all.
3 is solved by operating on raw frames without using the traceback and inspect modules, which end up doing a lot of filesystem calls. See the function extract_stacktrace in core.py, which has additional comments.

Reviewed By: dzhulgakov

Differential Revision: D5286285

fbshipit-source-id: 626dd0f5f6b8b1d86bd6bf519078b122f43ddcaa
2017-06-25 19:32:58 -07:00
29887f556f Unrolled test for AttentionCell
Summary: Adding a test to check computational integrity of networks constructed with AttentionCell using UnrolledCell.

Reviewed By: salexspb

Differential Revision: D5306915

fbshipit-source-id: 02acfd1011f7d3ee5fac21cc2778c4a486190c43
2017-06-25 17:21:24 -07:00
32e666551a Fix lint. 2017-06-24 09:45:21 -04:00
ab0c321f80 Fix index_copy gradgrad test by ensuring indices cannot be repeated. 2017-06-24 09:45:21 -04:00
9db14936eb Ensure masked_select tests don't have masks of all zeros which yields
0-dimensional tensors.
2017-06-24 09:45:21 -04:00
e5857c5f1c Implement Gather double backwards. 2017-06-24 09:45:21 -04:00
7da77c4255 Add ScatterAdd autograd function. 2017-06-24 09:45:21 -04:00
656cb1c31a Implement and test double backwards for IndexCopy. 2017-06-24 09:45:21 -04:00
4ab4938cf0 Fix and test single backwards IndexCopy. 2017-06-24 09:45:21 -04:00
1324c4b081 Implement double backwards for masked_scatter. 2017-06-24 09:45:21 -04:00
bb3779efe8 Add broadcasting to masked_select. 2017-06-24 09:45:21 -04:00
0394bc2b40 fix clip_op bug
Summary: ajtulloch caught me. This fixes the min() bug: it should be lowest(), since for floating-point types std::numeric_limits::min() is the smallest positive value, not the most negative one (see the illustration below).

Reviewed By: xianjiec, dzhulgakov

Differential Revision: D5316406

fbshipit-source-id: 76c13e8eddc4233b40f99a801910fbf7a1ef6b28
2017-06-23 22:31:54 -07:00
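The gotcha, illustrated with numpy's analogues of std::numeric_limits: for floating-point types min() is the smallest positive normal value, while lowest() is the most negative finite value, which is what a clip default needs:

```python
import numpy as np

info = np.finfo(np.float32)
print(info.tiny)  # ~1.18e-38: analogue of numeric_limits<float>::min()
print(info.min)   # ~-3.4e+38: analogue of numeric_limits<float>::lowest()
```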
7c24a3d5cf fix arguments for cudnnFindEx for transposed wgrad 2017-06-23 23:18:32 -04:00
04c9c8c5c2 fix for loading model with bmuf
Summary: - One line fix for loading saved checkpoint when using Parallelize_GPU_BMUF

Reviewed By: asaadaldien

Differential Revision: D5315254

fbshipit-source-id: a20ba6438c8e6b2ef44b65270c1d3f9ab645ded0
2017-06-23 17:16:33 -07:00
5cb73c106e TravisCI: fix OSX builds
Summary:
Apparently, `brew install` fails if the package is already installed?
```
Error: automake 1.15 is already installed
```
https://travis-ci.org/caffe2/caffe2/jobs/245226634

Maybe TravisCI made some unannounced updates to their OSX images at around the same time [they updated their trusty images](https://blog.travis-ci.com/2017-06-21-trusty-updates-2017-Q2-launch). Something changed on their side two days ago, and the OSX builds have been failing ever since.
Closes https://github.com/caffe2/caffe2/pull/858

Differential Revision: D5313447

Pulled By: aaronmarkham

fbshipit-source-id: 7085640704c60c0119a1a75ea69dacd64b5a4da8
2017-06-23 17:08:54 -07:00
194bc404b5 CUDA 9
Summary:
Adds basic CUDA 9 support, including adding Volta arch, and making appropriate modifications for half precision datatype changes
Closes https://github.com/facebookincubator/gloo/pull/49

Differential Revision: D5315336

Pulled By: pietern

fbshipit-source-id: 6468b0f357206d604bdcfec69ba82509a2c91407
2017-06-23 16:41:27 -07:00
fd86c51c39 Add ResizeNearest
Summary: Added the CUDA implementation of ResizeNearest (forward pass only)

Reviewed By: wickedfoo

Differential Revision: D5290087

fbshipit-source-id: 4291e65b2b4b6a1a197275d5ed8710f40000b59e
2017-06-23 15:49:42 -07:00
8f1e641d5f Deprecate CNNModelHelper in python/data_workers_test.py
Summary: Deprecate CNNModelHelper in python/data_workers_test.py

Reviewed By: harouwu

Differential Revision: D5312089

fbshipit-source-id: 37b72ac2031acf14a7e6a6ea0a298b71b00b10dd
2017-06-23 14:46:58 -07:00
ca2b608f83 Fixed typo
Summary:
peaces -> pieces, peace -> piece
Closes https://github.com/caffe2/caffe2/pull/819

Differential Revision: D5312417

Pulled By: aaronmarkham

fbshipit-source-id: 59d2c3f475197a5f29dc7cf3ecaf675a242d3cdf
2017-06-23 14:02:40 -07:00
342de07231 Core unit test fixes for Python 3
Summary: As title

Differential Revision: D5291327

fbshipit-source-id: 7dd9279c53ba55d3422c31973ffcec5705787fdf
2017-06-23 13:22:16 -07:00
a9ea975977 enable warnings in build and fix warnings 2017-06-23 11:49:09 -07:00
ff914bf201 Move away from creating netdef from string, which may get deprecated
Summary: Remove cases of constructing a NetDef from a string in favor of just creating a NetDef directly.

Reviewed By: salexspb

Differential Revision: D5309645

fbshipit-source-id: 06ec8617733d9dc5385668485f3b091bb37b3f73
2017-06-23 11:36:23 -07:00
ccc46229af Fix residual connections
Summary:
This diff fixes gradient computation of residual connections for a training network constructed with MultiRNNCell.

It addresses a logic bug in _prepare_output() and _prepare_output_sequence() by keeping track internally of which layers have consecutive residual connections before the output, and then reconstructing the final residual output by (re-)preparing the output of each of those layers and then combining them with a Sum operation. This also involves keeping track of which states contribute toward the reconstruction of the final sequence output so that outputs_with_grads can be correctly passed to apply_over_sequence().

Differential Revision: D5300520

fbshipit-source-id: f37d800c909e631175de7045abe192351cc11c41
2017-06-23 11:36:22 -07:00
b1a84e3c70 update readme and add assign_(Scalar) variant 2017-06-23 11:27:55 -07:00
8a24f2b4d8 Fix segfault in SpatialDepthWiseConvolution w/o bias 2017-06-23 11:14:00 +02:00
66d93b60b3 fix a bug with scalar handling by simplifiying the maybeScalar check. 2017-06-22 23:07:56 -07:00
2af6ba3b2a handle select and operator[] style operations 2017-06-22 22:57:43 -07:00
b59b44fac7 add checks for scalars on output 2017-06-22 21:46:04 -07:00
a10a1c92b1 start adding rules to propagate scalar to results 2017-06-22 20:51:02 -07:00
bb6908e163 Scalar objects can now be backed by 0-dim Tensors. 2017-06-22 18:57:09 -07:00
0c19074c56 LOG(WARNING) when an operator fails feature checks
Summary: We had a latent cudnn operator instantiation failure that we didn't know about until I looked at the nvvp profile. This makes it so that those failures (i.e. OPERATOR_NEEDS_FEATURE failures) print to LOG(WARNING) instead of VLOG(1)

Reviewed By: salexspb

Differential Revision: D5303012

fbshipit-source-id: bda54682d9932f907e44aa1c81a04521d864ae99
2017-06-22 18:33:44 -07:00
c555cd8253 missing fixed allocator files 2017-06-22 18:32:10 -07:00
5e078bb7cc scalar flags added, and used to dispatch when there is a scalar variant of a function. broadcast annotations are used to figure out when a scalar s + A should also be converted. 2017-06-22 17:22:16 -07:00
2c73ae507a Allow assertValidationChecks to take init_net
Summary: This is needed so that we can create blobs that are not numpy arrays, e.g., creating a mutex with the `CreateMutex` op.

Reviewed By: chocjy

Differential Revision: D5303742

fbshipit-source-id: f83cbf67c658a234c1e4a9a114ad943a4e360598
2017-06-22 16:02:43 -07:00
209c570f0d Deprecate CNNModelHelper in caffe2/python/model_device_test.py
Summary: Deprecate CNNModelHelper in caffe2/python/model_device_test.py

Reviewed By: harouwu

Differential Revision: D5299367

fbshipit-source-id: 5ab53b877b7c0f1a1c4daf2338d5024b2d2d9261
2017-06-22 15:37:17 -07:00
ee10e7457f Corrected erroneous docstring for MultiLabelSoftMarginLoss 2017-06-22 17:42:18 -04:00
5ca263fb1c Add a warmup option for BMUF
Reviewed By: yqwangustc

Differential Revision: D5279655

fbshipit-source-id: 7c778a88909580bbe43d4bac4b7d73be0d0e3f27
2017-06-22 14:32:39 -07:00
7cd6cc17af Merge commit '93e05eb458ad4c939e905668c1792692315880b0' 2017-06-22 17:23:02 -04:00
8bfef60b07 Merge commit '32fd4a3d6081a13c18ce4f8dcb37260a830a911f' 2017-06-22 17:22:31 -04:00
a45ad7cfba Advanced Indexing Part 1 -- Purely Integer Array Indexing 2017-06-22 17:21:50 -04:00
93e05eb458 Advanced Indexing Part 1 -- Purely Integer Array Indexing 2017-06-22 17:21:30 -04:00
32fd4a3d60 Advanced Indexing Part 1 -- Purely Integer Array Indexing 2017-06-22 17:21:19 -04:00
667b8347a2 stabilize softmax_ops_test
Summary: softmax_ops_test occasionally fails with gradient checks. Stabilize by setting the numpy random seed. Also reduce some dimensions for the large input test to make it run faster.

Reviewed By: harouwu

Differential Revision: D5292106

fbshipit-source-id: a21eec89e18d30ac7c5609dacf5d413e841841a6
2017-06-22 13:50:32 -07:00
f09027bc29 Add batch sampler to DataLoader (#1867) 2017-06-22 20:18:31 +02:00
9a196829e2 Merge commit '43dec0a210103c4421bc73c7e742f0f746b7e39e' 2017-06-22 13:55:54 -04:00
43dec0a210 Remove THCTensor_(expand2) and THCTensor_(expand3).
They are no longer needed and the corresponding TH versions have been removed.
2017-06-22 13:55:08 -04:00
064ef8b81b Merge commit '104234a6a8937f09208061975ce90190a7be4159' 2017-06-22 13:21:59 -04:00
662faf7c41 Merge commit 'a940d4ff8bf5debc76d909a778e2e47d24148ee1' 2017-06-22 13:21:38 -04:00
104234a6a8 add asserts to BCECriterion 2017-06-22 13:20:25 -04:00
a940d4ff8b add asserts to BCECriterion 2017-06-22 13:20:07 -04:00
c16a268f47 Merge commit 'fb32164a72004e63ebfe1f9ca8366ff12f8fbec2' 2017-06-22 12:56:36 -04:00
cb4eaa9c5d TensorLib/Aten --> changes required in pytorch 2017-06-22 12:55:55 -04:00
fb32164a72 TensorLib/Aten --> changes required in pytorch 2017-06-22 12:55:17 -04:00
b5854a11c4 Merge commit 'eccc759c36a4023357c87fde79732e4c916676d2' 2017-06-22 12:49:50 -04:00
ddbd4ef4ac Support out-of-place broadcast type definitions. 2017-06-22 12:49:06 -04:00
eccc759c36 Support out-of-place broadcast type definitions. 2017-06-22 12:48:43 -04:00
fecd05ba2f Merge commit '81e14ad2dee356b2c2274eb302bc2438c9a6161a' 2017-06-22 12:46:37 -04:00
a7d1cd75ec Merge commit '93a7c9de29900f166486373744a0e90c7046a56a' 2017-06-22 12:46:02 -04:00
497db732fc btrifact: Make pivoting optional. 2017-06-22 12:45:14 -04:00
81e14ad2de btrifact: Make pivoting optional. 2017-06-22 12:45:01 -04:00
93a7c9de29 btrifact: Make pivoting optional. 2017-06-22 12:44:51 -04:00
96febbb762 Merge commit '62cfc94f445bfaeaccc3dcc1fc69ea5b75039823' 2017-06-22 12:40:40 -04:00
62cfc94f44 improving TH error messages in Apply macros 2017-06-22 12:38:10 -04:00
3f6cda8696 fix bug of threshold activation 2017-06-22 12:23:35 -04:00
a836f8f56f Use and document saved_variables for double backwards. 2017-06-22 11:46:24 -04:00
278cbbae49 set TH_INDEX_BASE to 0 2017-06-21 16:43:16 -07:00
ffd32c8ab7 Add distributed BMUF implementation.
Summary:
Refactor data_parallel_model all_reduce and broadcast methods to work for
a given parameter set not only gradients and reuse them for BMUF distributed
implementation.
Add a distributed test (multiprocessing) to BMUF.

Reviewed By: akyrola

Differential Revision: D5267083

fbshipit-source-id: 8dcc7527d0a755b903d693d8071585f0b54d3403
2017-06-21 16:18:11 -07:00
68cbb857f2 allow tensors to be constructed from views of external data. Support creating new tensors that already have a size/stride 2017-06-21 15:35:08 -07:00
cf4ac83a91 Make List.__getitem__() works with output of List.field_names()
Summary:
As described in T19378176 by kittipatv, in this diff, we fix the issue of __getitem__() of schema.List.

For example, given Map(int32, float) (Map is a special List), field_names() will return "lengths", "values:keys", and "values:values". "values:keys" and "values:values" are not accessible via __getitem__(). __getitem__() bypasses the values prefix and directly accesses the fields in the map. Other APIs (e.g., _SchemaNode & dataset_ops) expect "values:keys" and "values:values" as it simplifies traversal logic. Therefore, we should keep field_names() as is and fix __getitem__().

Reviewed By: kittipatv

Differential Revision: D5251657

fbshipit-source-id: 1acfb8d6e53e286eb866cf5ddab01d2dce97e1d2
2017-06-21 14:06:05 -07:00
f937e4bffb Revert D5288993: Memonger Graph Equality into Memonger
Summary: This reverts commit b9f105ce00148b2673eed2dd390ab74f82f990ad

Differential Revision: D5288993

fbshipit-source-id: 8f2e69c0ca21e142eb43b450d0b52ba76a5e429f
2017-06-21 13:45:50 -07:00
97ca7d7e6f Remove unused thrust headers from math_gpu.
Reviewed By: wickedfoo

Differential Revision: D5290296

fbshipit-source-id: 576208c49001b236b7ebda7c11b2f0e498da9ea4
2017-06-21 12:34:16 -07:00
a1c557bc45 improve error reporting for undefined tensors passed as arguments. 2017-06-21 12:24:59 -07:00
8464ec5c3a Fixed a bug in compute_interference_graph() when used with multiple in-place operators.
Summary:
compute_interference_graph() was not able to handle the case when a blob is reused twice for operators supporting in-place parameters. For example, for the following network with operators Mul and Sub

(blob) -> [Mul] -> (blob) -> [Sub] -> (blob)

an incorrect edge will be added from [Sub] to [Mul], which causes nx.is_directed_acyclic_graph() to fail.

Reviewed By: ajtulloch

Differential Revision: D5271604

fbshipit-source-id: f6095b6f8e1dba556ba223a82c8170be7f744529
2017-06-21 12:01:37 -07:00
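To make the failure mode concrete, here is a minimal sketch of the acyclicity check described above, using networkx directly; the two-op graph and the spurious back-edge are illustrative stand-ins, not the actual caffe2 memonger code.
```
# (blob) -> [Mul] -> (blob) -> [Sub] -> (blob), both ops writing in place.
import networkx as nx

g = nx.DiGraph()
g.add_edge("Mul", "Sub")        # correct edge: Mul's output feeds Sub's input

g_buggy = g.copy()
g_buggy.add_edge("Sub", "Mul")  # the incorrect extra edge the fix removes

print(nx.is_directed_acyclic_graph(g))        # True
print(nx.is_directed_acyclic_graph(g_buggy))  # False -- the reported failure
```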
4c5b7d41ba tensor.data<> also as toLongData() variants. Scalar now also has .to<T>() variants 2017-06-21 11:57:37 -07:00
a531d74dde ELU CUDA implementation
Reviewed By: wickedfoo

Differential Revision: D5290111

fbshipit-source-id: 95bd0b5467fe064f2fe1b21cb8ec31f150b35e3f
2017-06-21 11:47:13 -07:00
13e7648fd1 document accessors 2017-06-21 11:23:03 -07:00
4be5337cca add support for weight in batch_softmax_loss
Summary: weighted batch_softmax_loss when weight exists in input_record

Reviewed By: kittipatv

Differential Revision: D5291646

fbshipit-source-id: f1bcd386ad1fc0e95e0a0315ec1c36531c792495
2017-06-21 10:32:15 -07:00
f222e226b4 Memonger Graph Equality into Memonger
Summary: Make verify_graph_equality get called by share_grad_blobs and optimize_inference_for_dag

Reviewed By: akyrola

Differential Revision: D5288993

fbshipit-source-id: b9f105ce00148b2673eed2dd390ab74f82f990ad
2017-06-21 10:09:15 -07:00
249614ca88 Fix CMake messages when CUDA not present
Summary: Closes https://github.com/caffe2/caffe2/pull/767

Differential Revision: D5292618

Pulled By: akyrola

fbshipit-source-id: 22bcfe01244d6beb48c580c84c790c810dc06998
2017-06-21 08:47:46 -07:00
d46fe736c8 Fix flaky test in dataset_ops_test.py
Summary:
```
while pytest caffe2/python/operator_test/dataset_ops_test.py::TestDatasetOps::test_collect_tensor_ops; do sleep 0.1; done
```
Run this long enough and you'll see an error like this:
```
Sample histogram: [ 92 109  65 103  99 104  99 125 100 104]
...
>       self.assertTrue(all(hist > 0.7 * (num_to_collect / 10)))
E       AssertionError: False is not true
```
I've seen values like 65, 68, 69, 70. Setting the cutoff at 60 instead of 70 seems safe enough.

/cc Yangqing (or whoever authored a56b881c4a).
Closes https://github.com/caffe2/caffe2/pull/840

Differential Revision: D5292120

Pulled By: akyrola

fbshipit-source-id: 2ea4cbb58e206268759bd9d3639e8921623f519c
2017-06-21 05:35:44 -07:00
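As a rough sanity check on the new cutoff, here is a back-of-envelope sketch assuming num_to_collect=1000 samples spread uniformly over 10 bins, so each bin count is approximately Binomial(1000, 0.1):
```
from math import comb

def binom_tail_below(n, p, k):
    # P(X < k) for X ~ Binomial(n, p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

# Old cutoff (70): a small but non-negligible per-bin failure probability,
# which adds up over 10 bins and many CI runs.
print(binom_tail_below(1000, 0.1, 70))
# New cutoff (60): several orders of magnitude rarer.
print(binom_tail_below(1000, 0.1, 60))
```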
005156f6b4 Fix gradient checking for softplus op
Summary:
kmatzen why did you set the stepsize in ff84e7dea6?

The test is flaky before this change. Solid afterwards.
Closes https://github.com/caffe2/caffe2/pull/841

Differential Revision: D5292112

Pulled By: akyrola

fbshipit-source-id: c84715261194ff047606d4ec659b7f89dac3cbb1
2017-06-21 05:35:43 -07:00
e2107fffba Fixes for test_recurrent in hypothesis_test.py
Summary:
/cc akyrola is it possible this test has been broken ever since 5614816fce?

More generally, why do we still have `hypothesis_test.py` at all? In the case of this test, surely one of these files does more than this one old test:

* `operator_test/cudnn_recurrent_test.py`
* `operator_test/recurrent_network_test.py`
* `operator_test/rnn_cell_test.py`
Closes https://github.com/caffe2/caffe2/pull/843

Differential Revision: D5292109

Pulled By: akyrola

fbshipit-source-id: 6df5df6353a9741d1ae1b796adaab98382857527
2017-06-21 05:35:42 -07:00
c24dabb414 Enable runtime cloning of tasks.
Summary:
Funnily, the biggest issue when trying to increase the number of trainers from 5 to 20 is not model convergence (it is worse but still converges without tuning); it is the initialization time: it took around 30 min to generate the job.

After this diff, job creation time for the standard 5-7 setup goes from 125s to 8s. (15x speedup).

Another improvement is that `net_printer.to_string(job)` becomes less complex.

This makes the startup for 20 trainers go to 32s, which is still not ideal.

Next step will be to allow passing num_instances to Node as well. This way we'll be able to create only one reader and one trainer prototype and let the framework take care of the scheduling. For this one we will need to move some DataStream and PS initialization code to C++ first. (c.c. aartibasant)

Reviewed By: dzhulgakov

Differential Revision: D5100788

fbshipit-source-id: 7b76bce108f527a96b2bfe7ed43a22ea8679b682
2017-06-21 03:18:20 -07:00
f795bf0b2a Revert D5273337: [caffe2] Pare down on excessive futex() syscalls from the DAGNet executor
Summary: This reverts commit 67d50f9d838e9a9ef3682d9a3b5ba59c7d33350d

Differential Revision: D5273337

fbshipit-source-id: 85e2f3ef228871beed2afef569407474c8f8acb9
2017-06-21 01:48:24 -07:00
d9087edb07 add rekey in feature_processor
Differential Revision: D5270972

fbshipit-source-id: 8805c0e947f4752d2c575e2a7b8986cd804601dc
2017-06-20 23:19:09 -07:00
34eaa19d27 CPU data parallel model
Summary:
CPU version of the data parallel model. The great thing is that now we can run data_parallel_model_test in Sandcastle (as it does not have GPUs).

Pretty simple change, really. I did not change all variable names with "gpu" in them, to reduce risk (and being a bit lazy). Can improve later.

Reviewed By: wesolwsk

Differential Revision: D5277350

fbshipit-source-id: 682e0c5f9f4ce94a8f5bd089905b0f8268bd2210
2017-06-20 23:19:08 -07:00
7d482742fd Allow tasks/execution_steps to be cloned at runtime
Summary:
Advantages of cloning the tasks/execution_steps at runtime:
- Less complexity on the python side: no need to clone nets and add prefixes to blob names
- Faster start-up: we had cases of complex plans that took up to 30min to be created.
- Better isolation: each task cloned at runtime has its own child workspace, preventing false sharing of blobs.
- Opens up possibility for dynamic scheduling: Number of threads per task can be increased on the fly, at runtime.

Reviewed By: dzhulgakov

Differential Revision: D5100730

fbshipit-source-id: 71b83193b135da4e6eaf2536d8fc266528e1fdcc
2017-06-20 22:32:07 -07:00
1572173ca7 Implement double backwards for Sort, Topk. 2017-06-21 00:24:13 -04:00
e16ceef76a Implement Scatter double backwards. 2017-06-21 00:24:13 -04:00
b79ff11aca Implement IndexAdd, IndexFill, IndexSelect, MaskedSelect double backwards. 2017-06-21 00:24:13 -04:00
50c0912a75 Implemented masked_fill double backwards. 2017-06-21 00:24:13 -04:00
43afb1d4ca Make sure Elu alpha is strictly positive
Reviewed By: Yangqing

Differential Revision: D5289483

fbshipit-source-id: 96223304e4b1278595bae5ed137b5b80b7f8f521
2017-06-20 21:19:48 -07:00
c3ad55f746 add readme and generated files for Type/Tensor/Functions to a doc folder to make it possible to view headers without building the library 2017-06-20 20:33:26 -07:00
29f037f432 Improved Observers, based on NetBase now
Summary: Fixed a lot of issues that salexspb brought up, and templates on NetBase which basically adds compatibility for DAGNetBase. This will be useful for Fei's future work.

Reviewed By: salexspb

Differential Revision: D5272352

fbshipit-source-id: b5ffe1d6fb0566dc1bfad9041c129a3ab7f6d93a
2017-06-20 18:22:38 -07:00
4b93f32234 rename TensorLib -> ATen 2017-06-20 16:49:13 -07:00
5957218cf0 Adding Dropout Layer to SparseNN Model and Flow
Summary:
- Incorporated dropout layer to the sparseNN training and testing pipeline
- Integrated an advanced model options feature on Flow UI for users to specify dropout rate
- Created an end-to-end unit test to build and run a model with dropout

Reviewed By: chocjy

Differential Revision: D5273478

fbshipit-source-id: f7ae7bf4de1172b6e320f5933eaaebca3fd8749e
2017-06-20 15:46:55 -07:00
03f41c8120 fix capitalization of Python, make it consistent 2017-06-21 00:09:37 +02:00
dd1525d346 fix #790 so model.init_params = False takes effect
Summary:
Given the parameter init_params=False, the weight blob (*_w) and bias blob (*_b) should be suppressed in model.param_init_net. Without this fix, init_params=False doesn't take effect in brew.conv as it does in brew.fc or other ops. This issue is the root cause of #790 [https://github.com/caffe2/caffe2/pull/790].
Closes https://github.com/caffe2/caffe2/pull/824

Reviewed By: harouwu

Differential Revision: D5276676

Pulled By: akyrola

fbshipit-source-id: 8f7088a8e1976658f67e027223e555375b3a2392
2017-06-20 14:08:35 -07:00
673f1d9362 Fix packsegments op and text RNN models batchsize > 0
Summary: Fix PackSegments op warning when using pad_minf

Differential Revision: D5259897

fbshipit-source-id: 3c117c89d23c4ee1c67e5824b80a18bb52e16a07
2017-06-20 12:18:56 -07:00
5084ff3b9b improve blob sharing
Summary:
Since D5193393 introduced a "token" system for memonger that prevents sharing of blobs across parallel branches, we can be more aggressive in blob sharing. Thus, this removes the tracking of 'unused free blobs' and just relies on the token system.
For forward-only resnet50, this reduces the number of shared blobs to 5 (optimal according to akirillov's calculation).

This requires careful testing, so I will not land it soon.

Reviewed By: asaadaldien

Differential Revision: D5208985

fbshipit-source-id: 2e520c4ea2351a2ec327b6c5f2e3af24234d1c9a
2017-06-20 12:08:57 -07:00
e0b70d0f64 Fix Fmod/Remainder gradgradcheck by ensuring inputs requires_grad. 2017-06-20 11:59:21 -04:00
0b2b7d0594 Kth value function passes gradgradcheck. 2017-06-20 11:59:21 -04:00
6d97ac0c0f Missing includes in cuda_collective_device.h
Summary: Closes https://github.com/facebookincubator/gloo/pull/47

Differential Revision: D5283752

Pulled By: pietern

fbshipit-source-id: 8ad3353b3455c5416e31e75b46755e2f7fcaad52
2017-06-20 08:54:16 -07:00
64bec43916 Fix a bug in BooleanUnmaskOp
Reviewed By: kennyhorror

Differential Revision: D5245130

fbshipit-source-id: 5ec602a33207c9d5de0f2d8d022fdcc540212586
2017-06-20 08:34:09 -07:00
a405efa756 CUDA collectives as alternative to NCCL
Summary:
Adds a separate set of CUDA collectives that run on device as an
alternative to NCCL. Use these collectives as default on-device
collectives instead of NCCL.

Whenever multiple processes on the same machine use Gloo with NCCL and
end up doing concurrent CUDA memory allocations and algorithm
execution, we risk deadlock. A follow up change will enable opt-in
usage of NCCL (e.g. through environment variable).

Benchmark output below with varying number of elements. It shows a
minor improvement over using NCCL for local reduction and broadcast.

Number of elements equal to on-device threshold (256K):

```
Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_ring
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before)  262144       2685       2907       3035       3215        562
(after)   262144       2682       2874       3013       3395        577

Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_ring_chunked
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before)  262144       2045       2133       2325       2643        725
(after)   262144       1533       1673       1834       2048        800

Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_halving_doubling
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before)  262144       1580       1640       1718       2069        893
(after)   262144       1371       1446       1539       1748       1125
```

Larger number of elements (4M):

```
Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_ring
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before) 4194304      55543      58058      60103      62659         32
(after)  4194304      54490      57923      60893      66058         33

Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_ring_chunked
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before) 4194304      18049      22820      24997      26634        105
(after)  4194304      18356      20463      21695      22589         99

Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_halving_doubling
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before) 4194304      18584      24345      27809      29722         95
(after)  4194304      19541      22718      25408      26688         88
```

Reviewed By: akyrola

Differential Revision: D5278192

fbshipit-source-id: 53f09e404663ddc8bb46d06ac87afd8ee3ffc3a2
2017-06-20 00:23:43 -07:00
5e084a9112 Don't require pydot for Python tests
Summary:
Working towards https://github.com/caffe2/caffe2/pull/817.
```
>       graph = pydot.Dot(name, rankdir=rankdir)
E       AttributeError: 'NoneType' object has no attribute 'Dot'
```
https://travis-ci.org/caffe2/caffe2/jobs/243867951
Closes https://github.com/caffe2/caffe2/pull/827

Differential Revision: D5276691

Pulled By: akyrola

fbshipit-source-id: 047ee869c029002ace29d84c6b56534b7f23f87b
2017-06-19 23:02:00 -07:00
a5c45e18b5 MaxGradientOp for CUDA + unit test
Summary: As title. Pretty straightforward. Could actually run each kernel in parallel, but we can optimize later if needed.

Reviewed By: Yangqing

Differential Revision: D5278415

fbshipit-source-id: 29f59afe28f37fc4152ec7eb7cd6c1ab65f2cb8c
2017-06-19 22:35:45 -07:00
611677702f Minor Fix in VideoInputOp
Summary: end_frm must be less than or equal to sampledFrames.size()

Reviewed By: dutran

Differential Revision: D5279265

fbshipit-source-id: 6bae714db6e07ff10ac01c95e6bead786d4941d2
2017-06-19 22:09:13 -07:00
67968cb60b Add numerically stable BCELoss which takes logits as input (#1792) 2017-06-19 22:05:51 -04:00
d2b1cb22a4 rekey layer
Differential Revision: D5210095

fbshipit-source-id: dc66a10d95842e0f10cb53a5afb7ddcc3fcac0de
2017-06-19 18:47:28 -07:00
a6c5e3f2e2 Fix case where interface doesn't have an address
Summary:
Code in tcp/transport tries to find the network interface a socket was
bound to when create a TCP device context. Per getifaddrs(3), it is
possible for the ifa_addr field to be NULL (supposedly when an
interface doesn't have an address). Ignore such entries.

Thanks to slayton58 for reporting this.

Reviewed By: wesolwsk

Differential Revision: D5279376

fbshipit-source-id: 039380b95ba4d6d94942c30581e0b230a060870c
2017-06-19 18:05:32 -07:00
6ee6b4980b multiple docs 2017-06-19 20:06:27 -04:00
83e6a0bec8 Revert uuid change to OperatorDef protobuf
Summary:
A few issues:

1. Randomization hurts memoization.
2. Even if we make it non-random, we can get key collisions when loading it back.
3. RNNs use prototxt for the step net, and apparently it's not forward compatible like normal protobuf is.

I am thinking of a better, less invasive solution now.

Reviewed By: jamesr66a

Differential Revision: D5272118

fbshipit-source-id: ab577fad04fbfc632e1fceffa923377a0d3da1be
2017-06-19 16:47:31 -07:00
ceb13c8cc3 Don't propagate -mavx flag to dependents
Summary:
Previously, `gloo/math.h` inlined methods which use AVX builtins,
which required propagating the `-mavx` flag.
This diff moves these definitions out of the header and into a source
file to avoid this.

Reviewed By: pixelb

Differential Revision: D5271043

fbshipit-source-id: dde4dc560dfb557b46d1a582a8b38e7cb8eb0c37
2017-06-19 16:46:43 -07:00
a6fcecaa71 Allow AliasOp to work on empty tensor
Summary: Ran into it while working on a dper benchmark. Apparently it works harmlessly even with empty tensors.

Reviewed By: akyrola

Differential Revision: D5273672

fbshipit-source-id: a968ae03a659d6c1a215f12cc35f7ba68448e833
2017-06-19 15:24:02 -07:00
82ef292f00 Add gradgradchecks for various autograd Functions and support Unfold double backwards. 2017-06-19 18:19:16 -04:00
76ee014d10 Add documentation to SELU and AlphaDropout 2017-06-19 18:18:01 -04:00
f619ac6ac9 Quickfix for AlphaDropout on CUDA 2017-06-19 18:18:01 -04:00
6150d9bef2 Building dropout as layer
Summary: Dropout layer and unittest for DPer2

Reviewed By: chocjy

Differential Revision: D5254866

fbshipit-source-id: 5eaea81808ddf8e0c7a7d76209ea44cda2ee28aa
2017-06-19 14:46:52 -07:00
956e40f0ea Pare down on excessive futex() syscalls from the DAGNet executor
Summary:
For our CNN training runs I noticed an excessive number of futex() syscalls. Using strace I narrowed this down to excessive calls to std::condition_variable member functions.

1) I added a PushBulk member function to SimpleQueue that pushes all items in a vector onto the queue and issues a single std::condition_variable::notify_all() call, rather than separate notify_one() calls per item.
2) In DAGNet::WorkerFunction, we were calling std::condition_variable::notify_one() after every single op chain was completed, even though it should have only been called when the number of remaining operators dropped to 0 or the execution failed. I added a conditional check around this call to further cut down on unnecessary syscalls.

Reviewed By: pietern

Differential Revision: D5273337

fbshipit-source-id: 67d50f9d838e9a9ef3682d9a3b5ba59c7d33350d
2017-06-19 14:19:39 -07:00
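The batching idea in (1) is language-agnostic; here is a minimal sketch with Python's threading primitives, where SimpleQueue/push_bulk are illustrative stand-ins for the C++ classes:
```
import threading
from collections import deque

class SimpleQueue:
    def __init__(self):
        self._items = deque()
        self._cv = threading.Condition()

    def push(self, item):
        # One wakeup per item: N pushes -> N notify calls (N futex syscalls).
        with self._cv:
            self._items.append(item)
            self._cv.notify()

    def push_bulk(self, items):
        # One wakeup for the whole batch, as in the PushBulk change above.
        with self._cv:
            self._items.extend(items)
            self._cv.notify_all()

    def pop(self):
        with self._cv:
            while not self._items:
                self._cv.wait()
            return self._items.popleft()
```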
31e700910d Fix entropy error coming from test_div
Summary:
Working towards https://github.com/caffe2/caffe2/pull/817.

`E           InvalidArgument: Insufficient bytes of entropy to draw requested array.  shape=(4, 2, 5, 1, 3, 5, 5, 1), dtype=float32.  Can you reduce the size or dimensions of the array?  What about using a smaller dtype?  If slow test runs and minimisation are acceptable, you  could increase settings().buffer_size from 8192 to at least 24576000.`

https://travis-ci.org/caffe2/caffe2/jobs/243867951
Closes https://github.com/caffe2/caffe2/pull/828

Differential Revision: D5276723

Pulled By: akyrola

fbshipit-source-id: f7d0e2dd8ef8b6a2354bd4ff7c7446c377c954b4
2017-06-19 13:47:29 -07:00
969831ea33 Deprecate CNNModelHelper in lmdb_create_example
Reviewed By: akyrola

Differential Revision: D5233793

fbshipit-source-id: bae745791f071bc36fd45bd81145ce86c8ba9ed0
2017-06-19 13:04:02 -07:00
32e6372538 Split cuda_collectives.h into two files
Summary:
This change prepares for having a separate set of collectives that
use native CUDA calls instead of NCCL. This is needed to work around
the issue where NCCL deadlocks when it is interleaved with CUDA memory
management operations in other processes on the same machine.

Includes a modification to the host reduction functions to bring them
up to parity with the NCCL reduction functions (they now incorporate
offset/counter arguments).

Reviewed By: wesolwsk

Differential Revision: D5276291

fbshipit-source-id: 8844731760d2c48577d207c026ce0cd641f2fc6d
2017-06-19 12:57:53 -07:00
36bfe5946d fbcode nnpack ops for Relu and LeakyRelu
Summary: As in title

Reviewed By: ajtulloch

Differential Revision: D5261447

fbshipit-source-id: 5ac4ff52a26a2d310238cd1ead90ee2736e8c5a1
2017-06-19 12:36:32 -07:00
4b4022ded7 Make test_lstm_main more stable
Summary: Title

Reviewed By: Yangqing

Differential Revision: D5268569

fbshipit-source-id: f79c38376ef2dd0684fd438668b0762341d982cf
2017-06-19 12:36:29 -07:00
2579be1227 Skip fp16 initializer test for CPU-only builds
Summary:
Working towards https://github.com/caffe2/caffe2/pull/817.
```
E           AttributeError: Method FloatToHalf is not a registered operator. Did you mean: []
```
https://travis-ci.org/caffe2/caffe2/jobs/243867951

/cc slayton58
Closes https://github.com/caffe2/caffe2/pull/829

Differential Revision: D5276796

Pulled By: akyrola

fbshipit-source-id: 34edca6090a9ce7ab39ae1fdc0e83b5c3b7e4f49
2017-06-19 12:21:25 -07:00
31769fbaf8 removed events and user group info
Summary: Closes https://github.com/caffe2/caffe2/pull/816

Differential Revision: D5276778

Pulled By: akyrola

fbshipit-source-id: 28bf0724a360e37cd3171eef9c47addf8b4e6b42
2017-06-19 12:21:24 -07:00
90a52c3904 Skip TestInferDevice if no GPU support
Summary:
Working towards https://github.com/caffe2/caffe2/pull/817.
```
E           AttributeError: Method CopyCPUToGPU is not a registered operator. Did you mean: []
```
https://travis-ci.org/caffe2/caffe2/jobs/243867951
Closes https://github.com/caffe2/caffe2/pull/818

Differential Revision: D5276735

Pulled By: akyrola

fbshipit-source-id: 35d9df19330ae522037e8a5d721d83dc2e5aa4dc
2017-06-19 12:21:24 -07:00
980e2a6b59 fixed input and output schema for all functions
Summary: Closes https://github.com/caffe2/caffe2/pull/814

Differential Revision: D5276763

Pulled By: akyrola

fbshipit-source-id: a1580d3deb2b72f52486aef379a6b8928a41301a
2017-06-19 12:21:23 -07:00
932cf9eb92 Fix entropy error coming from utility_ops_test
Summary:
Working towards https://github.com/caffe2/caffe2/pull/817.

`E           InvalidArgument: Insufficient bytes of entropy to draw requested array.  shape=(20, 12, 22), dtype=float32.  Can you reduce the size or dimensions of the array?  What about using a smaller dtype?  If slow test runs and minimisation are acceptable, you  could increase settings().buffer_size from 8192 to at least 43253760.`

https://travis-ci.org/caffe2/caffe2/jobs/243867951

/cc kittipatv
Closes https://github.com/caffe2/caffe2/pull/830

Differential Revision: D5276639

Pulled By: akyrola

fbshipit-source-id: 0c21be25ecd931837dc8b0c2cc17048f531350d1
2017-06-19 12:09:32 -07:00
172a356668 forgotten import in variables.py
Fixing error on line 661: 
warnings.warn("masked_copy_ is deprecated and renamed to masked_scatter_, and will be removed in v0.3")
NameError: name 'warnings' is not defined
2017-06-19 14:23:48 +02:00
1ec0b89361 Memonger Graph Verifier
Summary:
We want to make sure that a graph optimized by memonger doesn't have any possibility of two threads writing into the same output blob at the same time, when blobs are renamed.

Creates a graph where edges are built such that a parent node's output blob is a child node's input blob, and there is no node in between the parent and child node that writes to the same blob. If two nets generate the same such graph, then the "path" of data is the same.

Reviewed By: akyrola

Differential Revision: D5210385

fbshipit-source-id: 6317fc4e16289339b50c2dcd86ec8b32d2d544a5
2017-06-19 00:46:32 -07:00
3f860af050 Implement TopKOp for GPU
Summary:
This is a real implementation (not GPUFallbackOp) of the TopKOp for GPU.

There are two algorithm implementations:

- for k <= 512, it maps to a warp-wide min-heap implementation, which requires only a single scan of the input data.
- for k > 512, it maps to a multi-pass radix selection algorithm that I originally wrote in cutorch. I took the recent cutorch code and removed some cutorch-specific things as it made sense.

Also added several utility files that one or the other implementations use, some from the Faiss library and some from the cutorch library.

Reviewed By: jamesr66a

Differential Revision: D5248206

fbshipit-source-id: ae5fa3451473264293516c2838f1f40688781cf3
2017-06-17 08:47:38 -07:00
329a2f7d27 Prevent divide by zero in dropout with p=1 2017-06-17 11:38:02 -04:00
69e38ee821 clean test code, no functional change 2017-06-17 11:11:48 -04:00
38e6b9c7e7 fix bug in wrap_outputs miscounting the number of inputs 2017-06-17 11:11:48 -04:00
7775e9e777 add newNarrow to thpp THCTensor 2017-06-17 11:11:48 -04:00
293262b8f1 fix cuda tests 2017-06-17 11:11:48 -04:00
e66e01a2a0 remove extra computations for input usage check 2017-06-17 11:11:48 -04:00
0a93903e8e move tests to test_nn 2017-06-17 11:11:48 -04:00
bcac55dd2f force 1 stride for 1-sized dim for cudnn, fix lint, remove extra unpacking 2017-06-17 11:11:48 -04:00
6cdcd9c603 Add Narrow function
clean error message and support non perfectly sized inputs
2017-06-17 11:11:48 -04:00
075030d974 add cuda tests that use only cunn for finite difference computations 2017-06-17 11:11:48 -04:00
23dec70614 comment on working values for epsilon 2017-06-17 11:11:48 -04:00
fc0ab229ad remove extra cloning and add contiguous calls 2017-06-17 11:11:48 -04:00
ce3bc5a4a5 force cloning of weights 2017-06-17 11:11:48 -04:00
3dbece7eb5 clean tests 2017-06-17 11:11:48 -04:00
bd94718c87 cleaner AccumulateGrad 2017-06-17 11:11:48 -04:00
2f8d21a7f2 add contiguous function 2017-06-17 11:11:48 -04:00
4f4fc9091a add support for newTranspose in thpp::THCTensor 2017-06-17 11:11:48 -04:00
7ee095cf7f add newExpand and newView to thpp::Tensor 2017-06-17 11:11:48 -04:00
462ab8a644 add Transpose View Expand C functions 2017-06-17 11:11:48 -04:00
dd5c7c473f Add ConvBackwardBackward class 2017-06-17 11:11:48 -04:00
6dca309017 make AccumulateGrad support no input gradient 2017-06-17 11:11:48 -04:00
f945fbc3dd add gradgradcheck and conv double backward tests 2017-06-17 11:11:48 -04:00
db70d4d223 1) Simplify CompareOp autograd backward
2) Use better approach for avoiding divide-by-0 in autograd tests.
2017-06-17 09:38:28 -04:00
7714b5a088 Fix autograd shape tracking for 1-d reduction ops. 2017-06-17 09:38:28 -04:00
860f51e67f Avoid nans in fmod/remainder tensor tests.
Also clean up CompareOp autograd backwards impl.
2017-06-17 09:38:28 -04:00
2c04ce63a5 Fix masked_scatter autograd broadcasting. 2017-06-17 09:38:28 -04:00
83bfa5e1ab Fix masked_scatter pointwise autograd backward behavior. 2017-06-17 09:38:28 -04:00
618f20fb38 Fix autograd broadcasting for masked_fill. 2017-06-17 09:38:28 -04:00
9711223c12 Add broadcast autograd tests for dist. 2017-06-17 09:38:28 -04:00
7d0f1c51bb Fix autograd broadcast for min, max. 2017-06-17 09:38:28 -04:00
7560474fbb Fix autograd pointwise fallback for max,min. 2017-06-17 09:38:28 -04:00
e69fe5bdb0 Automatically detect when to skip inplace tests and fix lint. 2017-06-17 09:38:28 -04:00
f3ae90e329 Fix broadcast and pointwise compare ops with autograd. 2017-06-17 09:38:28 -04:00
bfdd1f2199 Fix fmod/remainder autograd broadcasting. 2017-06-17 09:38:28 -04:00
b164efb8b0 Fix lerp broadcast autograd. 2017-06-17 09:38:28 -04:00
94c7260087 Fix pointwise fallback for lerp. 2017-06-17 09:38:28 -04:00
aac459431b Fix pow autograd broadcast. 2017-06-17 09:38:28 -04:00
a04d1af0a4 Fix addr, addmm, baddmm, addmvm, addbmm broadcasting with autograd.
Fix autograd broadcast for addmm, baddmm, others.
2017-06-17 09:38:28 -04:00
a54a7c1312 Fix addcmul, addcdiv autograd broadcasting. 2017-06-17 09:38:28 -04:00
9ba799c26b Fix pointwise fallback for addcdiv, addcmul. 2017-06-17 09:38:28 -04:00
5cfb1329b5 Make implementation of Variable.mul_ and Variable.div_ consistent. 2017-06-17 09:38:28 -04:00
af2dd0d3e9 Fix autograd for broadcasting with add, sub, mul, div. 2017-06-17 09:38:28 -04:00
79a343bbd4 Remove unnecesssary squeezing in Expand backwards.
Also add size checks to test_autograd to try to catch such issues.
2017-06-17 09:38:28 -04:00
88e4bec8fa resize bug fix 2017-06-17 11:07:22 +02:00
044679ca7e Fix Pooling ND non-symmetric padding check.
Reviewed By: dutran

Differential Revision: D5270021

fbshipit-source-id: bbad8e9f07af26f7e7522844eb35bf5631883107
2017-06-16 20:33:19 -07:00
b077e28d48 make shape parameter op field for ReduceDimsOp
Summary: Since the shape tensor was allocated every time, the global allocation mutex was acquired, possibly leading to a slowdown.

Reviewed By: salexspb

Differential Revision: D5263899

fbshipit-source-id: b44ff0b01342f116154ec2a9c65f91b5c0e51452
2017-06-16 18:04:18 -07:00
faa7c2cc2c fix cuda breakage 2017-06-16 20:13:46 -04:00
21dc425e07 Optimize SumSqrElementsOp for CUDA
Summary: The old version used one block with 128 threads. Throughput was too low for the NMT use case (calculating squared gradient norms for every parameter), so this increases the throughput. Shaves 7% off CNN model training time per step.

Reviewed By: wickedfoo

Differential Revision: D5263748

fbshipit-source-id: adc3bacd11e49ea00c60381d613d993050e899be
2017-06-16 17:03:38 -07:00
3cecdf84f1 Storage from_file method (#1821) 2017-06-17 00:34:20 +02:00
49586d9556 Add basic API support for NCCL 2.0
Summary:
\cc pietern
Minimal changes to allow gloo to compile and run with NCCL 2.0
Closes https://github.com/facebookincubator/gloo/pull/46

Differential Revision: D5268074

Pulled By: pietern

fbshipit-source-id: 58d625d57b31cfc932f3dbbdd7a4b83d9a2e60a8
2017-06-16 15:22:14 -07:00
12094b5114 Add random shuffle through the data to the benchmark workflow
Reviewed By: kdub0

Differential Revision: D5171727

fbshipit-source-id: 1d9182bb820224b479682fc0ca5014f909ba19d5
2017-06-16 13:22:46 -07:00
eefd4b0bb2 Static RNN: gpu support and lstm_benchmark integration
Summary:
While this is not intended to be the most performant and general solution, we can see from the test plan that in some cases static DAG RNN can perform better than our own implementation. Hopefully we will get dynamic RNN DAG execution at least as fast as this one; then we will not need this one in production, only for testing.

Still putting it into our benchmark for comparison purposes

Reviewed By: akyrola

Differential Revision: D5210038

fbshipit-source-id: fa44baf51c455872abd6ec5f5d151cf06e15b1fa
2017-06-16 11:31:43 -07:00
2a9cb7d4a9 use brew for Transpose --> major perf regression fix
Summary: I accidentally noticed that we were calling the non-CUDNN version of Transpose with attention, and it is super slow. This broke when rnn_cell was changed to use ModelHelper instead of CNNModelHelper in D5062963, but calls to transpose were not "brewed".

Reviewed By: jamesr66a

Differential Revision: D5264248

fbshipit-source-id: b61494ae210f34597245f1195d20547f5b5cd8b5
2017-06-16 11:02:48 -07:00
fda35fd19d TravisCI Overhaul
Summary:
Uncached build: https://travis-ci.org/lukeyeager/caffe2/builds/239677224
Cached build: https://travis-ci.org/lukeyeager/caffe2/builds/239686725

* Parallel builds everywhere
* All builds use CCache for quick build times (help from https://github.com/pytorch/pytorch/pull/614, https://github.com/ccache/ccache/pull/145)
* Run ctests when available (continuation of https://github.com/caffe2/caffe2/pull/550)
* Upgraded from cuDNN v5 to v6
* Fixed MKL build (by updating pkg version)
* Fixed android builds (b6f905a67b (commitcomment-22404119))

* ~~Building NNPACK fails with no discernible error message (currently disabled entirely)~~
* ~~Android builds continue to fail with existing error:~~
* ~~OSX builds time-out:~~

| Before | After | Changes |
| --- | --- | --- |
| COMPILER=g++ | linux | without CUDA |
| COMPILER=g++-5 | linux-gcc5 | without CUDA |
| COMPILER=g++ | linux-cuda | updated to cuDNN v6 |
| BLAS=MKL | linux-mkl | updated pkg version |
| BUILD_TARGET=android | linux-android | |
| COMPILER=clang++ | osx | |
| BUILD_TARGET=ios | osx-ios | |
| BUILD_TARGET=android | osx-android | |
| QUICKTEST | **GONE** | |
| COMPILER=g++-4.8 | **GONE** | |
| COMPILER=g++-4.9 | **GONE** | |
Closes https://github.com/caffe2/caffe2/pull/735

Reviewed By: Yangqing

Differential Revision: D5228966

Pulled By: bwasti

fbshipit-source-id: 6cfa6f5ff05fbd5c2078beea79564f1f3b9812fe
2017-06-16 10:18:05 -07:00
8d33603901 make t() of Variable consistent with Tensor (#1823) 2017-06-16 16:08:53 +02:00
96f19fefc0 add warning if data parallel model is created for GPUs that we don't have
Summary: Don't want to assert since it can be useful to sometimes create models that are not run (for example, unit tests).

Reviewed By: pietern

Differential Revision: D5258905

fbshipit-source-id: f1beee0605bfef235ed0f23f7e78259109720254
2017-06-16 07:02:37 -07:00
1ca262b25f Disable smart_tensor_printer_test on OSX
Summary:
TravisCI failure: https://travis-ci.org/lukeyeager/caffe2/jobs/240947529

Copies solution from ef688490c4
Closes https://github.com/caffe2/caffe2/pull/768

Differential Revision: D5264295

Pulled By: akyrola

fbshipit-source-id: 891fd747a5df3edd94218dbf461a2d936d334688
2017-06-16 05:50:46 -07:00
176a841087 Fixes for CuDNNDropoutOp
Summary: Closes https://github.com/caffe2/caffe2/pull/809

Differential Revision: D5263514

Pulled By: akyrola

fbshipit-source-id: 1f1e5bdb6fa551cb1f9beb3e5d3ad9c0c8813ed0
2017-06-15 22:51:12 -07:00
6f1b1828e9 add SwitchToDevice to PrefetchOp constructor
Summary: In https://github.com/caffe2/caffe2/pull/802, slayton58 fixed an issue in ImageInputOp where the std and mean blobs were allocated on the wrong GPU (0). This fails when there is no P2P memory access. The fundamental reason was that ImageInputOp's constructor did not call SwitchToDevice. Operator's constructor does, but ImageInputOp inherits PrefetchOp -> OperatorBase, neither of which does the switch. So this makes PrefetchOperator do the switch (OperatorBase does not have a context, so it cannot).

Reviewed By: asaadaldien

Differential Revision: D5258729

fbshipit-source-id: c615c60eb2047ad26249c5bcba57ab0ef21d00e4
2017-06-15 22:35:27 -07:00
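The same device-affinity pitfall can be sketched with PyTorch's Python API for illustration (a hypothetical analogue, not the caffe2 fix itself): allocations made in a constructor must happen under the device scope the op will actually run on.
```
import torch

if torch.cuda.device_count() > 1:
    op_device = 1                       # the device the op is bound to
    with torch.cuda.device(op_device):  # analogous to SwitchToDevice
        mean_gpu = torch.zeros(3, device='cuda')
    # Without the device scope, this would land on cuda:0 instead.
    assert mean_gpu.device.index == op_device
```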
a64560c22e Remove flattening for torch.dot (#1781) 2017-06-16 02:15:33 +02:00
97f50edf46 Add documentation for Cholesky lapack functions (#1816) 2017-06-16 02:10:56 +02:00
3a91ac56cb Add a shared memory machine-wide mutex utility
Summary:
This can be used to serialize allocations and NCCL kernel calls
for example. Multiple such mutexes can be created per process.

Reviewed By: Yangqing, pietern

Differential Revision: D5073609

fbshipit-source-id: 28cc4293632f20e9623ee6531365b881d0f3d9ef
2017-06-15 15:54:31 -07:00
fc2a8d045c adding flatten indices output to TopK
Summary: This makes it easier to gather the top-K by group of rows. This is useful when we want to pick the top-K from a batch of fixed-length sessions. Let `N` be the number of sessions and `M` the number of examples in a session. We would have a batch of `N * M` rows. We can reshape the score blob to `N x M` and use it as input to `TopK` to select the top scores for each session. However, without the new output, it would be inconvenient to gather the rows corresponding to the top scores, since the regular indices output is relative to each row. The new flattened output can be used directly as input to `Gather`.

Reviewed By: chocjy

Differential Revision: D5171459

fbshipit-source-id: 69f7b41456c3f9670650ae07afc8fef8328485e9
2017-06-15 15:32:29 -07:00
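A numpy sketch of why the flattened indices help, with numpy standing in for the TopK and Gather operators (shapes are illustrative):
```
import numpy as np

N, M, K = 4, 6, 2                       # sessions, examples/session, top-K
scores = np.random.rand(N * M).astype(np.float32)

per_session = scores.reshape(N, M)
topk_local = np.argsort(-per_session, axis=1)[:, :K]  # per-row, in [0, M)

# Flattened indices address rows of the original (N*M,) batch directly,
# so they can be fed straight into a Gather:
topk_flat = topk_local + np.arange(N)[:, None] * M
gathered = scores[topk_flat.ravel()]
```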
84cc82cf3f Fix stats_ops_test
Summary:
The global StatRegistry doesn't get reset when the workspace is reset.
```
>       self.assertTrue(len(workspace.FetchBlob('k3')) == 2)
E       AssertionError: False is not true
```
https://travis-ci.org/lukeyeager/caffe2/jobs/240162665

/cc azzolini

NOTE: this error doesn't show up if you just run `stats_ops_test.py` directly. It shows up when you run other tests in the same session before this test:
```
pytest -v caffe2/python/
```
Closes https://github.com/caffe2/caffe2/pull/788

Differential Revision: D5259232

Pulled By: salexspb

fbshipit-source-id: 3c72633af6bb61c4fda62195298b1e9574b4cbef
2017-06-15 15:07:57 -07:00
dc0e857e76 README: TravisCI and Appveyor badges
Summary:
The existing per-branch TravisCI badges don't work, and will be outdated when https://github.com/caffe2/caffe2/pull/735 is merged.

I also added an Appveyor badge.
Closes https://github.com/caffe2/caffe2/pull/786

Differential Revision: D5253408

Pulled By: aaronmarkham

fbshipit-source-id: b274b30fcef9df3d2ff7cda1046f8462ad56c83b
2017-06-15 15:07:57 -07:00
5ce9cbae70 Upgrades python/hypothesis_test.py to use brew instead of CNNModelHelper
Summary: Upgrades this file to use brew instead of CNNModelHelper

Reviewed By: harouwu

Differential Revision: D5252089

fbshipit-source-id: 6df4350717c1d42bc4bcc63d255cd422f085ee05
2017-06-15 15:07:56 -07:00
e9cba7e69f Option to read from dataset indefinitely.
Summary: Useful for benchmarking

Reviewed By: kdub0

Differential Revision: D5226758

fbshipit-source-id: 6f3e6dd256f2c40ab71e598a7ce47cd06099adff
2017-06-15 15:07:53 -07:00
d9d89b191d implement SliceOp for GPU
Summary: Implementation of the SliceOp for CUDA

Reviewed By: akyrola

Differential Revision: D5254287

fbshipit-source-id: 0a1660e1aa161fd088a2d8f886e019c05a1919a2
2017-06-15 14:34:34 -07:00
086cd6fa3e Don't continue running operators after failure
Summary:
This brings DAGNet back up to parity with SimpleNet, where
execution stops as expected after an operator fails. For the DAGNet
it's more involved, since we have to deal with all worker threads
stopping execution. Because the job queue may still hold an arbitrary
number of chains to execute, this diff explicitly closes it down,
waits for all workers to terminate, and resets the job queue, upon
seeing a failure.

Reviewed By: akyrola

Differential Revision: D5232955

fbshipit-source-id: 4dac3c3ed6e5c2ebd07473b0f8be2b02c28978e9
2017-06-15 14:34:33 -07:00
f61e4ca070 Fixes in tests to support numpy >= 1.12
Summary:
```
  File "/data/caffe2/install/caffe2/python/hypothesis_test.py", line 1911, in test_batch_to_space
    (w + 2 * pad) / block_size).astype(np.float32)
  File "mtrand.pyx", line 1404, in mtrand.RandomState.randn (numpy/random/mtrand/mtrand.c:19843)
  File "mtrand.pyx", line 1534, in mtrand.RandomState.standard_normal (numpy/random/mtrand/mtrand.c:20368)
  File "mtrand.pyx", line 167, in mtrand.cont0_array (numpy/random/mtrand/mtrand.c:6127)
TypeError: 'float' object cannot be interpreted as an index
```
```
  File "/data/caffe2/install/caffe2/python/operator_test/tile_op_test.py", line 101, in tile_ref
    tiled_data = np.tile(X, tuple(dims))
  File "/data/caffe2/venv/local/lib/python2.7/site-packages/numpy/lib/shape_base.py", line 881, in tile
    return c.reshape(shape_out)
TypeError: only integer scalar arrays can be converted to a scalar index
```
I also tested to make sure this still works with 1.11.
Closes https://github.com/caffe2/caffe2/pull/787

Differential Revision: D5248087

Pulled By: salexspb

fbshipit-source-id: eff69482a8eabb8ace330003fa326c832b53865f
2017-06-15 14:17:20 -07:00
ed2d7d27ab LSTMUnit fix: Backed out changeset 1fa39ce474c7
Summary: some users reported Eigen crashes from LSTMUnitGradient

Reviewed By: akyrola

Differential Revision: D5258396

fbshipit-source-id: 828bf5eeb899bdfa435048103ff3a96c7eb042e0
2017-06-15 14:17:19 -07:00
9d8a194cef Deprecate CNNModelHelper in python/workspace_test.py
Summary: Deprecate CNNModelHelper in python/workspace_test.py to use ModelHelper instead of CNNModelHelper

Reviewed By: harouwu

Differential Revision: D5251778

fbshipit-source-id: d634f1c76e41a95b0247ebf5d5a48aef6f8e232e
2017-06-15 14:17:18 -07:00
c4c3797b0d Deprecate CNNModelHelper - Inception()
Summary:
This diff deprecates `CNNModelHelper` in `Inception()` only.

Depends on D5248848

Reviewed By: harouwu

Differential Revision: D5249312

fbshipit-source-id: 2818fb54bbae203956ed5cd5fb547508923c52a6
2017-06-15 14:03:27 -07:00
b0625ff566 Deprecate CNNModelHelper - VGGA()
Summary:
This diff deprecates `CNNModelHelper` in `VGGA()` function.

Depends on D5247946

Reviewed By: harouwu

Differential Revision: D5248848

fbshipit-source-id: ede9113edb2024e4db8f0048f812050233e3fb40
2017-06-15 14:03:26 -07:00
4aff677d3d Deprecate CNNModelHelper - OverFeat()
Summary:
This diff deprecates `CNNModelHelper` in `OverFeat()` function.

Depends on D5247004

Reviewed By: harouwu

Differential Revision: D5247946

fbshipit-source-id: 6a5299ec71f78e0b81a43212a028651522ab8f4b
2017-06-15 14:03:25 -07:00
078589d7c6 Deprecate CNNModelHelper - AlexNet()
Summary:
This diff deprecates `CNNModelHelper` in the `AlexNet()` function. More diffs will be coming to deprecate the helper in other functions.

Depends on D5241738

Reviewed By: harouwu

Differential Revision: D5247004

fbshipit-source-id: eec5c5ef916a85de8289cb92d2174a6a4b8075bf
2017-06-15 14:03:24 -07:00
c095b3f67f Deprecate CNNModelHelper - MLP()
Summary: This diff deprecates `CNNModelHelper` in `MLP()` function.

Reviewed By: harouwu

Differential Revision: D5241738

fbshipit-source-id: 03669a4166a02257aa5779860d06b40d7496104d
2017-06-15 14:03:23 -07:00
78a6d2f8ba Fix potential GPU transform OOB access
Summary:
Occurred when running with multiple GPUs, not all of which
are connected via P2P.

Essentially when mean_gpu_ and std_gpu_ are allocated and
populated in the constructor of ImageInputOp, it does not seem to
be guaranteed that the active context is the same as the final context
on which the Op will be run. This causes the image data and the
mean/std to be on different devices. With P2P we don't mind this, but
without it this causes OOB memory accesses in the GPU transform
kernel.
Closes https://github.com/caffe2/caffe2/pull/802

Differential Revision: D5258528

Pulled By: akyrola

fbshipit-source-id: 778e55b5f8bb39fc52644b68573c747210ebf3bb
2017-06-15 13:22:58 -07:00
8ef12951e0 Fix for protobuf with unicode_literals
Summary:
Python 2.7, Protobuf 2.6

    >                   op.ClearField('uuid')
    E                   TypeError: field name must be a string

Fix: http://python-future.org/imports.html#should-i-import-unicode-literals

/cc salexspb tomdz
Closes https://github.com/caffe2/caffe2/pull/804

Differential Revision: D5258494

Pulled By: akyrola

fbshipit-source-id: 04c473c1e55bf8caac0bfde7d86171c9f95e71a1
2017-06-15 13:22:57 -07:00
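A minimal reproduction of the pitfall, with a stand-in for protobuf 2.6's ClearField (assuming Python 2 semantics, where str is bytes and every literal under unicode_literals is unicode):
```
from __future__ import unicode_literals

def clear_field(name):
    # Stand-in for protobuf 2.6 on Python 2, which rejects unicode names.
    if not isinstance(name, str):
        raise TypeError('field name must be a string')

# clear_field('uuid')     # fails on Python 2: the literal is unicode
clear_field(str('uuid'))  # the portable fix from python-future's guide
```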
7ffd76db51 check operator schema before calling gradient creator
Summary: Hard-to-debug problems arise when a gradient creator fails because the forward op is itself incorrect. Add checking of the schema before calling the creator. Also clarify the error messages.

Reviewed By: Yangqing

Differential Revision: D5256016

fbshipit-source-id: 78550f7e2ce5b88e26b69fdae4be0eece52edfea
2017-06-15 13:04:58 -07:00
024afc7b0d Simplify the implementation of AccuracyOp and Enable top-k in GPU
Reviewed By: wickedfoo

Differential Revision: D5224685

fbshipit-source-id: 37c23e580903775f42347f55b4747c74e2863a35
2017-06-15 10:31:58 -07:00
ca34de8b4e revert adding extra semicolon
Summary: revert changes

Reviewed By: jamesr66a

Differential Revision: D5251322

fbshipit-source-id: ff2b47890388291aaf8a0b221b69ab053b556b6a
2017-06-15 10:31:57 -07:00
6500d7f307 Fixing a small bug in schema where the number of default arguments doesn't match the number of fields
Summary:
The current version of schema.py has a Metadata class with three fields. The default for it is set to
four Nones. This is just changing that to three Nones so that the number of default values matches the number
of actual fields.

Reviewed By: kennyhorror

Differential Revision: D5250463

fbshipit-source-id: 42e5650d270f5f63662614d8445b4819ed370dec
2017-06-15 10:31:56 -07:00
be7c336626 Deprecate CNNModelHelper in python/memonger_test.py
Summary: Also fixed a small bug in ModelHelper constructor

Reviewed By: harouwu

Differential Revision: D5246799

fbshipit-source-id: 3719ca078f0e2b5e463fc93da9c8215f5583bd9a
2017-06-15 10:06:57 -07:00
86a96cd759 Merge commit 'd605afe8b51bf1522d3caf4efef4b3c85def499b' 2017-06-15 12:33:45 -04:00
f61ec2495e nn.EmbeddingBag to compute a bag of word embeddings (Embedding + Sum/Mean) 2017-06-15 12:32:47 -04:00
d605afe8b5 nn.EmbeddingBag to compute a bag of word embeddings (Embedding + Sum/Mean) 2017-06-15 12:32:28 -04:00
909f31764f Add nn.padding to docs fixes #1127 (#1808)
* exposed nn.padding modules

* using functional
2017-06-15 07:41:38 -04:00
7bf4c0e0fb support RNNs in ExtractPredictorNet
Summary:
We need to support RNNs explicitly in ExtractPredictorNet, because they store sub-nets as strings in special arguments. When netdef arguments arrive, we can generalize this a bit.

Added a test under rnn_cell_test to test that extracting an LSTM predictor net works correctly and sets the device option properly for the step net ops.

Reviewed By: yqwangustc

Differential Revision: D5236334

fbshipit-source-id: cd653427f8c440a14d94195a532d18276f94749a
2017-06-14 22:32:29 -07:00
2ec294a8bb Fix a few typos and grammatical errors in comments
Summary:
Fix a few typos and grammatical errors in comments using language-check, a Python library.
The spell_checker source code is here: https://github.com/17-1-SKKU-OSS/011A/blob/master/spell_checker/spell_checker.py
Here is the text file indicating what should be fixed: https://github.com/17-1-SKKU-OSS/011A/tree/master/spell_checker/fix/caffe2
Closes https://github.com/caffe2/caffe2/pull/719

Differential Revision: D5165118

Pulled By: aaronmarkham

fbshipit-source-id: 7fb8ef7a99d03cd5fd2f9ebdb01b9865e90fc37b
2017-06-14 18:22:39 -07:00
ea5819045e a few comments in build_all.sh (#1807) 2017-06-14 17:58:56 -04:00
46a95cf420 Allow specifying device to load_from_db()
Summary: A quite common problem is that it is hard to load blobs onto a specific device with pe.load_from_db. One must set the device options of the returned init_net and predict_init_net, which is quite magical. So I made load_from_db() able to set these device options automatically, based on the device scope or a device_option parameter. Added a unit test.

Reviewed By: asaadaldien

Differential Revision: D5249202

fbshipit-source-id: 7b9d91476cb8d1b0ec0d9772e50b9148b8b184fa
2017-06-14 14:32:24 -07:00
86594075f7 Third fix for KeepOnShrink tests
Summary:
See https://github.com/caffe2/caffe2/pull/723, https://github.com/caffe2/caffe2/pull/551, https://github.com/caffe2/caffe2/issues/417
```
[ RUN      ] TensorCPUTest/1.KeepOnShrink
/opt/caffe2/caffe2/core/blob_test.cc:362: Failure
Expected: (ptr) != (larger_ptr), actual: 0x24f7640 vs 0x24f7640
[  FAILED  ] TensorCPUTest/1.KeepOnShrink, where TypeParam = int (0 ms)
```
I haven't been able to reproduce locally or on TravisCI - only in our own test infrastructure. The fix is conceptually the same as https://github.com/caffe2/caffe2/pull/723. After reading through the code, I can't see any other checks which should fail this way.
Closes https://github.com/caffe2/caffe2/pull/801

Differential Revision: D5248106

Pulled By: akyrola

fbshipit-source-id: 0cd3e2c11e7ae3d924496843a311530f0e08a9da
2017-06-14 11:48:15 -07:00
eaacfc7e25 Fix multi-precision SGD outputs
Summary:
salexspb This fixes a major perf issue (40% boost on alexnet end-to-end perf) in the multi-precision SGD optimizer - it was causing repeated cudaMalloc / cudaFree calls during training iterations due to the changing size of the `grad` blob as it moved from fp16 <-> fp32.
Closes https://github.com/caffe2/caffe2/pull/797

Differential Revision: D5246978

Pulled By: salexspb

fbshipit-source-id: ec3d7ef18445e19eaf5aac908d0a7bcd5957eb60
2017-06-14 11:36:43 -07:00
29a1a916dc Add support for CUDA9 half semantics 2017-06-14 11:20:24 -07:00
94d42b03fb MaxReduction ops GPU implementation.
Summary:
Move rowwise-max kernel from Softmax to math_util library and implement
colwise-max kernel and MaxReduction ops.

Reviewed By: akyrola

Differential Revision: D5240329

fbshipit-source-id: a07281a877324de459aace33ff21175a68cfd8f6
2017-06-14 11:02:46 -07:00
9c53c6dcb9 Fix errors and warnings when building docs (#1806) 2017-06-14 13:50:14 -04:00
9d916e561c batch norm docfix (#1804)
fixes the formula for batch normalization (moves the epsilon inside
the square root)
2017-06-14 11:57:46 -04:00
c1f974aa9f Deprecate CNNModelHelper in python/crf.py
Reviewed By: harouwu

Differential Revision: D5241631

fbshipit-source-id: 3dc448355bc2a766ae9eda1dc579e501743b35cf
2017-06-14 08:49:27 -07:00
4e356528b4 Add torch.matmul function. (#1780)
* Add torch.matmul function.

Includes test_torch, test_autograd and docs changes.

* Add __all__ to functional so names aren't accidentally imported.

* Include unbind in __all__.

* Add matmul case for when one argument is 1-dimensional and the other
at least 3-dimensional.

* Add squeeze_ to Variable.

* Use squeeze_ instead of squeeze for matmul.
2017-06-14 08:14:53 -04:00
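A usage sketch for the newly added 1-dimensional / at-least-3-dimensional case: per the documented matmul semantics, a 1-d first argument has a 1 prepended to its shape for the batched multiply and removed from the result.
```
import torch

v = torch.randn(3)        # 1-dimensional
b = torch.randn(5, 3, 4)  # batch of 3x4 matrices

out = torch.matmul(v, b)
print(out.shape)          # torch.Size([5, 4])
```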
9fd354e643 More accurate build instructions based on @apaszke's comments. (#1800) 2017-06-14 12:04:45 +02:00
d03ffb211c Remove WORKER_INIT_CALLS
Summary: This was only needed in order to initialize stateful PythonOps. Now PythonOp has support for initialization at Op creation time, so this is not used anymore.

Reviewed By: dzhulgakov

Differential Revision: D5242908

fbshipit-source-id: dbaa249466dd0f37f25d204d387b1f99c6dd4fed
2017-06-13 20:18:48 -07:00
eebda50b79 Operator python traceback
Summary: This is going to show a Python Caffe2 user where a failed operator was created. The motivation for not putting this information right in the protobuf is to avoid making it too verbose and to keep the ability to read a net's protobufs after a simple print() call.

Reviewed By: jamesr66a

Differential Revision: D5226047

fbshipit-source-id: 7edfe850e05a2ec209577142aa3368664a57a108
2017-06-13 18:50:02 -07:00
c8e9bc493b Merge commit '244af06adc77674e7e1134d67d4a56ae7641f7b9' 2017-06-13 20:49:37 -04:00
6de5ce6bac Merge commit '1cf105d517c4308912eee85eff8f50f31c9e31f1' 2017-06-13 20:49:13 -04:00
38b9598685 Added GLU (gated linear unit)
From https://arxiv.org/abs/1612.08083
2017-06-13 20:48:19 -04:00
244af06adc Added GLU (gated linear unit)
From https://arxiv.org/abs/1612.08083
2017-06-13 20:48:03 -04:00
1cf105d517 Added GLU (gated linear unit)
From https://arxiv.org/abs/1612.08083
2017-06-13 20:47:55 -04:00
3ada9da808 Make csrc -Werror clean. (#1795)
Primary things I had to fix:

- Suppress _XOPEN_SOURCE warnings by ensuring that Python.h is included
  first, because it always unconditionally defines this macro.

- Turn off strict aliasing, because Python 2 doesn't work with strict
  aliasing.

- Workaround setuptools bug, where it's incorrectly passing
  -Wstrict-prototypes to C++ compilers (where this doesn't make
  any sense)

To compile csrc with -Werror, run `CFLAGS="-Werror" python setup.py build_ext`

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 20:18:09 -04:00
a2521148b4 TimeObserver for SimpleNet, an example usage of Observers.
Summary: Implemented TimeObserver for SimpleNet.

Reviewed By: pietern

Differential Revision: D5188373

fbshipit-source-id: 530d75d176aa29d38c131338c3a2be70bc221a47
2017-06-13 17:02:11 -07:00
d3ec6e8f55 Run python op builder at op creation time
Summary:
This allows constructing a Python op by passing a pickled "builder function call" as an argument to the op.
The builder function is called at PythonOp construction time and returns a function that will be called when the op is run.

This way we can drop the dependency on 'tokens', which didn't work properly for protobufs that get distributed to other processes. Now the PythonOp definition is self-contained: as long as the build dependencies are right, shipping the protobuf is enough to execute the net remotely.

Reviewed By: dzhulgakov

Differential Revision: D5080833

fbshipit-source-id: a5deaca5d3143024cdb121519689224e9dbec5ce
2017-06-13 16:29:22 -07:00
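A rough, hypothetical illustration of the builder idea (names are invented; Caffe2 PythonOps pass numpy arrays as inputs/outputs). Because the builder is a module-level function, pickle stores it by reference, so the (builder, args, kwargs) triple can travel inside the op definition:

```
import pickle

def make_scaler(factor):
    # Runs once, at PythonOp construction time, and returns the function
    # the op will call on every execution.
    def run(inputs, outputs):
        outputs[0][...] = inputs[0] * factor
    return run

# The pickled triple makes the serialized net self-contained: any process
# with the right build dependencies can rebuild the run function.
payload = pickle.dumps((make_scaler, (2.0,), {}))
builder, args, kwargs = pickle.loads(payload)
run_fn = builder(*args, **kwargs)
```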
5a63a6d47f Better document how to rebuild only parts of the project. (#1796)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 17:23:39 -04:00
38a48729f0 Merge commit '1a6995b28ca42df41270d4fd914adfb9c8c59674' 2017-06-13 16:31:48 -04:00
deb0aef30c Merge commit '122dd9e8ec4627ccdd895a7dc88a1ec6f13ad6d2' 2017-06-13 16:31:13 -04:00
3977ee3520 Support device on sparse tensor constructor, assert values/indices on same device.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:35 -04:00
c0e7bda3f1 Enforce storage is not NULL invariant for sparse tensors.
Fixes #1783.

There is an undocumented invariant in PyTorch that we should
try to avoid having storage == NULL as much as possible (even
though Torch supports it.)  This commit properly documents the
invariant, and fixes a bug in sparse where the invariant was
not respected.  This means that sparse tensors now correctly
remember which GPU they are associated with.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:35 -04:00
df412051fd Add comment stating nDenseTensors != nTensors in checkGPU.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:35 -04:00
7bee03fe1e Do NOT clone indices/values passed to sparse tensor by default.
Fixes #1782.

The default operation should be cheap: the user can always choose to
explicitly make a copy on the way in.  Note that this is a
BACKWARDS COMPATIBILITY BREAKING change.  However, we DO create
a new tensor wrapper (so we are not affected by subsequent
size changes, etc.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:34 -04:00
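A hedged illustration of the new contract, in the pre-0.4 sparse constructor style; since the constructor no longer copies, clone explicitly if you intend to mutate the inputs afterwards:

```
import torch

i = torch.LongTensor([[0, 1], [2, 0]])   # indices, shape (ndim, nnz)
v = torch.FloatTensor([3.0, 4.0])        # values
# Opt in to the copy yourself if i or v will be modified later:
s = torch.sparse.FloatTensor(i.clone(), v.clone(), torch.Size([2, 3]))
```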
865beada0e Add comment about new implementation being CPU-only.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:34 -04:00
6a46863c83 Abort on known bug (#1521) for spcadd on non-coalesced.
It's better to error than to silently give wrong results.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:19 -04:00
d763db59a9 More efficient nnz test in spcadd.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:19 -04:00
5d6e593c67 Test clone preserves uncoalescedness if it wasn't coalesced.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:19 -04:00
bac408b693 Add some docs about storage->Size.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:19 -04:00
2f967a204c Sparse tensor clone() preserves coalescedness.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:19 -04:00
1a6995b28c Short-circuit copy if src and dest are equal.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:20:04 -04:00
122dd9e8ec Short-circuit copy if src and dest are equal.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:19:35 -04:00
b877d4b5f8 Misc fixes for Python 3
Summary: As title

Differential Revision: D5216942

fbshipit-source-id: def5563f1b259efefab3a829d8a78d8d3297ffc7
2017-06-13 12:18:43 -07:00
795e7e64e8 add truncation for sparse feature
Summary:
Truncate the id list using the max length computed in compute meta, so that it has a deterministic length, which is useful for the position-weighted pooling method.

Reviewed By: sunwael

Differential Revision: D5233739

fbshipit-source-id: f73deec1bb50144ba14c4f8cfa545e1ced5071ce
2017-06-13 10:46:53 -07:00
7c024e93c6 Implement Cumprod function for autograd (#1439) 2017-06-13 17:48:15 +02:00
b4698d6d1d add init to __init__.py of torch.nn (#1789) 2017-06-13 09:02:30 -04:00
d9d50f80c7 Rename arguments to distributed collectives 2017-06-12 22:02:11 -04:00
714351ff39 Officially enable process-group mode 2017-06-12 22:02:11 -04:00
6f51b4ce2d Fix deadlock in GlooCache 2017-06-12 22:00:22 -04:00
12813b88f6 Add DistributedDataParallel 2017-06-12 22:00:22 -04:00
23ab9d481a Add Module._all_buffers 2017-06-12 21:58:38 -04:00
8db8716c7c Support non-default streams in NCCL reduce 2017-06-12 21:58:38 -04:00
b37f18be53 Free GIL when entering THD functions 2017-06-12 21:58:38 -04:00
5a0d5ec058 Add more checks in torch.distributed 2017-06-12 21:58:38 -04:00
095ddc7d08 THD updates and bug fixes
* Add keepdim
* Fix DataChannel signature
* Fix incorrect locking
* Use current stream in DataChannelGloo
2017-06-12 21:58:38 -04:00
86a065e45b Add end callbacks to the engine 2017-06-12 21:58:38 -04:00
59d438de2e change function to remove dependence on CUDA 8.0
Summary: Replace call to function that is only supported in CUDA 8.0 with one that has been supported in previous releases.

Reviewed By: pietern

Differential Revision: D5231755

fbshipit-source-id: d72aec2a4a1c511064a65142887f8a05b51dad55
2017-06-12 15:53:59 -07:00
6626881e7a Add Alpha Dropout (#1775) 2017-06-13 00:39:49 +02:00
406d748423 better engineering for core_test.TestInferDevice
Summary: People recently found that this test was too strict because of proto string matching, so I changed it to compare fields; the test will no longer complain even if the protobuf changes in the future.

Reviewed By: dzhulgakov

Differential Revision: D5229855

fbshipit-source-id: 54efcd7a0f9e5dbba1ddeb480801abcb859e07bd
2017-06-12 15:23:00 -07:00
0f787a01bc map operator (move maptrait def out of class)
Summary: Added an operator that converts key/value blobs into a blob containing a map pointer; unit test passed.

Differential Revision: D5224449

fbshipit-source-id: 2f60754ed3ba6ed16039c09019117ae3c3646ab2
2017-06-12 14:52:04 -07:00
c7f5bf282b Revert py::bytes -> std::string
Summary: As title

Reviewed By: salexspb

Differential Revision: D5229338

fbshipit-source-id: 3bc9442c76061436db8f3217c1ba8edfd9581f8b
2017-06-12 14:11:37 -07:00
c1420330b2 Fixes the checkpoint test.
Summary:
Diff D5224410 initializes the should_stop_blob explicitly, so we will have one more blob when executing the job. This adjusts the check accordingly.

Reviewed By: azzolini

Differential Revision: D5228398

fbshipit-source-id: 439b186c30b0b1d0e41e513babbcccd85e7a1b4a
2017-06-12 12:19:14 -07:00
7f1385e70c Improve gradient accumulation of the framework: 1.5x - 2x
Summary:
We waste extra memory by creating two autosplit gradient
blobs and then accumulating them into the main one. Sometimes, when Sum
/ Sub ops are involved, we can avoid wasting the extra memory entirely.

Ideally we would not waste any memory at all and would make ops add to
the same blob rather than computing separate results and then merging
them, but that would require a substantial change to the framework and
rewriting a lot of operators.

Reviewed By: dzhulgakov

Differential Revision: D5157667

fbshipit-source-id: 8293824d6cdd971d8853ae90aee68e4a6d1e132b
2017-06-11 22:02:30 -07:00
817ae5b5eb Revert D5211826: [caffe2][PR] Avoid compiler warning about unreachable code
Summary: This reverts commit 9bb134b387d6620f1235a7b1ddf13092d73ae44c

Differential Revision: D5211826

fbshipit-source-id: 0468a95cb46d6d04d8d97a4f6c3bd06eab8d9bb4
2017-06-11 21:56:31 -07:00
638fe804dc Implement recover_input_schema_by_prefix
Summary:
It's very useful for simple cases like benchmarking nets, where we want to encode the input/output record in the net and don't want to go through the hurdle of storing it in MetaNetDef.

For those cases I propose remapping the input/output record before saving to 'input_record/{field_name}'. Then we can recover the input/output record just from the names of the blobs.

Differential Revision: D5170473

fbshipit-source-id: ac5daa60051605ed93022aec1377a49f08f15663
2017-06-11 15:37:12 -07:00
b133c214ce fix potential bug in task.py
Summary: as titled

Differential Revision: D5225166

fbshipit-source-id: 9247fe44922c097752c6996ee9192ec72b7e7d88
2017-06-11 10:40:47 -07:00
827a0ac2fe Fix comment mistakes in task.py
Summary: as titled

Reviewed By: kennyhorror

Differential Revision: D5225154

fbshipit-source-id: 99a9547e15e0d5a4c81b6339ce75406160a7fc07
2017-06-11 10:17:07 -07:00
49ec984c40 Ensure warnings are repeated in python2 for tests. 2017-06-11 05:37:59 -04:00
afaad94fed Rename autograd keepdim tests that now default to True. 2017-06-11 05:37:59 -04:00
4f602a52b5 Use THPUtils_assert rather than THError in torch/csrc/Module. 2017-06-11 05:37:59 -04:00
3abc8be42c Clarify use of warn vs raise in expand_utils and don't catch exception in Broadcast plugin when fallback = false. 2017-06-11 05:37:59 -04:00
f4ce99fd87 Add dist, atan2, lerp to fallback functions.
They weren't documented as having those semantics, but tests on
master show they do.
2017-06-11 05:37:59 -04:00
d5a0f97ea7 Renamed masked_copy to masked_scatter in test, fix use of break/continue. 2017-06-11 05:37:59 -04:00
e8ec4110f6 Fix Prod backward for broadcasting. 2017-06-11 05:37:59 -04:00
ffd808768e Remove raiseErrors from THTensor functions, have THStorage functions take an error_buffer to return a proper error message while being able to handle memory management correctly from calling function. 2017-06-11 05:37:59 -04:00
5b81746767 Simplify python warning settings and cleanup tests. 2017-06-11 05:37:59 -04:00
d49b73bbe6 Rename check_fallback to check_backincompat_expand_warn for clarity. 2017-06-11 05:37:59 -04:00
7040b82ede Change async/broadcast copy arguments to be parsed as ints. 2017-06-11 05:37:59 -04:00
723819014e Move expand_utils-inl.h to generic/ and generate via macros. 2017-06-11 05:37:59 -04:00
1ef4cc1591 Incorporate review comments:
1) Line up trailing dimensions in broadcast docs.
2) remove unnecessary expand_as in common_nn test.
3) use view in tensor_str instead of resize_.
4) newExpand remove raiseErrors change.
5) clarify expandedSizes/expandedStrides parameters in inferExpandGeometry.
6) simplify inferSize2/inferSizeN implementations.
7) use new-style classes for warning.
2017-06-11 05:37:59 -04:00
deec86cc05 Clarify a number of comments. 2017-06-11 05:37:59 -04:00
7da46097fe Fix lint errors. 2017-06-11 05:37:59 -04:00
21d9b0c9dd Ensure warnings are repeated in test, necessary in python2. 2017-06-11 05:37:59 -04:00
69287250d1 Add a broadcast parameter to copy_, use it in the library in cases where there is non-broadcasting calls exposed by the tests. 2017-06-11 05:37:59 -04:00
74a23c5aba Fix test_broadcast for cuda tensors, since map_, map2_ not implemented. 2017-06-11 05:37:59 -04:00
177785eecf explicit Ptr constructors, fast transposed copy. 2017-06-11 05:37:59 -04:00
ad9604f45a Add documentation for copy_. 2017-06-11 05:37:59 -04:00
65b23f146e Add broadcasting support for copy_, simplify code generation by moving a lot of currently generated code to expand_utils. 2017-06-11 05:37:59 -04:00
c54e532954 Add broadcasting support for map_, map2_. 2017-06-11 05:37:59 -04:00
ec120fac0c Add broadcasting support for masked_copy, masked_fill. 2017-06-11 05:37:59 -04:00
e06523482a Use THSize_isSameSizeAs, instead of THTensor_(isSameSizeAs) in order to compare sizes of tensors with different data types. 2017-06-11 05:37:59 -04:00
d6fb92fec9 Improve in-place broadcasting back compat warning message and fix an issue where the deprecated warning would not be printed. 2017-06-11 05:37:59 -04:00
5e1a714386 Add backwards incompatibility docs. 2017-06-11 05:37:59 -04:00
be65f46c76 Add optional warning for backwards incompatible keepdim. Setting torch.utils.backcompat.keepdim.warning.enabled=True will cause Python warnings in the case where the default value of keepdim is used for 1-d reductions.
Also specify keepdim via kwargs in library so these warnings have less
noise.
2017-06-11 05:37:59 -04:00
3556d1b8a3 Add optional warning for backwards incompatible broadcast.
Setting torch.utils.backcompat.broadcast.warning.enabled=True
will cause Python warnings in the case where broadcast occurs
but previously 1-d view style pointwise ops occured.
2017-06-11 05:37:59 -04:00
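Usage of the two switches, with the names as given in these commits (later releases exposed them as torch.utils.backcompat.broadcast_warning / keepdim_warning, and they were eventually removed):

```
import torch
import torch.utils.backcompat

# Warn wherever code relies on the old keepdim default or on the old
# 1-d pointwise semantics that broadcasting replaces.
torch.utils.backcompat.keepdim.warning.enabled = True
torch.utils.backcompat.broadcast.warning.enabled = True
```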
5af46cb352 Add broadcasting support for matmul. 2017-06-11 05:37:59 -04:00
a36f95fe26 Add broadcast support for fused-matmul broadcasting. Functions are: addmm, addbmm, addr, addmv, baddbmm. 2017-06-11 05:37:59 -04:00
cd35091d9b Include simple broadcasting example and demonstrate lining up trailing dimensions. 2017-06-11 05:37:59 -04:00
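The rule being documented, in a small self-contained sketch: sizes are compared from the trailing dimension backwards, and each pair must be equal, or one of them must be 1, or one of them must be absent:

```
import torch

x = torch.randn(5, 3, 4, 1)
y = torch.randn(   3, 1, 7)
# Trailing dims line up as 1 vs 7, 4 vs 1, 3 vs 3, 5 vs (absent).
print((x + y).shape)  # torch.Size([5, 3, 4, 7])
```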
3c586d196a Document Broadcast Plugin. 2017-06-11 05:37:59 -04:00
8e2f347951 Proof that broadcasting 3 args (expand3) is equivalent to breaking up operation. 2017-06-11 05:37:59 -04:00
d279c6e099 Docs for addcdiv, addcmul 2017-06-11 05:37:59 -04:00
014372e707 Support "fused" ops: addcmul/addcdiv. 2017-06-11 05:37:59 -04:00
92fde6cf06 Breakup in place broadcast to better handle multiple arguments. 2017-06-11 05:37:59 -04:00
b44ea57ba8 Change order of Broadcast specification.
Since fused ops require broadcasting self over multiple other arguments,
it is simpler to specify broadcast on self rather than the other
way around.
2017-06-11 05:37:59 -04:00
e96f854ce2 Implement/test broadcasting semantics for comparison ops. 2017-06-11 05:37:59 -04:00
edf2969bd8 Backwards compatible Spatial Normalizations / CrossMapLRN. 2017-06-11 05:37:59 -04:00
e653fe2857 Test fixes for keepdim=False, suppress warnings on backwards-compatible behavior. 2017-06-11 05:37:59 -04:00
70c33777a6 pow, fmod, remainder also should fallback.
This behavior isn't listed in the docs, but the tests depend on it.
2017-06-11 05:37:59 -04:00
471dfe9791 Add documentation including links to numpy broadcasting semantics. 2017-06-11 05:37:59 -04:00
85d838a028 Testing over the following: 1) CPU tensor out-of-place functions 2) CPU tensor in-place functions 3) GPU tensor out-of-place functions 4) GPU tensor in-place functions 5) torch. functions 6) Fallback semantics (use pointwise nElem matching rather than broadcasting) 2017-06-11 05:37:59 -04:00
6a40acb4f0 Add Broadcast plugin. 2017-06-11 05:37:59 -04:00
9087624634 Revert "Restore examples with keepdim=True default."
This reverts commit 6fab62173e842bbf550de1c68cfae507ca35b800.
2017-06-11 05:37:58 -04:00
e772a440cb Revert "Change keepdim default to False."
This reverts commit e124790cb2b6675a4b6edf64620a7eb7f7228b29.

Note the original commit message is incorrect; this changes keepdim
back to false.
2017-06-11 05:37:58 -04:00
efd8b54be2 Merge commit 'e45c1046feba46aef2ffac1b1d978a3e76936bab' 2017-06-11 05:37:51 -04:00
54c3441e9c Merge commit '7d1b042cb2198d2bdb5871b08c6c0fb2ccc8e6b1' 2017-06-11 05:37:18 -04:00
4102a79da4 Explicitly set should_stop_blob to False in pipeline init
Summary: This diff fixes an issue with running the same reader in the same workspace multiple times. To achieve correct behavior of the execution step, we have to explicitly initialize should_stop_blob to False.

Reviewed By: kennyhorror

Differential Revision: D5224410

fbshipit-source-id: 4ad2740e187b62b0a1f5612ea3eef223dcc8a799
2017-06-11 02:33:42 -07:00
7d1b042cb2 fix type 2017-06-11 04:42:34 -04:00
e45c1046fe Remove raiseErrors from THTensor functions, have THStorage functions take an error_buffer to return a proper error message while being able to handle memory management correctly from calling function. 2017-06-11 04:33:54 -04:00
a563ce1105 Incorporate review comments:
1) Line up trailing dimensions in broadcast docs.
2) remove unnecessary expand_as in common_nn test.
3) use view in tensor_str instead of resize_.
4) newExpand remove raiseErrors change.
5) clarify expandedSizes/expandedStrides parameters in inferExpandGeometry.
6) simplify inferSize2/inferSizeN implementations.
7) use new-style classes for warning.
2017-06-11 04:33:54 -04:00
92d52bf395 Add broadcasting support for copy_, simplify code generation by moving a lot of currently generated code to expand_utils. 2017-06-11 04:33:54 -04:00
0463ddf16b Support "fused" ops: addcmul/addcdiv. 2017-06-11 04:33:54 -04:00
9060e6be7f Remove raiseErrors from THTensor functions, have THStorage functions take an error_buffer to return a proper error message while being able to handle memory management correctly from calling function. 2017-06-11 04:32:08 -04:00
f0b8c4821b Incorporate review comments:
1) Line up trailing dimensions in broadcast docs.
2) remove unnecessary expand_as in common_nn test.
3) use view in tensor_str instead of resize_.
4) newExpand remove raiseErrors change.
5) clarify expandedSizes/expandedStrides parameters in inferExpandGeometry.
6) simplify inferSize2/inferSizeN implementations.
7) use new-style classes for warning.
2017-06-11 04:32:08 -04:00
0f79bf1a69 Clarify a number of comments. 2017-06-11 04:32:08 -04:00
503002eda7 Add broadcasting support for copy_, simplify code generation by moving a lot of currently generated code to expand_utils. 2017-06-11 04:32:08 -04:00
cf55e1e48a Add broadcasting support for masked_copy, masked_fill. 2017-06-11 04:32:08 -04:00
8d35d4215b Use THSize_isSameSizeAs, instead of THTensor_(isSameSizeAs) in order to compare sizes of tensors with different data types. 2017-06-11 04:32:08 -04:00
9356640453 Properly clean up expand error cases. 2017-06-11 04:32:08 -04:00
ae6b8d0112 Include simple broadcasting example and demonstrate lining up trailing dimensions. 2017-06-11 04:32:08 -04:00
ec2f6a81fd Support "fused" ops: addcmul/addcdiv. 2017-06-11 04:32:08 -04:00
1f9a365fdc Add Infer Size N, for expansion of fused operations. 2017-06-11 04:32:08 -04:00
d38a87217f Expand improvements
1) Rename calculateExpandGeometry to inferExpandGeometry for consistency
2) Simplify inferExpandGeometry implementation by using a single pass
   through dimensions
3) Implement a two operand expansion, expand2.
4) Implement versions that return error code to use for fallback to
equal nElem support.
2017-06-11 04:20:04 -04:00
baa4ba973b Expand improvements
1) Rename calculateExpandGeometry to inferExpandGeometry for consistency
2) Simplify inferExpandGeometry implementation by using a single pass
   through dimensions
3) Implement a two operand expansion, expand2.
4) Implement versions that return error code to use for fallback to
equal nElem support.
2017-06-11 04:19:37 -04:00
a24db91a38 Add SELU activation function (#1769)
* Add SELU activation function

* Remove unnecessary case

* Add Function for SELU + tests and fix RReLU inplace

* Fix extra line in doc

* Fix tests

Remove in-place tests for RReLU. For some reason they fail on legacy nn, but passes on nn

* SELU in new-style Function

It also supports double backprop, verified with gradgradcheck

* Fix flake8
2017-06-11 10:07:48 +03:00
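A minimal functional sketch of SELU with the constants from the SNN paper (the Function class in this PR additionally wires up double backprop):

```
import torch

def selu(x):
    # Constants from Klambauer et al., "Self-Normalizing Neural Networks".
    alpha = 1.6732632423543772
    scale = 1.0507009873554805
    return scale * torch.where(x > 0, x, alpha * (torch.exp(x) - 1))

x = torch.randn(4)
print(torch.allclose(selu(x), torch.nn.functional.selu(x)))  # True
```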
e3d5826b92 Add Cumsum double backwards support. (#1758) 2017-06-10 18:27:44 +02:00
ba690d5607 Add support for NVTX functions. (#1748) 2017-06-10 18:26:58 +02:00
5f1a16a018 Torch manual seed to seed cuda devices (#1762) 2017-06-10 12:37:21 +02:00
eced95ffe5 caffe2 video_io.cc bug fix
Summary: fix video_io.cc bug

Reviewed By: dutran

Differential Revision: D5224323

fbshipit-source-id: fa31f87d1053f38546a988f15368d39124fb40f7
2017-06-09 23:17:20 -07:00
dcf07a2d7f Fix typo in ParameterList documentation 2017-06-10 02:16:52 +02:00
e01769ece5 map operator
Summary: Added an operator that converts key/value blobs into a blob containing a map pointer; unit test passed.

Differential Revision: D5166513

fbshipit-source-id: 748527c423a163fe55f914c08fff3adfc74a540c
2017-06-09 15:17:29 -07:00
c7aa8e142d Add gradient to SparseToDense op
Summary: As desc.

Differential Revision: D5169423

fbshipit-source-id: 64c72933c14c3caabfbe0bf85912194a479c24fa
2017-06-09 13:47:21 -07:00
c822e89956 Rename SparseToDense layer
Summary:
The SparseToDense layer is essentially calling the SparseToDenseMask op.
This makes it impossible to call the functional layer with the true SparseToDense op.
This diff is to rename the layer.

Please let me know if I missed anything or you have a better name suggestion.

Differential Revision: D5169353

fbshipit-source-id: 724d3c6dba81448a6db054f044176ffc7f708bdb
2017-06-09 12:48:27 -07:00
7517f050fc apply clang-tidy modernize-use-override
Summary: Use clang-tidy to mechanically add missing `override` and remove redundant `virtual`.

Reviewed By: igorsugak

Differential Revision: D5211868

fbshipit-source-id: 6a85f7c4a543a4c9345ec5b0681a8853707343dc
2017-06-09 11:33:07 -07:00
20382004d6 apply clang-tidy modernize-use-override
Summary: Use clang-tidy to mechanically add missing `override` and remove redundant `virtual`.

Reviewed By: igorsugak

Differential Revision: D5211868

fbshipit-source-id: 4118c4c72f8ec3485507f69679f7e852b3eaeb73
2017-06-09 11:33:07 -07:00
2f385d490b apply clang-tidy modernize-use-override
Summary: Use clang-tidy to mechanically add missing `override` and remove redundant `virtual`.

Reviewed By: igorsugak

Differential Revision: D5211868

fbshipit-source-id: 15cec17d39690ffa8072ffeccdf9fedaae1f6839
2017-06-09 11:33:06 -07:00
072f4dbefc net_printer_quick_fix
Summary: To deal with an encoding failure

Reviewed By: azzolini

Differential Revision: D5215897

fbshipit-source-id: cf8687706f7e4deaee05b61cd2bfeaff88672fcc
2017-06-08 19:34:50 -07:00
c291c97494 Add integration test for pos_w
Summary: Title

Reviewed By: azzolini

Differential Revision: D5197307

fbshipit-source-id: 425bf8e7c5068ea544e5b2709b6bb27eef140bf3
2017-06-08 18:04:53 -07:00
df72826ead Static RNN
Summary:
Static RNN allows unrolling an RNN into a Caffe2 graph using all the existing cell abstractions. In this diff I introduce several new tests that have already caught a few bugs in our RecurrentNetworkOp gradient accumulation logic by comparing it against an unrolled version.

Another use case is performance: potentially we can run an unrolled net faster because DAGNet will have access to the whole graph. The same goes for memonger, but that work is not part of this diff.

Reviewed By: akyrola

Differential Revision: D5200943

fbshipit-source-id: 20f16fc1b2ca500d06ccc60c4cec6e81839149dc
2017-06-08 17:48:48 -07:00
bb9077a6cd Network forward / backward equality checker
Summary:
In some cases you have an optimized network and a normal one, and
you would like to make sure they produce the same results. If the
math under the hood is the same, you can check this with much higher
precision than a traditional numerical gradient check. One application
is RNNs: we can unroll an RNN into a Caffe2 graph and make sure the
result matches the optimized version using RecurrentNetworkOp.

Another possible application is graph transformations: we can verify
that the transformed nets produce the same gradients (cc akyrola on
memonger, bwasti on other transformation ideas).

Reviewed By: bwasti

Differential Revision: D5200855

fbshipit-source-id: 0196af187f0c2feb33de4778ea08d0d288fe1017
2017-06-08 17:48:47 -07:00
264f75fdd0 ZeroGradient op
Summary:
When building a multi-layer static RNN, the last timestep of
the first layer (and of every layer except the last) doesn't get a
gradient for the cell state, since the user normally consumes results
only from the last layer and the cell state doesn't propagate upward either.

ZeroGradient provides a general solution for injecting zero-gradient
blobs. It is somewhat similar to the StopGradient operator, which is
also special-cased.

Reviewed By: bwasti

Differential Revision: D5198375

fbshipit-source-id: a21d0cfb3676a77fac72e5897a200d0bd25fc6de
2017-06-08 16:02:38 -07:00
3f4e9ab99c Add support for group arg to fbcode nnpack conv op
Summary: Support grouped convolutions using the `group` arg in the nnpack convolution implementation.

Reviewed By: Maratyszcza

Differential Revision: D5204743

fbshipit-source-id: 81116213f7a4f6afa793e4bdf1c5bdd9a55e124f
2017-06-08 14:05:56 -07:00
52ee7697f4 Fixing broken Python tests
Summary:
`brew_test.py` is just plain broken. `core_test.py` doesn't work with pytest. `apmeter_test.py` and `top_k_test.py` don't work for CUDA builds.
Closes https://github.com/caffe2/caffe2/pull/765

Differential Revision: D5211817

Pulled By: Yangqing

fbshipit-source-id: 78ec5af35a3fa870978e4c9590210ade9e3bc5ac
2017-06-08 13:34:46 -07:00
10ec905289 Avoid compiler warning about unreachable code
Summary:
Fix https://github.com/caffe2/caffe2/issues/764

I don't think we care much about the behavior, exactly, as long as it's a loud clear crash - right?
Closes https://github.com/caffe2/caffe2/pull/766

Differential Revision: D5211826

Pulled By: Yangqing

fbshipit-source-id: 9bb134b387d6620f1235a7b1ddf13092d73ae44c
2017-06-08 13:34:45 -07:00
5c0b22ea03 Fix observer_test
Summary:
Fixes bug in tests introduced with 52dafaa7db.
```
[ RUN      ] ObserverTest.TestNotifyAfterDetach
/data/caffe2/caffe2/core/observer_test.cc:130: Failure
      Expected: 0
To be equal to: counter.load()
      Which is: 1212
```
Bug discovered with TravisCI builds at https://github.com/caffe2/caffe2/pull/735, but can also be seen in the existing builds at https://travis-ci.org/caffe2/caffe2/jobs/239913458.
Closes https://github.com/caffe2/caffe2/pull/757

Differential Revision: D5211808

Pulled By: Yangqing

fbshipit-source-id: f3ae83fb2933bad98eea2c02275fa41bf8fad892
2017-06-08 13:34:44 -07:00
75f1da327d Skip Python tests which require opencv or lmdb
Summary:
Neither dependency is required by the core Python modules.

OpenCV, in particular, is a pain to install (no pip package). Conditionally skipping this test will make TravisCI integration easier.
Closes https://github.com/caffe2/caffe2/pull/739

Differential Revision: D5211799

Pulled By: Yangqing

fbshipit-source-id: c6bdc8a17977f64f34e968fd9ab8c65161d2624d
2017-06-08 13:34:43 -07:00
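The usual shape of such a conditional skip, sketched with an invented test name:

```
import unittest

try:
    import cv2  # noqa: F401
    HAS_OPENCV = True
except ImportError:
    HAS_OPENCV = False

@unittest.skipIf(not HAS_OPENCV, 'OpenCV is not installed')
class ImageInputOpTest(unittest.TestCase):
    def test_smoke(self):
        self.assertTrue(HAS_OPENCV)
```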
49c89d6664 Use add_dependencies() for ExternalProjects
Summary:
I closed https://github.com/caffe2/caffe2/pull/736 because one of these variables should be used after all.

Here's how C1 uses this variable: https://github.com/BVLC/caffe/blob/rc5/cmake/Targets.cmake#L116

Without this fix, there is a race condition in the parallel build leading to this error:
```
make[2]: *** No rule to make target `../third_party/NNPACK/lib/libnnpack.a', needed by `caffe2/libCaffe2_CPU.so'.
```
Closes https://github.com/caffe2/caffe2/pull/737

Differential Revision: D5211794

Pulled By: Yangqing

fbshipit-source-id: 9e368f09b01edaf86252727adc6f6cc40d244e29
2017-06-08 13:34:42 -07:00
00e098083e Fixed thread safety issues in ImageInputOp
Summary:
The random number generators could be used in a thread-unsafe manner.
This patch fixes this by adding a way for tasks to get the thread ID they are
running on.

Reviewed By: panshen1

Differential Revision: D5051334

fbshipit-source-id: 9a9f9e2e7b7a86ff456f37b40422af4fa100b5d9
2017-06-08 11:50:46 -07:00
fab5bef9f6 Merge pull request #45 from slayton58/nccl_cmake_fix
Fix NCCL directory typo
2017-06-08 11:28:25 -07:00
27e01744b2 Probably fixed memonger
Summary:
This diff fixes various issues with memonger, and works at least with rbgirshick's failure case, Resnet-50, and a new, harder unit test. I will still create a proper resnet50 test.

1) Introduce the concept of "tokens". These are passed down the dependency chains, and a blob can be reused for recycling only if it owns all the tokens currently in possession. Tokens are added when branching and redeemed after all inputs are satisfied. A bit hard to explain.
2) There were various bugs due to bad code: the free_blobs data structure is of a different type when we have blob sizes and when we don't. I plan to rewrite this soon.
3) Added a harder unit test that failed before.
4) Added test for resnet50 + memonger

Reviewed By: asaadaldien

Differential Revision: D5193393

fbshipit-source-id: bc2a714877aa1201c32a5ba8ade862865e455711
2017-06-08 09:19:24 -07:00
feba1eed00 resnet50: fetch right lr
Summary: I broke resnet50 when switching to the optimizer, which uses an LR per parameter. This only happens after each epoch, and I did not test patiently enough. As a stop-gap, while asaadaldien works on a better solution, just fetch the LR of a conv1_w param.

Reviewed By: asaadaldien

Differential Revision: D5207552

fbshipit-source-id: f3474cd5eb0e291a59880e2834375491883fddfc
2017-06-07 21:46:35 -07:00
4fefff0bbb Auto injecting device copy for single net and several nets
Summary:
This diff plans to attack the problem where we want to just annotate the device option for operators and let Caffe2 inject cross-device copy functions for us. This feature would be useful for mixed-device training and multi-device training with several nets, where previously we did the heavy lifting of adding copy functions ourselves.

Ideally, this feature will happen like this:

      //construct your nets first
      core.InjectDeviceCopyAmongNets([train_init, train_net, ...])

My ideas are written in comments. I will update them here as well later.

Reviewed By: dzhulgakov

Differential Revision: D5134103

fbshipit-source-id: 173f7da9d1773d1c50ccdc27f1b5cd3067b04af5
2017-06-07 20:03:18 -07:00
21a5c8ea5e Fix use of nccl_INCLUDE_DIRS in nccl.cmake 2017-06-07 20:13:11 -04:00
87a12dd355 Caught exception when fetching uninitialized blobs when collecting blob sizes in workspace.
Summary: Catch the exception thrown when fetching uninitialized blobs while collecting blob sizes in the workspace. Some output blobs (like the mask output of Dropout when is_test=1) may be nullptr, and FetchBlob will fail on them.

Differential Revision: D5198641

fbshipit-source-id: 45ee26c4cb1c25cc48904e9f7d7c007224c97418
2017-06-07 15:35:32 -07:00
4316fb4876 Implement APMeter op
Summary: Implements an APMeter operator (APMeterOp) to calculate AP for multiclass classification given prediction scores and labels. The op takes a score tensor [nsamples x nclasses] and a label tensor [nsamples x nclasses], and outputs a float tensor of size nclasses with the AP for each class.

Reviewed By: akyrola

Differential Revision: D5082565

fbshipit-source-id: ae7304bc8fc999c361245b9aec38eb9a5f5eef4b
2017-06-07 15:03:04 -07:00
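A hedged NumPy sketch of the per-class AP computation the op performs (the standard definition; the op's exact tie-breaking may differ):

```
import numpy as np

def average_precision(scores, labels):
    # scores, labels: 1-d arrays for a single class; labels are 0/1.
    order = np.argsort(-scores)            # rank samples by descending score
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / (np.arange(len(hits)) + 1.0)
    # AP = mean precision at the ranks of the positive samples.
    return float((precision_at_k * hits).sum() / max(hits.sum(), 1))
```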
5300aafc1f Fix NCCL directory typo 2017-06-07 17:01:13 -04:00
ee3727db00 add_helper_function_ElementwiseLinear_op
Summary:
Add a helper function for parametric op ElementwiseLinear
The typical syntax is model.ElementwiseLinear(input, output, dimension)

Reviewed By: harouwu, akyrola

Differential Revision: D5114152

fbshipit-source-id: 8e8c691f824f518ae510a72ab0c12de1b018f3b5
2017-06-07 13:49:48 -07:00
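My reading of what the op computes, as a NumPy sketch (a per-feature scale and shift; names are illustrative):

```
import numpy as np

def elementwise_linear(x, w, b):
    # For X of shape (N, D) with w, b of shape (D,):
    #   y[n, d] = x[n, d] * w[d] + b[d]
    return x * w[None, :] + b[None, :]

x = np.random.randn(8, 4)
assert np.allclose(elementwise_linear(x, np.ones(4), np.zeros(4)), x)
```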
77c481c40c Fixed flaky observerTest.TestNotifyAfterDetach
Summary: Not resetting counter matters when running binary directly.

Reviewed By: bwasti

Differential Revision: D5202723

fbshipit-source-id: 7cc5c9e4d5c6db0f79fa3950454556bc26ea4914
2017-06-07 13:33:08 -07:00
a9bd1de9e9 fixed README to reflect docker image name (#1751) 2017-06-07 15:49:39 -04:00
98825d1323 guard against special case of in-place operation
Summary:
There is an edge case where internal gradient blobs of the backward step net should not be considered internally calculated if the only "internal" calculation is in-place.

In the case of the failing attention unit tests, the offending blob was attention_weighted_encoder_context_grad, which was incorrectly considered internal because it was the output (as well as input) of a Reshape on the step net's edge. The caveat here is that the results may be unpredictable if a non-pass-through in-place operation is applied to a blob within a step net that is both consumed internally and is a recurrent state/output. (This is an extreme edge case, and difficult to explicitly enforce, but it's worth noting.)

Reviewed By: salexspb

Differential Revision: D5198328

fbshipit-source-id: 0cfa8f903fd767fc50e727f238ac3d8cdca03fe0
2017-06-07 12:33:31 -07:00
e57eef4bcb Merge commit '62835fc3f5346968b4dca392c77efdeb75a6b172' 2017-06-07 14:54:47 -04:00
d81da41650 Make sure the number of MKL and OpenMP threads match
Otherwise, on many machines, the size of the OpenMP thread pool will
change between MKL and our OpenMP enabled functions. The constant thread
creation and destruction results in worse performance and leaks memory
on GCC 5.4
2017-06-07 14:53:29 -04:00
62835fc3f5 Make sure the number of MKL and OpenMP threads match
Otherwise, on many machines, the size of the OpenMP thread pool will
change between MKL and our OpenMP enabled functions. The constant thread
creation and destruction results in worse performance and leaks memory
on GCC 5.4
2017-06-07 14:53:14 -04:00
da7957c660 Fix masked_copy call to masked_scatter. (#1749) 2017-06-07 12:58:47 -04:00
2a49353d5e minor fix for docs of Upsample 2017-06-07 11:42:52 -04:00
b05c23de44 Merge commit 'da45b4c6b3b0b7cd8f0dc612b9afa6a3a07b8305' 2017-06-07 11:31:38 -04:00
019e967113 Merge commit '47bf87b9220c10edaafec98c6bd20bdb1436c8e4' 2017-06-07 11:30:35 -04:00
b9ab26765e Add 3D upsampling (nearest and trilinear) with tests 2017-06-07 11:29:27 -04:00
da45b4c6b3 Add 3D upsampling (nearest and trilinear) with tests 2017-06-07 11:24:41 -04:00
47bf87b922 Add 3D upsampling (nearest and trilinear) with tests 2017-06-07 11:24:05 -04:00
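Usage, written against today's name for the entry point (F.interpolate superseded the nn.functional.upsample added here); a hedged sketch:

```
import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 4, 4, 4)   # (N, C, D, H, W) volumetric input
up_n = F.interpolate(x, scale_factor=2, mode='nearest')
up_t = F.interpolate(x, scale_factor=2, mode='trilinear',
                     align_corners=False)
print(up_n.shape, up_t.shape)    # both torch.Size([1, 2, 8, 8, 8])
```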
edd41d8d80 BatchNorm fallback to THNN when eps < CUDNN_BN_MIN_EPSILON (#1742) 2017-06-07 09:56:28 -04:00
352f8b2fa6 Merge commit 'ced01f6c919c4b7109512ce797a2a0185c8f8112' 2017-06-07 09:22:14 -04:00
ced01f6c91 fix GRUFused signature 2017-06-07 09:21:20 -04:00
d524d5b481 Fixes zip/izip for Python 3
Summary: As title

Reviewed By: salexspb

Differential Revision: D5154186

fbshipit-source-id: 2ef24557d82ae16d3bdfbc90a4cc96be8e2dc6c3
2017-06-07 00:04:26 -07:00
60c78d6160 Fixes range/xrange for Python 3
Summary: As title

Differential Revision: D5151894

fbshipit-source-id: 7badce5d3122e8f2526a7170fbdcf0d0b66e2638
2017-06-07 00:04:26 -07:00
d351239c10 fix legacy ClassNLLCriterion for upstream change 2017-06-07 00:38:00 -04:00
1b1579c89d Merge commit 'b96f76e470b25454b6b14c7ace888686295405e9' 2017-06-07 00:19:42 -04:00
df7c47142d fix for THNN NLLLoss signature change 2017-06-07 00:18:11 -04:00
4c5d101caf Implement ColwiseMax and RowwiseMax reduction ops.
Differential Revision: D5192949

fbshipit-source-id: e7e877b4bea19dd1be94449d45d2733f4858b8e7
2017-06-06 21:17:29 -07:00
b96f76e470 standalone macros 2017-06-07 00:17:05 -04:00
7e62971c86 Merge commit '71ccedbc6c4e460d38c794737bba780e7673e888' 2017-06-06 23:38:52 -04:00
a7d987544d Merge commit '4e49aed5eaa5a4abaf0a51bb87a49b44394ea3c3' 2017-06-06 23:35:42 -04:00
4e49aed5ea fix outputHeight <-> outputWidth 2017-06-06 23:33:51 -04:00
71ccedbc6c Merge pull request #470 from qqning/master
Fix the mix-up of height and width on depth-wise convolution
2017-06-06 23:31:54 -04:00
c3cda260b6 Merge commit '64faf120acb97866dfd90bf428b385deee4ee912' 2017-06-06 23:27:45 -04:00
22949350b6 More performant fix for fused rnn kernels (#1532) and bugfix (#1721) 2017-06-06 23:25:31 -04:00
3f7b48ccda Remove clone in fused rnn 2017-06-06 23:20:14 -04:00
db620304b2 More performant fix for fused rnn kernels (#1532) and bugfix for #1721 2017-06-06 23:13:07 -04:00
d7db75c10f added CosineSimilarity to nn.distance and updated docs (#1672)
* added CosineSimilarity to nn.distance and updated docs
2017-06-06 22:53:21 -04:00
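Usage of the module as it ended up in torch.nn, a short sketch:

```
import torch
import torch.nn as nn

cos = nn.CosineSimilarity(dim=1, eps=1e-6)
x1 = torch.randn(8, 16)
x2 = torch.randn(8, 16)
print(cos(x1, x2).shape)  # torch.Size([8]); values lie in [-1, 1]
```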
89894536c8 Fix VideoInputOp memory leak
Summary: VideoInputOp has memory leak

Differential Revision: D5193802

fbshipit-source-id: a48e309b845e84ec83875119646bbb6f926ac755
2017-06-06 16:08:45 -07:00
e50d599240 Fix header inclusion in math.h
Summary:
While debugging #43 I found common/common.h missing some headers as well.

Fixes #43.
Closes https://github.com/facebookincubator/gloo/pull/44

Differential Revision: D5194970

Pulled By: pietern

fbshipit-source-id: 4861cd04c56931d4759f5bc050816788252003ee
2017-06-06 15:21:08 -07:00
93ac6a9837 checkpointing for distributed hive reader.
Summary:
The goal of this diff is:
1) Enable checkpointing to honor batches_per_epoch
2) Resume hive_readers mid-split

Reviewed By: azzolini

Differential Revision: D5004212

fbshipit-source-id: 2ff5df30ba946eefadd109d80056cde67398a080
2017-06-06 14:20:06 -07:00
7723129d14 Add gradient for topK op
Summary:
Input of topK op: X (dense)
Output of topK op: Value and Indices (sparse representation)
Value will have a gradient in some cases.

We backprop (copy) the gradient from sparse (dValue) to dense (dX).

Differential Revision: D5133461

fbshipit-source-id: 7bad55b60e8a22dfe0e51357ce2099d7f752c133
2017-06-06 14:20:06 -07:00
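A hedged NumPy sketch of the backward copy described above: gradients of the selected values are scattered back into a zero tensor shaped like X:

```
import numpy as np

def topk_grad(d_values, indices, x_shape):
    # d_values, indices: from a top-k taken along the last axis of X.
    d_x = np.zeros(x_shape, dtype=d_values.dtype)
    np.put_along_axis(d_x, indices, d_values, axis=-1)
    return d_x

g = topk_grad(np.ones((2, 3)), np.array([[0, 2, 4], [1, 3, 5]]), (2, 6))
print(g)  # ones at the selected positions, zeros elsewhere
```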
c9c862fa8f 16117716 [Caffe2 OSS] make char-rnn example use build_sgd
Summary: replace hand-made SGD with build_sgd

Reviewed By: salexspb

Differential Revision: D5186331

fbshipit-source-id: 3c7b4b370e29a1344b95819766463bae3812c9a6
2017-06-06 13:54:59 -07:00
2ec2d23f88 booleanmask support indices sorting
Summary: BooleanMask now supports an additional output with sorted indices

Differential Revision: D4984255

fbshipit-source-id: becb10d7fe989bb2f6488c901766a45369613eb7
2017-06-06 13:32:51 -07:00
c6a6391c38 added checks to cudnn Convolution for stride, dilation, kernel size and num input planes (#1723)
* added checks to cudnn Convolution for stride, dilation, kernel size and num input planes
2017-06-06 15:42:00 -04:00
d50ad408fa fix incorrect grad_weight in Bilinear 2017-06-06 15:07:09 -04:00
73ccdb3920 Fixing the issue with incorrect normalized values in IndexLinear 2017-06-06 11:44:11 -07:00
b36d716614 Implemented a ObserverBase class for Tracing Graph performance.
Summary: Contains the ObserverBase class and some unittests.

Reviewed By: bwasti, pietern

Differential Revision: D5099367

fbshipit-source-id: fabde126d3281729dfc772d63dbf363e5d649319
2017-06-06 03:46:23 -07:00
80fe2e5caf Fix from_column_list
Summary: Previous implementation relied on the order of fields for some reason.

Reviewed By: azzolini

Differential Revision: D5164478

fbshipit-source-id: 12717310860584e18ce4ca67d0bd5048354cdc0a
2017-06-06 01:17:02 -07:00
8cd208ad6f Infer input and output device from OperatorDef through OperatorSchema
Summary: Infer input and output devices from the OperatorDef through OperatorSchema. This is inspired by shape inference. With this feature, we can easily analyze device information for all blobs in the net in a generic way, which is really helpful for automatic cross-device execution.

Reviewed By: akyrola, dzhulgakov

Differential Revision: D5161065

fbshipit-source-id: ee656123112171a4ca00f2fb3f6940f32ddf3135
2017-06-05 23:47:33 -07:00
a38cae76ab benchmark compatible with latest build process
Summary: update to the new sigmoid calling process

Reviewed By: dzhulgakov

Differential Revision: D5187589

fbshipit-source-id: cf29e7e80776ac1c4cf5718c5d6043d44f62d4de
2017-06-05 23:47:32 -07:00
a5fc70857c Support fetching of the parameters from the global namescope by ''
Summary:
This diff fixes fetching of parameters in the global namescope. An earlier diff that switched to '' introduced this bug.

Reviewed By: dzhulgakov

Differential Revision: D5189667

fbshipit-source-id: 4818e99e2c2c90788e70e0b8b6204ec6f471d37d
2017-06-05 22:32:39 -07:00
b6c75c43c8 add tests for checking the type of .data and .grad.data is the same 2017-06-06 01:06:14 -04:00
a53cde09b5 Rename masked_copy_ to masked_scatter_ 2017-06-06 01:06:14 -04:00
98afdcf409 Accept None values returned from grad hooks 2017-06-06 01:06:14 -04:00
ef32e96447 Fix grad type of compare functions 2017-06-06 01:06:14 -04:00
b032b88f34 Fix Prod backward and autograd tests 2017-06-06 01:06:14 -04:00
a76098ac15 fix optimizer when given single parameters (instead of an iterable)
When using named_parameters to modify the lr and weight decay, I hit a bug: named_parameters yields torch.nn.parameter.Parameter values, not an iterable of Parameters.
2017-06-05 23:47:56 -04:00
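A sketch of the pattern the report describes, which the fix makes work: putting a bare Parameter (as yielded by named_parameters) into a param group, rather than an iterable of Parameters:

```
import torch
import torch.optim as optim

model = torch.nn.Linear(4, 2)
groups = []
for name, param in model.named_parameters():
    # named_parameters() yields individual Parameter objects; the fix
    # lets the optimizer accept them directly inside param groups.
    groups.append({'params': param, 'lr': 0.1 if 'bias' in name else 0.01})
opt = optim.SGD(groups, lr=0.01, momentum=0.9)
```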
2ce5875a4d Modify the sample code of extending autograd (#1720)
The original input cannot be used as input to Linear(), because forward() takes at least 3 arguments (2 given)
2017-06-05 23:36:58 -04:00
511cb20e7d Add Gesv to autograd (#1733)
* Add Gesv to autograd

* Add TODO for backprop through LU
2017-06-05 21:38:49 -04:00
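How this reads with the API of that era (torch.gesv was later replaced by torch.solve and then torch.linalg.solve); a hedged sketch:

```
import torch
from torch.autograd import Variable

A = Variable(torch.randn(3, 3), requires_grad=True)
B = Variable(torch.randn(3, 2), requires_grad=True)
X, LU = torch.gesv(B, A)   # solves A @ X = B
X.sum().backward()         # gradients now flow to both A and B
print(A.grad.shape, B.grad.shape)
```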
686470a6b8 Feature importance in dper 2.0: build network representation
Summary: Changes to enable feature importance.

Reviewed By: kennyhorror

Differential Revision: D5075252

fbshipit-source-id: e5d46e129bcd5cbef77932c63b5a288dd57775d1
2017-06-05 18:03:34 -07:00
ebecafbcca Support for position weighted in distributed PS
Summary: Title

Reviewed By: azzolini

Differential Revision: D5081871

fbshipit-source-id: 68a97c2112522fbcbcdfd9e0f717b8bce60fe028
2017-06-05 17:04:42 -07:00
5447f5c0d7 Move position weighted to separate layer
Reviewed By: kennyhorror

Differential Revision: D5063086

fbshipit-source-id: 212c08946728437bcc8b6049438ae82235137ec6
2017-06-05 15:49:22 -07:00
f1c971d04b add ExpandDims to _known_working_ops
Summary: ExpandDims is a trivial utility op which should not be triggering a warning when used by ModelHelper.

Reviewed By: akyrola

Differential Revision: D5117985

fbshipit-source-id: 5589f46f58458f5019924b48602db088563f2fee
2017-06-05 15:49:21 -07:00
5e6bd4fbfc Return predict params from ExtractPredictorNet + test
Summary:
Make it easier for users by returning from ExtractPredictorNet the list of blobs that must be saved/exported to run a predictor net. Added a test for ExtractPredictorNet

Codemod.

Reviewed By: asaadaldien

Differential Revision: D5176097

fbshipit-source-id: b1af42132459487b8d94fcdde0e4c514da608243
2017-06-05 15:34:37 -07:00
2a93470238 dont use Swap for param gradients but accumulate directly to correct grad blob
Summary:
Using Swap for accumulated gradients causes problems with distributed training, as Gloo ops expect the buffers (pointers) to remain the same. It is also quite a hack. So, after talking with salexspb, this diff changes parameter gradient handling by "transposing" it:
  - gradient ops are rewritten to write into a blob with name grad + "_tmpstep"
  - then that blob is accumulated directly to the actual gradient blob, not a temporary "_acc" blob.

Reviewed By: salexspb

Differential Revision: D5184839

fbshipit-source-id: c7ca445d4077ff90413c358bb0f7199d123a5553
2017-06-05 15:07:39 -07:00
df2f52704c Another fix for KeepOnShrink tests
Summary:
*Fix #417 again (#551 was insufficient)*

Even after a reallocation, the data address can still be the same if malloc returns the same newly freed address.

* Be very explicit and careful about how we set these flags so they don't interfere with other tests
* Disable the failing check

This somewhat takes the teeth out of this test, since it no longer verifies that the reallocation actually occurs.

Test with:
```
blob_test --gtest_filter=TensorCPUTest*Shrink* \
    --gtest_shuffle --gtest_repeat=100 --gtest_throw_on_failure
```
/cc sunwael
Closes https://github.com/caffe2/caffe2/pull/723

Differential Revision: D5174953

Pulled By: akyrola

fbshipit-source-id: 3d875a52c8139e73db85550817dea3c837eb7eae
2017-06-05 14:47:16 -07:00
e3305eb9dc Runtime dockerfile (#1732)
* reduce the size of Docker image

* add runtime dockerfile
2017-06-05 17:40:06 -04:00
e9bf702c5e LSTM bias_hh, fix docs
Rename W_hi ... to b_hi ...
2017-06-05 22:55:09 +02:00
9a2d11dd36 Use a longer timeout when establishing initial tcp connection
Summary: Machines may not create their Gloo pairs at the same time, due to earlier variable-time work. Increase the timeout used to establish the initial TCP connection to accommodate this, without sacrificing the shorter default timeout for outstanding reads/writes. No related change is required for ibverbs, as there is no communication on init.

Reviewed By: akyrola

Differential Revision: D5184518

fbshipit-source-id: 0e6c9704a2d2f1406b3927f75887f0a42199450b
2017-06-05 13:40:22 -07:00
8e99824ce7 Allow subsets of gradient outputs / inputs in Python ops
Summary:
I'm using Python ops in a project and need corresponding Python gradient ops. For my use case, only a subset of the forward op outputs have gradients and only a subset of forward op inputs have gradients. However the current implementation of `GetPythonGradient` forces all grad inputs and outputs to exist. This diff allows one to specify that only a subset of grad inputs / outputs are used when constructing the Python op.

I'm not sure if this is up to caffe2 standards, so please push back on style and content as needed.

Reviewed By: dzhulgakov

Differential Revision: D4897004

fbshipit-source-id: 96fffe8634c51a49b6bce7339a46c6235f7d4bbd
2017-06-05 12:52:01 -07:00
8871ef029b quick fix future issue with brew/core/schema/workspace/scope/utils.py
Summary:
Fixes a missing `future` package issue.

We recently found that some of our users do not have the `future` module, so we may need a try/except wrapper around every `past` import.

Reviewed By: Yangqing

Differential Revision: D5183547

fbshipit-source-id: 262fdf2940ee1be4454bf0b0abb9e6a0f1a0ee82
2017-06-05 12:01:48 -07:00
77c1027abb Create ParameterSharing abstraction for Caffe2.
Summary:
This diff introduces abstractions for parameter sharing for all the parameters that are created through the new create_param syntax.

Possible use cases of this parameter sharing:
1. Share params within RNN interface.
2. Some complicated models that might share some of the branches.
3. TODO (next diff): Cross-model parameter sharing.

Reviewed By: salexspb

Differential Revision: D5160935

fbshipit-source-id: c6d40a5ed7ead240cd7db0eb69de6dc5f505b05a
2017-06-05 11:49:54 -07:00
3716286e6b reduce the size of Docker image (#1729) 2017-06-05 14:03:11 -04:00
112561bcd4 Hide loud warning when using to third_party eigen
Summary:
This is a little excessive:
```
CMake Warning at cmake/Dependencies.cmake:201 (find_package):
  By not providing "FindEigen3.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "Eigen3", but
  CMake did not find one.

  Could not find a package configuration file provided by "Eigen3" with any
  of the following names:

    Eigen3Config.cmake
    eigen3-config.cmake

  Add the installation prefix of "Eigen3" to CMAKE_PREFIX_PATH or set
  "Eigen3_DIR" to a directory containing one of the above files.  If "Eigen3"
  provides a separate development package or SDK, be sure it has been
  installed.
Call Stack (most recent call first):
  CMakeLists.txt:72 (include)
```
Closes https://github.com/caffe2/caffe2/pull/729

Differential Revision: D5183059

Pulled By: Yangqing

fbshipit-source-id: d17d5d06a50abb50f9978d022ddc4918e991079d
2017-06-05 10:33:27 -07:00
c357ebd590 Merge commit '6422ea3d9f065683bb899b88ae0baec79e6d73ca' 2017-06-05 13:01:25 -04:00
85a95d8a23 Fix sharing of CUDA tensors on non-current devices
The correct device must be set when getting the base allocation and when
calling cudaIpcCloseMemHandle. Store the device in the allocators
context, which was previously always NULL.

Fixes #1707
2017-06-05 13:01:19 -04:00
6422ea3d9f Fix sharing of CUDA tensors on non-current devices 2017-06-05 12:58:34 -04:00
ddf6328990 Document type function returns type with no args (#1719) 2017-06-05 11:54:55 -04:00
174c3cc399 Add support for double backward of LeakyReLU (#1714) 2017-06-05 11:53:27 -04:00
24aecaa2c8 Cleanup torch vision docs (#1699)
* Modify torchvision documentation following https://github.com/pytorch/vision/pull/179

* Add new datasets to docs

* Fix wording in torch.datasets

* Small clarification
2017-06-05 11:52:41 -04:00
4e33aee349 remove stray code from CUDNN ConvTransposeGradient that caused a memory allocation
Summary:
KaimingHe noticed a curious performance problem with ConvTranspose (actually ConvTransposeGradient): it got slower when more GPUs were used! This did not make sense.

After some strenuous debugging, I noticed that tensor Y = Output(0) was being reallocated every time: this causes the slowdown because we grab a mutex for each allocation.

It turns out this Y variable is copy-paste code and was not actually intended to be part of the gradient op. This caused reallocation because the computed size of Y was larger than dfilter's (also Output(0)), but we never set the capacity of Y/dfilter to match the capacity of the larger size. Thus, Tensor.Resize() always ended up resetting the tensor --> allocation. This did not affect correctness of the code, but made it super-slow.

Before on KaimingHe's code ConvTransposeGradient took total of 3800 ms, now about 200ms.

Reviewed By: ajtulloch

Differential Revision: D5180280

fbshipit-source-id: d72f23038f0c51d82bcde7aed55089d657bda03e
2017-06-04 06:46:35 -07:00
4853cc0194 convert linalg.py to new-style functions (#1638) 2017-06-04 09:27:01 -04:00
ac1c674723 Fix a couple of selection reduce function autograd bugs (#1702)
* Fix Median/Mode autograd functions.

* Fix kthvalue autograd function.

* Double backward for selection reduce functions.
2017-06-03 02:12:15 -04:00
705a8fb1b2 minor modify video_input_op
Summary: simply allows accessing the third proto only when the temporal jittering option is off

Differential Revision: D5178943

fbshipit-source-id: 027234abee5c5c9fcf624dcbd55eb10ae8c9314f
2017-06-02 22:46:56 -07:00
e05173a476 Create ExternalInitializer to simplify logic around init_params = False
Summary:
This diff creates a new type of Initializer, ExternalInitializer, meant for cases where the parameter blob is already expected to exist in the workspace.

Reviewed By: dzhulgakov

Differential Revision: D5171322

fbshipit-source-id: d27861f0f80afdea93c235d49f63da19adccc92c
2017-06-02 18:22:50 -07:00
eba3dc8561 Fix gc_refs assertion failure (#1705)
* Fix gc_refs assertion failure

Ensure that each THPVariable -> THPFunction reference contributes one
ref count to the THPFunction by creating a new shared_ptr for each ref.

Because multiple shared_ptrs can again manage a single THPFunction, it's
not safe to use std::weak_ptr where it may point to a PyFunction. It's
still safe to use weak_ptr for grad_accumulator since these are never
PyFunctions.

Fixes #1626

* Remove stale comment
2017-06-02 21:08:50 -04:00
a8fb85797c Refactoring of the parameters step 0. Add simple tags and unify interface for params and computed_params.
Summary:
This diff is the first step in the effort to refactor all parameters. As a first step, I'm merging the concepts of params and computed_params, which will be based on tags instead (the first version still uses the old data structs to store all the BlobReferences).

Renaming computed_params to non-trainable/non-backprop params should be done in some other diff.

Reviewed By: salexspb

Differential Revision: D5171159

fbshipit-source-id: 68031ca779f053fb266a7c4a2e5b482a3bd9c832
2017-06-02 17:17:57 -07:00
3bd6195891 removed Sum from simple_operator_layers.py; passed unit tests
Summary: removed softmax, sigmoid, tanh, relu from simple_operator_layers.py; passed all unit tests

Reviewed By: kittipatv

Differential Revision: D5150271

fbshipit-source-id: abe611bf6c5de5caba189181e9e41d705d8c5c54
2017-06-02 15:03:16 -07:00
ee9d4d58e2 Fix connect bug
Before the change, processes were not waiting for the master even when they got
'connection refused' (the master is not listening yet, so we should wait).
This was because we were closing the socket twice: first by the resource
guard, then manually in the exception handler. That caused errno to be set
to a different value (9, bad file descriptor), so the `if` that checked
whether the connection was refused was failing.
2017-06-02 23:42:11 +02:00
b7c4900d19 Fix minor bug in InitMethodFile 2017-06-02 23:42:11 +02:00
e22f9036de Add tcp init method for non-multicast addresses 2017-06-02 23:42:11 +02:00
c01ff1f3dc Make world_size mandatory for Master and Worker; Minor refactor 2017-06-02 23:42:11 +02:00
eeb8e5c31b Linux fixes 2017-06-02 23:42:11 +02:00
c6c9e61169 Implement THD tensor copies 2017-06-02 23:42:11 +02:00
34804e9600 Refactor file and tcp init methods
* Add sanity checks
 * Refactor InitMethodFile and TCPInitMethod to more logical functions
 * Update few error messages
 * Add passing parameters by **kwargs, so now order of parameters is not relevant
 * Review comments
2017-06-02 23:42:11 +02:00
c41555fb0a Add rank parameter; Fix MW mode initalization 2017-06-02 23:42:11 +02:00
96cc1e1ac7 Review comments 2017-06-02 23:42:11 +02:00
cfdd49f76a Simplify and refactor init code 2017-06-02 23:42:11 +02:00
447d9287bf Refactor multicast and change env init method 2017-06-02 23:42:11 +02:00
832eaf900b Fix bugs and improve init methods 2017-06-02 23:42:11 +02:00
e685277299 Add address discovery; Bug fixes; 2017-06-02 23:42:11 +02:00
8ea7c87c29 Improve init methods 2017-06-02 23:42:11 +02:00
09c0d9c51c Add multiple initialization methods for DataChannels 2017-06-02 23:42:11 +02:00
240384605c Make copy functions thread safe (#82) 2017-06-02 23:42:11 +02:00
9f9a3d596f Use lock_guard and don't use unique_ptr 2017-06-02 23:42:11 +02:00
a8c26c1040 Add mutexes to MasterCommandChannel::sendMessage 2017-06-02 23:42:11 +02:00
6cdfe0d7b9 Remove MASTER_ADDR and _PORT from MPI benchmarking 2017-06-02 23:42:11 +02:00
1b66b50064 Benchmarks: Don't export WORLD_SIZE when using MPI
I just realized we don't need it (any longer?).
2017-06-02 23:42:11 +02:00
cf42c1a044 Improve error messages of DataChannel::newChannel 2017-06-02 23:42:11 +02:00
f717f29d7e Change function names; Change thpp::Tensor to THDTensorDescriptor 2017-06-02 23:42:11 +02:00
181d2f41bd Add initial Python wrappers for THDTensors 2017-06-02 23:42:11 +02:00
2059ece284 Exit workers gracefully in master-worker mode 2017-06-02 23:42:11 +02:00
b3e100b40e Add copy (TH <-> THD) functions to MW mode 2017-06-02 23:42:11 +02:00
401908d570 add_weight_decay + restore weight decay to resnet50_trainer
Summary:
Add add_weight_decay to optimizer + test.

In D5142973 I accidentally removed weight decay from resnet50 trainer, so this restores it.

Reviewed By: asaadaldien

Differential Revision: D5173594

fbshipit-source-id: c736d8955eddff151632ae6be11afde0883f7531
2017-06-02 14:16:56 -07:00
398379db68 fixing lint errors in image_input_op
Summary: noticed a few lint errors in image_input_op so cleaned them up

Reviewed By: akyrola

Differential Revision: D5152171

fbshipit-source-id: f84f476ddace6b4164607a01a9780a2e57e2133f
2017-06-02 14:03:01 -07:00
a2ba169354 fixed operators schema output to work from only this file for OSS
Summary: old diff had some changes to formatter.py and generator.py, but now everything is in github.py

Reviewed By: bwasti

Differential Revision: D5165061

fbshipit-source-id: 5fe5ff70ff2c5525c7aacf20854916c86d272749
2017-06-02 13:47:25 -07:00
ec2de16776 Improve README copyediting 2017-06-02 21:02:14 +02:00
ea05d6aec3 Fix compilation with cuDNN 5 (#1703) 2017-06-02 14:03:02 -04:00
5a93d6b903 Fix CUDA_HOME detection (#1675) 2017-06-02 19:26:00 +02:00
75e0df271a Add Inverse to autograd (#1670)
* Add Inverse to autograd

* Add SkipTest to autograd tests
2017-06-02 12:00:13 -04:00
565bf7116b A pile of misc doc fixes. (#1682)
* A pile of misc doc fixes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Handle @apaszke  review comments.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Initial csrc documentation.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-02 11:59:03 -04:00
f1c57ace1b added input dim checks to convxD and conv_transposedxd (#1695)
* add input dim check for conv2d

* add None check to conv2d

* added input dim checks to convxD and conv_transposedxd

* flake8 fixes
2017-06-02 11:58:19 -04:00
460b8715a8 display version number in docs 2017-06-02 11:56:48 -04:00
6da111c53d Merge commit '00843c57c936720b3d17f4c0afaab08dcb52a7cc' 2017-06-02 11:52:19 -04:00
568c5c91ee substitute cudnnFind* functions with cudnnFind*Ex 2017-06-02 11:52:12 -04:00
00843c57c9 substitute cudnnFind* functions with cudnnFind*Ex 2017-06-02 11:50:50 -04:00
501467db17 added param name to tuple_parser for better error messages 2017-06-02 16:16:21 +02:00
4bed0c6d41 Update RNN Seq2SeqModelCaffe2EnsembleDecoder to reflect training network structure
Summary: Use new blob as residual sum output, and add scoping to prevent any name conflicts.

Reviewed By: urikz

Differential Revision: D5167145

fbshipit-source-id: a01c87ed2278205e95e8395314b166afb1dca1b3
2017-06-01 23:32:35 -07:00
55ada6d64e Fix padding params check for conv-cudnn.
Reviewed By: dutran

Differential Revision: D5169744

fbshipit-source-id: 3d1c50328eefed01fb9d4daa84478c45cd0aa5fd
2017-06-01 22:38:06 -07:00
b3e179ea31 fixing lmdb.cc when compiled on Windows (mkdir -> _mkdir)
Summary:
Should fix #462.
Closes https://github.com/caffe2/caffe2/pull/539

Reviewed By: asaadaldien, dzhulgakov

Differential Revision: D5162615

Pulled By: Yangqing

fbshipit-source-id: 985d3694e389bcf1fd96990254a53d806baba0cb
2017-06-01 21:48:25 -07:00
2c97c98ca7 Enable testing the GPU implementations of Adagrad and Adam
Summary:
Enable testing the GPU implementations of Adagrad and Adam, including the sparse versions.
Closes https://github.com/caffe2/caffe2/pull/607

Reviewed By: dzhulgakov

Differential Revision: D5121552

Pulled By: Yangqing

fbshipit-source-id: da6b7dde456237c94cf74d00860e7327b2267eab
2017-06-01 18:10:57 -07:00
fc4d118e6b Caffe2 MemNN Production Model Saving
Summary:
Split the Caffe2 memory-based model into two parts:
- Dimension reduction MLP
- DNN with concatenation of memory and obj feature

Currently only a simple mean is implemented.

Differential Revision: D4866825

fbshipit-source-id: d2f6813402513ec9af30dbe29a50593e2d3cdb3b
2017-06-01 14:31:53 -07:00
299f293cb2 Add initializer classes to conv_nd.
Summary: Fix parameters passed to _ConvBase

Reviewed By: sunwael

Differential Revision: D5166836

fbshipit-source-id: 6c2a9fa73cf1199a5f861900554f3075a49104fc
2017-06-01 14:17:55 -07:00
05e060974f added events and user group info
Summary:
also contains previous edits on statuses which should be in here....
Closes https://github.com/caffe2/caffe2/pull/657

Differential Revision: D5158733

Pulled By: aaronmarkham

fbshipit-source-id: faba2ab8e2dab206e09f57021b973b3e7d01af95
2017-06-01 09:35:26 -07:00
58874ad5bf Fp16 training initializers
Summary:
Re-open for re-importing :)
Closes https://github.com/caffe2/caffe2/pull/721

Differential Revision: D5164345

Pulled By: akyrola

fbshipit-source-id: e80b32556cd25610602df91a4225b93edc0ca40b
2017-06-01 08:34:46 -07:00
d51cd61e2e add checks for input, weight and bias types when using cudnn conv2d (#1689) 2017-06-01 10:06:30 -04:00
447fe953e5 Modify the sample code of volatile (#1694)
The original two inputs (torch.randn(5,5)) cannot be used as input to resnet, which must be (batch, channels, width, height)
2017-06-01 09:46:04 -04:00
ffbba0fae7 add model_helper Validate() + sprinkler around
Summary:
A recent diff introduced a duplicate parameter to the model, which would hurt performance and also affect correctness (duplicate momentum updates, for example). We unfortunately had no checks for duplicate params outside of data_parallel_model, which fortunately brought this to our attention.

But it is better to have a Validate() function in model_helper, and call that before adding gradient ops and querying for parameters. Added to brew_test calls as well.

Reviewed By: kennyhorror

Differential Revision: D5163458

fbshipit-source-id: 35692e8bfcc359d4e8bc73e6f2358659f6e45ceb
2017-06-01 02:36:47 -07:00
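A minimal sketch of the duplicate-parameter check idea described above (the helper name is illustrative, not the actual model_helper.Validate() code):

```python
def validate_no_duplicate_params(params):
    # `params` holds blob names or BlobReferences collected from the model;
    # a duplicate would cause double gradient/momentum updates as described.
    names = [str(p) for p in params]
    dupes = sorted({n for n in names if names.count(n) > 1})
    assert not dupes, "Duplicate params in model: {}".format(dupes)

validate_no_duplicate_params(["fc_w", "fc_b"])    # passes
# validate_no_duplicate_params(["fc_w", "fc_w"])  # would raise
```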
0f8c8f37a8 Revert D5159712: [caffe2][PR] Fp16 training initializers
Summary: This reverts commit 60a889494d2e2f4df1d720331e19f638c5eb95cc

Differential Revision: D5159712

fbshipit-source-id: 16040c911b260648857f656f92b165f92c2daae0
2017-06-01 00:17:14 -07:00
076376f4f6 Revert D5119830: [C2] Refactoring of the parameters step 0. Add simple tags and unify interface for params and computed_params
Summary: This reverts commit 2001090a37346eb12abbb234e13e727c288eb8a7

Differential Revision: D5119830

fbshipit-source-id: bf321868338f0db85dff3237af7eaf74212dbdf6
2017-06-01 00:02:21 -07:00
ff61ed358e Refactoring of the parameters step 0. Add simple tags and unify interface for params and computed_params
Summary:
This diff is the first step in the effort to refactor all parameters. As a
first step I'm merging the concepts of params and computed_params, which will
be based on tags instead (in the first version it still uses the old data
structs to store all the BlobReferences).

Renaming computed_params to non-trainable/non-backprop params should be done in
some other diff.

Reviewed By: salexspb

Differential Revision: D5119830

fbshipit-source-id: 2001090a37346eb12abbb234e13e727c288eb8a7
2017-05-31 22:36:36 -07:00
7c3add4408 better android ndk path
Summary:
Use a user-defined Android NDK path instead of a hard-coded one.
Closes https://github.com/caffe2/caffe2/pull/506

Differential Revision: D5162646

Pulled By: Yangqing

fbshipit-source-id: 5093888e15607b3bf6682e05eb91aa94c6206b01
2017-05-31 20:35:23 -07:00
d8d1cd1064 Test smaller tensors in segment_ops_test
Summary:
It's causing problems inside docker containers:

`InvalidArgument: Insufficient bytes of entropy to draw requested array.  shape=(5, 9, 10, 5), dtype=float32.  Can you reduce the size or dimensions of the array?  What about using a smaller dtype? If slow test runs and minimisation are acceptable, you  could increase settings().buffer_size from 8192 to at least 18432000.`
Closes https://github.com/caffe2/caffe2/pull/707

Differential Revision: D5162621

Pulled By: Yangqing

fbshipit-source-id: 55544210961cbc80828dca2cbeba6a5ace8cf8d1
2017-05-31 20:17:31 -07:00
e2cf007dc8 Avoid numpy VisibleDeprecationWarning in test
Summary:
This warning becomes an error with https://github.com/numpy/numpy/pull/6271 (`>=1.12.0`).

```
caffe2/python/operator_test/tile_op_test.py::TestTile::test_tilewinput
  /opt/caffe2/caffe2/python/operator_test/tile_op_test.py:100: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
    dims[axis] = tiles
  /usr/lib/python2.7/dist-packages/numpy/lib/shape_base.py:873: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
    return c.reshape(shape_out)
```
Closes https://github.com/caffe2/caffe2/pull/710

Differential Revision: D5160776

Pulled By: Yangqing

fbshipit-source-id: b264e0e389de5817a289db878c15e655f9fa2f09
2017-05-31 20:01:30 -07:00
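For context, the fix pattern for this class of warning is an explicit scalar conversion; a minimal sketch mirroring the traceback above (variable names taken from it):

```python
import numpy as np

tiles = np.array([3])             # hypothesis may draw a 1-element array
dims = np.ones(3, dtype=np.int64)
axis = 1

# `dims[axis] = tiles` coerces an ndim > 0 array to a scalar and warns on
# numpy >= 1.12; converting explicitly keeps the behavior well defined.
dims[axis] = int(tiles)
```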
7b5af7d1b7 Expand ibverbs read timeout messages
Summary: TSIA

Reviewed By: romain-intel

Differential Revision: D5158642

fbshipit-source-id: 6e55a69a140c1f5f6e4ce6262afaf5014c412414
2017-05-31 19:50:21 -07:00
4da9e92d3f MPIConstantFill -> ConstantFill
Summary:
Continuation of https://github.com/caffe2/caffe2/pull/709

Close https://github.com/caffe2/caffe2/issues/706

/cc Yangqing
Closes https://github.com/caffe2/caffe2/pull/711

Differential Revision: D5162486

Pulled By: Yangqing

fbshipit-source-id: 3ff069aa27eecf73c3dc51eacf86a6974f027625
2017-05-31 19:47:49 -07:00
2bfacff426 Fp16 training initializers
Summary:
Adds support for generating and training pfp16 models. Added SGD optimizer for multi-precision trainers and a new callback to data_parallel_model in order to help multi-precision models keep their different copies of parameters in sync during training.
Closes https://github.com/caffe2/caffe2/pull/697

Differential Revision: D5159712

Pulled By: salexspb

fbshipit-source-id: 60a889494d2e2f4df1d720331e19f638c5eb95cc
2017-05-31 17:46:58 -07:00
1740f90821 disable appveyor for cuda for now due to out of time error
Summary: Closes https://github.com/caffe2/caffe2/pull/708

Differential Revision: D5158662

Pulled By: Yangqing

fbshipit-source-id: cdff7e79c6d91c867a9b339525bc67e222b3b28d
2017-05-31 16:48:09 -07:00
680a00e99a MPIConstantFill -> ConstantFill
Summary:
(this is due to an earlier blind vim find-replace error)
Closes https://github.com/caffe2/caffe2/pull/709

Differential Revision: D5159055

Pulled By: Yangqing

fbshipit-source-id: f188b7bebf79a45825568ba96a71b535fe4e3aad
2017-05-31 16:36:49 -07:00
f0f4c2fc5d Increase the number of DAG execution worker threads.
Reviewed By: akyrola

Differential Revision: D5158414

fbshipit-source-id: add377aec5588076db881a2a3750101710f29732
2017-05-31 15:19:19 -07:00
73a8a49c7e synchronize re-rendezvousing on node changes + support num_shards=1 rendezvous
Summary:
Currently we can get into broken situations when some nodes complete detectChanges() faster than others, so only some of the nodes start the next iteration of training. This is an inconsistent state. To prevent this from happening, each node now sets a "re-rendezvous flag" that is allreduced after each iteration. Once all nodes agree, re-rendezvous will be done.

Also noticed that num_shards=1 did not work because data_parallel_model assumed num_shards>1 when rendezvous is not None. Fixed that.

Reviewed By: andrewwdye

Differential Revision: D5156282

fbshipit-source-id: f2ccbd8ad13ed37f7813ff8ad1080d963d0d17e3
2017-05-31 15:19:13 -07:00
72ea177188 Add target for quick build+test
Summary:
Once the build is cached, QUICKTEST takes less than 3 minutes to install+build+test (first build is ~13 minutes).

Future TravisCI improvements:
* Refactor other build targets so they're fast enough to build in under 45 mins
* Run tests for other build targets
* Run Python tests
Closes https://github.com/caffe2/caffe2/pull/550

Differential Revision: D5157407

Pulled By: Yangqing

fbshipit-source-id: b2b2d9c2c85423cc78f314951da54b64c247c0af
2017-05-31 13:51:53 -07:00
f0795c15a4 Disable stacktrace on fatal signal by default
Summary:
This PR adds a cli flag '--caffe2_print_stacktraces' that takes a bool; when set, stack traces will be printed when a fatal signal occurs. As a side effect a few new APIs are introduced, `caffe2::setPrintStackTracesOnFatalSignal` and `caffe2::printStackTracesOnFatalSignal` - however these are mostly exposed for testing infrastructure purposes.

Also it appears at some point fatal signal handlers were strictly disabled for android - this PR re-enables them.
Closes https://github.com/caffe2/caffe2/pull/698

Reviewed By: Yangqing

Differential Revision: D5150001

Pulled By: danzimm

fbshipit-source-id: abb4aada4ddae8bcfbf1a85f3d101ed63692f221
2017-05-31 12:54:04 -07:00
afc26ac675 Added time-out to ibverbs transport
Summary: Extended the time-out option from just working on TCP to also working with ibverbs

Reviewed By: pietern

Differential Revision: D5090258

fbshipit-source-id: fee685850d761d0c2130852f513c64ceb19f4e9e
2017-05-31 11:20:40 -07:00
f2d9d97008 Add an option to reset momentum-sgd params every time between successive block updates.
Reviewed By: akyrola

Differential Revision: D5149263

fbshipit-source-id: c0a3637a1b48f74ec55c9d13c8fab3456dab809c
2017-05-31 00:32:11 -07:00
ccdf2d99e1 Add description to assert in model_helper
Summary: Add information about the offending param when the assertion fires.

Reviewed By: kennyhorror

Differential Revision: D5153625

fbshipit-source-id: 9f5a02bf64ccbdef9d93d346f79e589dfe3ec5be
2017-05-31 00:02:18 -07:00
c344880373 add automatic timing of parameter update phase
Summary:
Add timing of the phase between the last gradient op and the final sync. This gives an approximate measure of the latency of the distributed computation and helps detect stragglers. It is not intended as a real measure, just for relative comparison.

This could be improved by making nodes share their timings and make decisions based on them. But as a first step, we can just look at the numbers ourselves.

Reviewed By: andrewwdye

Differential Revision: D5149273

fbshipit-source-id: c4c346291c0feb6e9c6ceced64e7be667d17dcad
2017-05-30 20:47:18 -07:00
ce7ce46ca1 fix secondary device check by gradient, if it is sparse
Summary: Fix an issue where the parameter is not created in param_init_net, or net, and then we secondarily look at which device op outputs the gradient. This did not work if the gradient was a GradientSlice.

Reviewed By: harouwu

Differential Revision: D5153102

fbshipit-source-id: 20eae660ea32e5a9ea484bf93c04c8f8c71a51ed
2017-05-30 20:47:17 -07:00
96d8ae2163 Make fills work with input_shape when run in CUDAContext
Summary: If ConstantFill (or another fill op) is used in CUDAContext with input_as_shape, the code crashes: it expects the shape to be in CUDAContext but accesses the array in host code... We could fix this by copying the values from the CUDA tensor, but it is probably best to enforce that the shape param is in CPU context. This is what this diff does.

Differential Revision: D5152766

fbshipit-source-id: 0629a189bd1d800c0b7c9dbc324b78d279efac0b
2017-05-30 20:47:16 -07:00
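A minimal sketch of the enforced contract, assuming a standard Caffe2 Python setup (the blob names are illustrative):

```python
from caffe2.python import core, workspace
import numpy as np

# With input_as_shape=1 the shape blob must be a CPU tensor, even when the
# fill op itself runs in CUDAContext.
workspace.FeedBlob("shape", np.array([4, 2], dtype=np.int64))
op = core.CreateOperator("ConstantFill", ["shape"], ["out"],
                         input_as_shape=1, value=1.0)
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("out").shape)   # (4, 2)
```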
846240a340 Caffe2 gradient generator bug fix
Summary:
A bug repro is in a test. Generally speaking, accumulation was
not happening if len(ys) >= 2 (the list of blobs we compute gradients
from) and some blob in the net was both in the ys list and also got
a gradient propagated from another element in ys.

Reviewed By: akyrola

Differential Revision: D5121695

fbshipit-source-id: 282d88f2f4f6e27dadae311964f40246a2739130
2017-05-30 18:47:08 -07:00
6f791e74f1 Add a minimum iteration count of 1 for benchmarks
Summary:
For some long running benchmarks, the iteration count could be 0
which would lead to a segfault when printing results

Reviewed By: pietern

Differential Revision: D5149034

fbshipit-source-id: 7b56e8961c302d1ff11ffcd74ca8e909ea046231
2017-05-30 18:12:39 -07:00
aa59b217a9 Relax requirement on the outputs of the predictor.
Summary: It looks like this is a bit too restrictive a requirement. Let's remove it.

Reviewed By: volkhin

Differential Revision: D5150968

fbshipit-source-id: 9e38574edc6542c5ce3c7f25a01afe8f5ff9b507
2017-05-30 17:23:18 -07:00
1aa6300696 Option to use NCCL for broadcast
Summary:
Fixes some performance issues when `broadcast_computed_params=True` is passed to Parallelize_GPU. Enabled via the same `use_nccl` flag as AllReduce
Closes https://github.com/caffe2/caffe2/pull/630

Differential Revision: D5149828

Pulled By: akyrola

fbshipit-source-id: 12c9714c7fa078811f1cde61c8523dca8f7f968f
2017-05-30 16:46:38 -07:00
47e921ba49 Remove map() and filter() in favor of comprehensions
Summary: These return lazy iterators in Python 3, which would not do anything in a lot of the usages currently present in Caffe2. This diff simply removes (almost) all usages of these two in Caffe2 and subprojects in favor of comprehensions, which are also easier to read/understand.

Reviewed By: akyrola

Differential Revision: D5142049

fbshipit-source-id: e800631d2df7d0823fed698cae46c486038007dc
2017-05-30 15:32:58 -07:00
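A small illustration of the Python 2/3 difference driving this change:

```python
nums = [1, 2, 3, 4]

# Python 2: map()/filter() return lists. Python 3: they return lazy
# iterators, so code that indexed the result or called them purely for
# side effects silently does nothing.
doubled = map(lambda x: 2 * x, nums)      # a <map object> on Python 3

# Comprehensions are eager on both versions and easier to read:
doubled = [2 * x for x in nums]
evens = [x for x in nums if x % 2 == 0]
```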
3106423713 Synchronize with H2D copyAsync before signalling the broadcast sender
Summary: Closes https://github.com/facebookincubator/gloo/pull/41

Differential Revision: D5149996

Pulled By: pietern

fbshipit-source-id: 15d61fab9babfeb1e4178b84ecf5f6e32ad3bfb3
2017-05-30 14:20:29 -07:00
0deec5b3b7 Add FLOP annotation functions to operator schema
Summary: Basic FLOP annotation functionality added to operator schema.

Reviewed By: dzhulgakov

Differential Revision: D5114086

fbshipit-source-id: 8a15d45dee744fbdceaed3773d70fb69a5cf0d24
2017-05-30 14:17:32 -07:00
acb2ad12e5 fix race condition at terminate
Summary:
Looking at one segfault at exit (https://our.intern.facebook.com/intern/chronos/jobinstance/?jobinstanceid=911625597&smc=chronos_gp_admin_client&log_type=stderr&offset=0&pretty_logs=false) and its coredump, the only thing I can see is that a FreeBlob() operator is called concurrently while a cudaMemcpyAsync (on thread 1) is crashing. FreeBlobOp is only called at data_workers _stop() (via utils.ResetBlobs()), and the only code that could run a cudaMemcpyAsync at that time is the fetcher thread of data_workers that is enqueuing blobs.

Here are the stacks: P57455299

This is clearly a bug since we should only clear the scratch blobs after all threads are terminated, which happens at wait_for_finish().

I am not 100% sure this fixes all the segfaults, but at least this one was most likely caused by this.

Reviewed By: andrewwdye

Differential Revision: D5146278

fbshipit-source-id: ae00796706bfc4fee6823caf6529b62ab20c1cd3
2017-05-30 13:47:10 -07:00
f6853d13df always use halving-doubling allreduce algorithm
Summary: Ring-chunked performance on 8 nodes was substantially worse than halving-doubling in some cases. We can just use halving-doubling in all cases.

Reviewed By: prigoyal

Differential Revision: D5148755

fbshipit-source-id: 1332065615be6b9faf873effac87056011e0e804
2017-05-30 13:16:46 -07:00
cdb50fbf2b add optimizer support to data_parallel_model; Use MomentumSGDUpdate
Summary:
This diff does two things:
- adds support for optimizers to data_parallel_model. The user can supply optimizer_builder_fun instead of param_update_builder_fun. The latter is called for each GPU separately with the proper namescope and devicescope, while the optimizer builder is called only once and adds optimizers to the whole model.

- uses MomentumSGDUpdate instead of MomentumSGD + WeightedSum. This brings major perf benefits.

Changes resnet50 trainer to use optimizer.

This relies on D5133652

Reviewed By: dzhulgakov

Differential Revision: D5142973

fbshipit-source-id: 98e1114f5fae6c657314b3296841ae2dad0dc0e2
2017-05-30 12:49:57 -07:00
0a9684c3b9 Mark in-place GPU dropout as broken, add test
Summary:
I'll let y'all decide how you want to fix this (probably need a persistent curand buffer). Here's a test to verify the fix.
Closes https://github.com/caffe2/caffe2/pull/495

Differential Revision: D5148815

Pulled By: akyrola

fbshipit-source-id: e80dabe65230ddd32340f2d872cd8786ac960bf8
2017-05-30 12:35:22 -07:00
44257ea5ed automatically infer device scope for param
Summary:
hankun is using the optimizer, but has a mixed set of GPU and CPU operators. Currently this won't work with the optimizer since it adds optimizers for all parameters in the current device scope. But we can actually infer the device that a param belongs to by looking at the device option in the param_init_net.

Added a test as well.

Reviewed By: salexspb

Differential Revision: D5133652

fbshipit-source-id: ad8689d75ac1f5c78981bae1b6978fe91e40ef0f
2017-05-30 12:02:19 -07:00
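A minimal sketch of the inference described above (the helper name is illustrative; the real logic lives in the optimizer code):

```python
def infer_param_device(param, param_init_net):
    # Reuse the device_option of the op in param_init_net that produced
    # `param`; return None if no producer is found.
    for op in param_init_net.Proto().op:
        if str(param) in op.output:
            return op.device_option if op.HasField("device_option") else None
    return None
```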
6b1cf26380 Fix for dpm when GPUs don't have p2p access
Summary:
See discussion at https://github.com/caffe2/caffe2/pull/633#issuecomment-303536902

Tested with a TitanX (Pascal) and a TitanZ (Kepler) with this access pattern.
```
Checking GPU(s) for support of peer to peer memory access...
> Peer access from TITAN X (Pascal) (GPU0) -> GeForce GTX TITAN Z (GPU1) : No
> Peer access from TITAN X (Pascal) (GPU0) -> GeForce GTX TITAN Z (GPU2) : No
> Peer access from GeForce GTX TITAN Z (GPU1) -> TITAN X (Pascal) (GPU0) : No
> Peer access from GeForce GTX TITAN Z (GPU1) -> GeForce GTX TITAN Z (GPU2) : Yes
> Peer access from GeForce GTX TITAN Z (GPU2) -> TITAN X (Pascal) (GPU0) : No
> Peer access from GeForce GTX TITAN Z (GPU2) -> GeForce GTX TITAN Z (GPU1) : Yes
```
All combinations pass:
* `0,1`
* `0,2`
* `1,2`
* `0,1,2`
Closes https://github.com/caffe2/caffe2/pull/659

Differential Revision: D5148779

Pulled By: akyrola

fbshipit-source-id: 6263edfe8b36623983f1946b5c3f4a3fef415a45
2017-05-30 12:02:19 -07:00
a47652379f Fix SparseAdagrad for indices.ndim>1
Summary:
Same fix as https://github.com/caffe2/caffe2/pull/249, but for SparseAdagrad.

Also update the tests for both ops to test this functionality.
Closes https://github.com/caffe2/caffe2/pull/675

Differential Revision: D5148750

Pulled By: akyrola

fbshipit-source-id: d30b722429bc547fd53400c1a29e4ee9e2e6ed18
2017-05-30 12:02:18 -07:00
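Numerically, supporting indices with ndim > 1 just means flattening the indices and treating the gradient as one row per index; a numpy sketch of that semantics (not the op's actual kernel):

```python
import numpy as np

def sparse_adagrad(param, moment, indices, grad, lr, eps=1e-6):
    idx = indices.reshape(-1)           # accept indices of any shape
    g = grad.reshape(idx.size, -1)      # one gradient row per index
    for k, i in enumerate(idx):         # duplicate ids accumulate in order
        moment[i] += g[k] * g[k]
        param[i] -= lr * g[k] / (np.sqrt(moment[i]) + eps)

param, moment = np.zeros((5, 2)), np.zeros((5, 2))
sparse_adagrad(param, moment, np.array([[0, 1], [1, 4]]),
               np.ones((2, 2, 2)), lr=0.1)
```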
df2bd158db Optional force conv algorithms
Summary:
Allow user to force cuDNN convolution algorithms from python - useful if you're using a standard network and don't want to pay the cost of exhaustive search.

Defined as an array in the order of [fwd, wgrad, dgrad].

Also refactors cudnn_conv_op slightly to further split the wgrad and dgrad code paths.
Closes https://github.com/caffe2/caffe2/pull/570

Reviewed By: akyrola

Differential Revision: D5125731

Pulled By: asaadaldien

fbshipit-source-id: cc5c64d3ccd2546f8e744d818f587bbbd24f055b
2017-05-30 10:46:41 -07:00
16b240145a Fixing some tests
Summary:
As dzhulgakov said at https://github.com/caffe2/caffe2/pull/227#issuecomment-295084443, it would be nice to avoid this stream of CPU-only test fixes.

The second fix could have been avoided if tests were run on TravisCI. I think the TravisCI infra could be greatly improved if we used ccache like your colleagues at PyTorch: https://github.com/pytorch/pytorch/pull/614. Would you be interested in a PR which does this?
Closes https://github.com/caffe2/caffe2/pull/547

Differential Revision: D5147405

Pulled By: akyrola

fbshipit-source-id: 5e9a4571d364c5f0ed8a5e216c9b6136dd4d10be
2017-05-30 09:16:48 -07:00
dc517b6c42 Change hypothesis settings for slow memonger test
Summary:
Failure mode:
```
  - 7 passing examples, 0 failing examples, 0 invalid examples
  - Typical runtimes: 12-14987 ms
  - Stopped because settings.timeout=60
```
After this change:
```
  - 5 passing examples, 0 failing examples, 0 invalid examples
  - Typical runtimes: 12-15475 ms
  - Stopped because settings.max_examples=5
```
Obviously, the `DYNAMIC_PROGRAMMING` tests are the troublemakers. An alternate solution would be to make separate tests for the two assignment algorithms (one fast, one slow).
Closes https://github.com/caffe2/caffe2/pull/676

Differential Revision: D5147363

Pulled By: akyrola

fbshipit-source-id: 85d9f8198e53c10de2a8d6645e2b0eb7953c96e0
2017-05-30 09:16:48 -07:00
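The change boils down to capping the example count instead of relying on the timeout; a hypothesis sketch:

```python
from hypothesis import given, settings, strategies as st

# With max_examples=5 the run stops on the example budget rather than on
# settings.timeout, which the slow DYNAMIC_PROGRAMMING cases were hitting.
@settings(max_examples=5)
@given(st.integers(min_value=2, max_value=64))
def test_assignment_algorithm(n):
    assert n >= 2
```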
2c3071fc4e Rework initializers to pass a class not object
Summary:
Changed tests
Moved to WeightInitializer, BiasInitializer keywords
Closes https://github.com/caffe2/caffe2/pull/682

Reviewed By: Yangqing

Differential Revision: D5138769

Pulled By: salexspb

fbshipit-source-id: 81d266100b2a95c64c0196c16670dfd34ea03e02
2017-05-30 09:06:56 -07:00
4eb448a051 Fix simple typo
The dimension was a bit wrong.
2017-05-28 18:53:04 +02:00
660dd58022 fix for realtime training.
Reviewed By: kennyhorror

Differential Revision: D5068298

fbshipit-source-id: 0dc3580c9c8123368a3625fb654c6eaf1dc4a950
2017-05-26 23:49:40 -07:00
6aff754dbc Add batch normalization layer
Summary: As desc.

Reviewed By: xianjiec

Differential Revision: D5077230

fbshipit-source-id: f73cdedac6d9a3542f8ef829b54fb4c713dcafd0
2017-05-26 16:46:52 -07:00
ec19b4bd7b Import fixes for Python 3
Summary: As title

Differential Revision: D5135990

fbshipit-source-id: 88cb15bb2fb97dd21faf3ea5ddb8d4dbff7fad93
2017-05-26 16:31:50 -07:00
3ccbf23132 String-related fixes for Python 3
Summary: This diff is one step towards enabling the Python 3 build by making it more diligent in its handling of strings.

Reviewed By: salexspb

Differential Revision: D4893083

fbshipit-source-id: 28b8adf3280e8d1f0a7dc9b0fee5ad53f2fada57
2017-05-26 16:04:32 -07:00
7f98dc28cb Refactored spatial softmax
Summary: Refactored SoftmaxWithLoss by removing the code for spatial=1 mode and created a new op SpatialSoftmaxWithLoss that has the spatial mode implemented.

Reviewed By: viswanathgs

Differential Revision: D5104120

fbshipit-source-id: 8ab999e32c916b2a39a670a7b2a3365401535f24
2017-05-26 14:50:43 -07:00
78c1415012 Use unwind functions instead of backtrace to attempt to be more portable
Summary:
This should now build on all Linux systems - even Android - since unwind.h appears to be a gcc extension that clang supports as well. I'm not sure how to look at which platforms support which libc extensions, so I'm unsure how to proactively ensure this PR will work on all platforms.
Closes https://github.com/caffe2/caffe2/pull/656

Reviewed By: pietern

Differential Revision: D5134097

Pulled By: danzimm

fbshipit-source-id: 093a49239c6d9d43ca64c52e8aaab569970b2cf9
2017-05-26 13:46:35 -07:00
b266c52b51 Create signal failure blobs in constructor, avoid race condition
Summary: andrewwdye caught a sigsegv that happened in the Gloo failure-signaling function. It turns out workspace->CreateBlob() is not thread safe, and since we are running multiple threads it is likely that many gloo ops fail at once and thus we get a race. These blobs should actually be created in the op's constructor, so that's what this diff does.

Reviewed By: andrewwdye

Differential Revision: D5139269

fbshipit-source-id: 7eaab3084e4e39543632c628c5e0710225e73b65
2017-05-26 13:01:43 -07:00
065c59860a Fix docs: masked_fill_ takes a value, not a tensor. (#1663) 2017-05-26 14:41:03 -04:00
75a6f909c5 Add option to enable memonger for gradients and add param_names for save_model.
Reviewed By: akyrola

Differential Revision: D5131493

fbshipit-source-id: 7c159ccffa30eb064c157e559f1d8f0350f03ccb
2017-05-26 11:31:35 -07:00
45f665d05c Fix decodeUInt64BE
Fixes #1658
2017-05-26 11:21:31 -07:00
35eaf444c0 Quickly hack sparsenn_benchmarks to also do BenchmarkNet
Summary:
Makes benchmark a bit hacky, but it's a benchmark after all :)

Specifically ports functionality of proper BenchmarkNet run from the ads_benchmarks so that we can see training net perf.

Also adds --report_interval parameter to print stats more often when running in hogwild mode

kdub0 - hopefully if you have time you can integrate it properly with the Flow's workflow

harouwu - shouldn't conflict too much with your current diff

Reviewed By: rayleichen

Differential Revision: D5125183

fbshipit-source-id: 9c6f1663bc85e26d6609f0f2f23aa280731939db
2017-05-26 10:48:45 -07:00
d60a2e3c58 UnsortedSegmentSum/Mean for CUDA
Summary:
To make the optimizer for sparse gradients work with CUDA, we need UnsortedSegmentSum and Mean implemented for CUDA. Unique was already implemented by harouwu.

Pretty straightforward implementations, should be fast enough -- and I don't know a faster way anyway.

Added some tests as well.

Reviewed By: asaadaldien

Differential Revision: D5124548

fbshipit-source-id: 63ae72f45fc2f07470603f7b2de12f34635dbb3d
2017-05-26 09:33:49 -07:00
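The reference semantics are easy to state in numpy (a sketch of what the CUDA kernel computes, not the kernel itself):

```python
import numpy as np

def unsorted_segment_sum(data, segment_ids, num_segments):
    # Rows of `data` are summed into out[segment_ids[i]]; ids need not be
    # sorted, and np.add.at handles repeated ids correctly.
    out = np.zeros((num_segments,) + data.shape[1:], dtype=data.dtype)
    np.add.at(out, segment_ids, data)
    return out

data = np.arange(8.0).reshape(4, 2)
print(unsorted_segment_sum(data, np.array([1, 0, 1, 0]), 2))
# [[ 8. 10.]
#  [ 4.  6.]]
```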
97159810c9 Restore compatibility with protobuf2
Summary:
Addresses an issue with 417f74509e.
```
>               operators.append(proto.op.pop())
E               AttributeError: 'RepeatedCompositeFieldContainer' object has no attribute 'pop'
```
/cc jhcross
Closes https://github.com/caffe2/caffe2/pull/658

Reviewed By: dzhulgakov

Differential Revision: D5130382

Pulled By: salexspb

fbshipit-source-id: 34e0c39aad5f339c1aaa1506af3e7495193565f4
2017-05-26 08:47:24 -07:00
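A protobuf2-compatible replacement for the failing call can be sketched as follows (assuming `proto` is a NetDef-like message; the helper name is illustrative):

```python
def pop_last_op(proto):
    # Repeated fields support __getitem__/__delitem__ in all python-protobuf
    # releases, while .pop() only exists in newer ones.
    last = type(proto.op[-1])()
    last.CopyFrom(proto.op[-1])   # copy out before deleting the element
    del proto.op[-1]
    return last
```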
016f72537a ModelHelper.create_param, Initializer abstraction and ParameterInfo for optimizers
Summary:
This is going to unblock Nvidia in their work on adding fp16
support to Caffe2. I discussed this with kennyhorror before to make
sure this fits into his work on parameter sharing.

Reviewed By: kennyhorror

Differential Revision: D5127797

fbshipit-source-id: 4db155d320b1862570c23b77c4252bdacbf2296f
2017-05-25 22:03:15 -07:00
6c12df3003 Fix export of SparseToDense layer.
Summary:
If there are 2 SparseToDense layers densifying the same IdList feature,
we might export invalid input for the prediction in the input specs. This
diff changes the behavior to use an Alias to a new blob instead of passing
things through directly.

Reviewed By: dzhulgakov

Differential Revision: D5093754

fbshipit-source-id: ef4fa4ac3722331d6e72716bd0c6363b3a629cf7
2017-05-25 21:46:28 -07:00
9bf1f16255 Add bias to cosine distance for two tower models
Summary: Currently, using two-tower models with cosine distance results in bad calibration. Adding a bias to the output of the cosine term solves the problem.

Reviewed By: xianjiec

Differential Revision: D5132606

fbshipit-source-id: eb4fa75acf908db89954eeee67627b4a00572f61
2017-05-25 19:50:20 -07:00
2002018603 memory_leak_data_worker
Summary: A memory leak happens when new BlobReferences are constantly added to the set _scratch_blobs.

Reviewed By: panshen1

Differential Revision: D5134945

fbshipit-source-id: 3ce4d482153bb89de065f20cd91411178085caad
2017-05-25 19:22:03 -07:00
64faf120ac Adding support for ADD_TORCH_LIBRARY macro 2017-05-25 15:41:52 -07:00
0b74f0d796 lua 5.3 changes and gcc constants 2017-05-25 15:41:52 -07:00
c6591fa59b Add asan no sig tests, move fatal sig tests there
Summary: Changed test file name to signify that if testing with ASAN you should disable ASAN signal handling.

Reviewed By: pietern

Differential Revision: D5122977

fbshipit-source-id: f73de44df943516f3353cf408697869c43c45032
2017-05-25 15:02:36 -07:00
8074180081 Faulty error message for InstanceNorm1d (#1609) 2017-05-25 17:13:01 -04:00
5ce4a4adbf Merge commit '3f1f3f97343d2ab7eb522cac7330f6b7478bd4da' 2017-05-25 16:51:57 -04:00
3e9caed731 Merge commit 'bd705d38ce11a0ca1547f709f29f80a02b3dd894' 2017-05-25 16:51:09 -04:00
7b578dd68e Add scatterAdd 2017-05-25 16:49:48 -04:00
3f1f3f9734 Add scatterAdd 2017-05-25 16:49:32 -04:00
bd705d38ce Add scatterAdd 2017-05-25 16:49:22 -04:00
3ff54ffa8f Fix KeepOnShrink tests
Summary:
Fix https://github.com/caffe2/caffe2/issues/417
Closes https://github.com/caffe2/caffe2/pull/551

Reviewed By: sunwael

Differential Revision: D5130832

Pulled By: bwasti

fbshipit-source-id: 8620befdc0bca8630b346be3c928e657ce653d75
2017-05-25 13:48:07 -07:00
a9b5efe3c2 Expose max collective concurrency
Summary:
This was hardcoded at 4 before but should be made
configurable. Can be kept low for big MLPs and higher for convnets.

Reviewed By: akyrola

Differential Revision: D5126138

fbshipit-source-id: 713ee8bbeb243b7de1479808fd6398d397e0b49a
2017-05-25 13:32:40 -07:00
630af4d7d8 add learning rate schedulers (#1370) 2017-05-25 16:21:43 -04:00
cf078840d4 Update gloo dependency
Summary:
Updated dependency was expected in 6bc3f6ce1b761a3f8fe20bc90ecc0494a001f31e.
Closes https://github.com/caffe2/caffe2/pull/672

Differential Revision: D5129520

Pulled By: pietern

fbshipit-source-id: 0b4cb3c8950a693d56f3bb2fb04ab4aca868be07
2017-05-25 13:18:24 -07:00
a5f44ed265 Fix number of indices and block_size in SparseAdam
Summary:
Fix number of indices and block_size in SparseAdam to support gradients of any dimension.
Closes https://github.com/caffe2/caffe2/pull/249

Reviewed By: asaadaldien

Differential Revision: D5125714

Pulled By: akyrola

fbshipit-source-id: 84134049cb9a77e58562272ea351222befe27fca
2017-05-25 13:18:23 -07:00
0409b42a02 Merge commit '3abe5c80d2073f0e72f79b88f11b2a9d320fb116' 2017-05-25 15:40:27 -04:00
c39d48ea7d Fast transposed copy 2017-05-25 15:39:21 -04:00
3abe5c80d2 Fast transposed copy 2017-05-25 15:39:07 -04:00
05bc877a05 make THPPointer have explicit constructors (#1636) 2017-05-25 15:35:54 -04:00
7ea9d9af4e Fix build when included by another project; take 2
Summary:
Only adding `include_directories` doesn't propagate to the including
targets. Also use `target_include_directories` to do so.
Closes https://github.com/facebookincubator/gloo/pull/39

Differential Revision: D5131001

Pulled By: pietern

fbshipit-source-id: 6c58c4b76ae7fa008e4fb26d1bca7900165884d0
2017-05-25 11:50:23 -07:00
e35a4fe5cc Implement SizeOp as requested in github issue#583
Summary:
Implement SizeOp that returns the number of elements in the input
tensor.

The output is a 1D tensor that contains the number of elements.

Reviewed By: akyrola

Differential Revision: D5101061

fbshipit-source-id: d1c56053b6f3b41c65ac574dd748482775d1ea0d
2017-05-25 11:07:35 -07:00
d9896c43a7 improve cudnn conv type error msg
Summary: CuDNN conv op's type error was not very descriptive.

Reviewed By: Yangqing

Differential Revision: D5124638

fbshipit-source-id: 7d3f0afad36573cdb97d1f8ec3c60a9c6d87f926
2017-05-25 09:50:19 -07:00
6a7c56499c How to manage multiple build trees of PyTorch. (#1654)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-25 11:21:52 -04:00
46ee1e4687 Clarify definition of gather function in docs. (#1652) 2017-05-25 11:06:28 -04:00
e63b49d9ab Fix build when included by another project
Summary:
The CMake variable CMAKE_BINARY_DIR points to the top level build
directory. For standalone Gloo builds this path lets files include the
generated file "gloo/config.h". When Gloo is included as project, this
variable points to a different path and "gloo/config.h" cannot be
resolved. Fix is to build a path from CMAKE_CURRENT_BINARY_DIR.
Closes https://github.com/facebookincubator/gloo/pull/38

Differential Revision: D5129385

Pulled By: pietern

fbshipit-source-id: 722cebf4892b34f869fe43320153efbb181555b6
2017-05-25 07:50:53 -07:00
55d293f730 remove non-existing blobs from output_schema in layer_model_instantiator
Summary: In some cases (for example, when include_tags option is used) output_schema contains blobs that aren't produced by the generated net. In this case we want to filter them from output_schema as well.

Differential Revision: D5120115

fbshipit-source-id: f98ea3f747589390b039d1e1987becec3980634c
2017-05-25 00:36:19 -07:00
da6b82b810 fix another bug related to in-place ops --> treat in-place ops like any other
Summary:
D5116828 changed how in-place ops were handled in memonger and fixed a crash in NeuralMT. However, it still produced an incorrect memongerization, because an op with one in-place input-output but another non-in-place output would still be handled incorrectly, as the other output's branch would not be followed properly.

This is fixed by removing the whole in-place-op special handling, which is no longer needed; it was left over from an older version of memonger that used a topological sort of the ops.

Reviewed By: asaadaldien

Differential Revision: D5128142

fbshipit-source-id: b551b0faebdde410e6bd7516958c63cf610cc065
2017-05-24 23:32:03 -07:00
33c40e8a6e Handling shared indices in sparse gradient updates
Summary: When two or more blobs are gathered by the same indices blob in a data parallel model, we used to concatenate multiple times and re-write to the same indices blob. This leads to illegal memory access at times because the gradientslice indices blob is longer than its corresponding gradientslice values blob. This diff adds a check in order to avoid this.

Reviewed By: akyrola

Differential Revision: D5116817

fbshipit-source-id: 1c086d092eb6d48926d600f9408f578f5ddc41c7
2017-05-24 22:47:00 -07:00
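The invariant behind the added check is small enough to state directly (a sketch; names are illustrative):

```python
def check_gradient_slice(indices, values):
    # A GradientSlice is consistent only if indices and values agree on the
    # leading dimension; concatenating indices twice breaks this and leads
    # to the illegal memory access described above.
    assert indices.shape[0] == values.shape[0], (
        "indices rows ({}) != values rows ({})".format(
            indices.shape[0], values.shape[0]))
```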
036c3f93af Check for released variables in SavedVariable::unpack() (#1648)
Fixes #1288
2017-05-25 00:35:19 -04:00
4f261f5730 Add support for fast float16 reductions using AVX
Summary: Using Misha's vectorized AVX code to greatly improve performance of reductions on float16 values. Float16 reductions are now 2x faster than float.

Reviewed By: pietern

Differential Revision: D5123331

fbshipit-source-id: 03d4e76886d538b7e24eedaf32a92231a80b1e43
2017-05-24 21:20:06 -07:00
f2303ccb77 fix tileop test
Summary: The gradient test for the tile op was flaky because I had made the dimensions too large. This caused push-blocking errors. Also, I noticed my test_grad_tile was incorrect.

Reviewed By: asaadaldien

Differential Revision: D5126476

fbshipit-source-id: ae9ce5d9041648d7a4535fc88d4013e669bd6f02
2017-05-24 18:32:01 -07:00
98581b9f7e Fix conv1d segfault when weight doesn't require grad (#1646)
Fixes #1600
2017-05-24 20:46:32 -04:00
9a497f824b Add size/dimensionality documentation for torch.gather. (#1645) 2017-05-24 20:42:18 -04:00
457720459d Change AllreduceOp and BroadcastOp to allow initializing gloo algorithms to take float16 inputs
Summary: Modify BroadcastOp and AllreduceOp to allow initializing algorithms on buffers of float16 values. Previously the Allreduce algorithm definitions were hardcoded to take float.

Reviewed By: pietern

Differential Revision: D5042015

fbshipit-source-id: c5c3ea5566f9f23969847dcc0735f5f4b075f56f
2017-05-24 16:48:38 -07:00
1e63a04a18 Use clear-to-send notification for broadcast algorithms
Summary:
The broadcast algorithms use the buffers they were given directly.
There is no inbox/outbox pattern. This means that we can race if the
algorithm is run repeatedly within a short time frame. This hasn't
been an issue so far since we've only used it in combination with
other process wide barriers.

Since this adds a round trip the latency of these ops from the root
rank perspective increases. The variance between the before and after
runs is pretty high since there is no back and forth interaction on
the root. It simply waits for recipients to be ready and then sends
its data.

Before:

```
Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   broadcast_one_to_all
Options:     processes=4, inputs=1

   elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
        100          1         16         29         50     426075
        200          2         17         32         50     179953
        500          2         11         31         59     140291
       1000          2         12         29         59     177619
       2000          3         12         29         62     117882
       5000          5         16         31         64     127113
      10000          9         21         38         88      60328
      20000         19         36         65        130      30427
      50000         48         68        221        556      11180
     100000         92        136        426        871       7314
     200000        193        251        829       2965       4092
     500000        492        638       2098       4133       1677
    1000000       1195       2024       3513      11646        628
    2000000       3446       4216       5007      17100        282
    5000000      12956      13919      14941      37751         71

```

After:

```
Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   broadcast_one_to_all
Options:     processes=4, inputs=1

   elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
        100         15         37         52        107      27332
        200         14         40         63        199      28620
        500         17         37         52        118      18299
       1000          9         39         57        120      33375
       2000         20         57         78        180      24779
       5000         31         61         84        190      18039
      10000         39         70         90        225       8908
      20000         57        108        130        940       8313
      50000         94        163        217       1933       5326
     100000        132        231        331       3501       3681
     200000        256        426        560       6509       2272
     500000        774       1092       1698      10039        985
    1000000       1132       2106       3878      18218        484
    2000000       3509       4252       6832      20228        226
    5000000      11326      15447      27129      52694         77
```

Reviewed By: wesolwsk

Differential Revision: D5123341

fbshipit-source-id: f3bab4f75ef7c38817f74f00b382f18fe43d85d5
2017-05-24 15:36:36 -07:00
a94fc625f5 make random generator more flexible in context.h
Summary:
Use a typedef for the pseudo-random number engine type, which makes it easy to swap in a different engine.
Closes https://github.com/caffe2/caffe2/pull/615

Differential Revision: D5121539

Pulled By: Yangqing

fbshipit-source-id: 988e57f8d119cb6f3bfe692fdb303aba2ecacbeb
2017-05-24 15:33:12 -07:00
e54112758c Fix potential vector out of range issue in ContextFactory::makeContext
Summary: Vector out-of-range error was being triggered in some tests due to trying to get the address of an element past the end of vector.

Reviewed By: pietern

Differential Revision: D5123044

fbshipit-source-id: 004f72ebaa27c609290959c12a3d99b16289bfa8
2017-05-24 14:50:09 -07:00
567842e68d Check system dependencies first
Summary:
This PR changes the cmake of Caffe2 to look for system dependencies before resorting to the submodules in `third-party`. Only googletest should logically be in third-party, the other libraries should ideally be installed as system dependencies by the user. This PR adds system dependency checks for Gloo, CUB, pybind11, Eigen and benchmark, as these were missing from the cmake files.

In addition, it removes the execution of `git submodule update --init` in cmake. This seems like bad behavior to me; it should be up to the user to download submodules and manage the git repository.
Closes https://github.com/caffe2/caffe2/pull/382

Differential Revision: D5124123

Pulled By: Yangqing

fbshipit-source-id: cc34dda58ffec447874a89d01058721c02a52476
2017-05-24 14:31:51 -07:00
e1d257bc6d Fix segfault in autograd: (#1644)
* Fix segfault in autograd:

1) Every "output" variable must have a grad_fn or grad_accumulator
2) compute_partial_exec_callbacks uses Python errors

* assertRaisesRegexp was renamed assertRaisesRegex in 3.2

* Use HANDLE_TH_ERRORS macro
2017-05-24 17:13:08 -04:00
2aaac493d4 Fix cudnn version error formatting
Summary: Closes https://github.com/caffe2/caffe2/pull/619

Differential Revision: D5121534

Pulled By: Yangqing

fbshipit-source-id: afab91710da8f038b188a956bac275a4e9f18360
2017-05-24 13:47:26 -07:00
e03e14a71e Clean up binary build cmake script
Summary:
After the change we will be able to simply define targets and find dependencies.
Closes https://github.com/caffe2/caffe2/pull/640

Differential Revision: D5121700

Pulled By: Yangqing

fbshipit-source-id: 2d21e1afbccb09614054feccdd1bef55cbe3b035
2017-05-24 13:47:26 -07:00
3d38e4f126 Acquire GIL before THPVariable_wrap (#1625)
* Acquire GIL before THPVariable_wrap.

* mutex not required when GIL is held.

* Remove unused mutex.
2017-05-24 15:19:34 -04:00
4da076d3e9 Fixed typo caffe_translator.py, fixes bug #397
Summary:
Fixed minor typo in python/caffe_translator.py. Fixes #397.
Closes https://github.com/caffe2/caffe2/pull/412

Differential Revision: D4950875

Pulled By: aaronmarkham

fbshipit-source-id: 07183c6d6e8e97451bb5ee5ff01a88553d6bdb82
2017-05-24 12:18:32 -07:00
c79ce5c2ba Profiles pipe stages.
Summary: Adds timers to collect the average runtime for each pipe stage.

Reviewed By: azzolini

Differential Revision: D5083958

fbshipit-source-id: 42536bd70c80c2453d98d872286525388f6164c3
2017-05-24 12:02:03 -07:00
fa93653d09 Improve handling of graph roots in autograd engine (#1635) 2017-05-24 14:50:07 -04:00
152d439400 Allow specifying net type in predictor_exporter
Summary:
predictor_exporter copies the original predict_net's op, external_input and
external_output fields, but ignores the type field. This is reasonable as the
train net would generally have 'dag' type and copying that for inference may
not be applicable. It's good to have a way to specify the net type nevertheless
to run DAGNet for inference. This diff adds a field in predictor_exporter to do
that.

Reviewed By: akyrola

Differential Revision: D5122354

fbshipit-source-id: 0e3cc417128db903c71515135c9e3b87620ae21e
2017-05-24 11:46:27 -07:00
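For reference, the underlying knob is the NetDef `type` field; a minimal sketch of setting it by hand:

```python
from caffe2.proto import caffe2_pb2

predict_net = caffe2_pb2.NetDef()
predict_net.name = "predict"
# The `type` field selects the executor: 'dag' runs independent ops on
# parallel worker threads, while the default simple net runs sequentially.
predict_net.type = "dag"
```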
03503140fd DropoutCell as wrapper for another RNNCell
Summary: Added a new RNNCell, DropoutCell, which wraps an existing RNNCell and applies dropout to its primary output (as defined by get_output_state_index()).

Reviewed By: salexspb

Differential Revision: D5084871

fbshipit-source-id: 60474af84e5757a12e7fdc3814840dc9ba8e32a1
2017-05-24 11:36:45 -07:00
c55be38e63 Added mobile exporter
Summary: Basically takes in a live net and creates an init_net and predict_net which can be written to file and run in Predictor

Reviewed By: salexspb

Differential Revision: D4989425

fbshipit-source-id: 8052065da9ed763d48bd9e1e19f7697ef60a2829
2017-05-24 11:36:44 -07:00
db1d62caf7 Move RunPlan to a separate file
Summary: This RunPlan is getting complex and confusing. The first step to clean it up is to move it out of workspace.cc to better mark separation of concerns.

Reviewed By: kennyhorror

Differential Revision: D5100721

fbshipit-source-id: 4be0559eba1abb8bb1ddc3818698763c2e014ef2
2017-05-24 11:07:15 -07:00
c39f6cf2d0 gradient accumulation fix
Summary: As noted by salexspb, MultiRNNCell had unreliable gradient computation. The problem was that the recurrent gradient and the gradient computed within the backward step net were not being accumulated during the backward pass, but rather written to the same blob, thus overwriting each other. This diff fixes that by artificially introducing an extra blob for the internal output, and then accumulating it into the gradient coming from the recurrent connection.

Reviewed By: salexspb

Differential Revision: D5110059

fbshipit-source-id: 16add50989fe8866361bbc21afce5f214c5292fd
2017-05-24 10:33:32 -07:00
3fe8abb492 fixed gflags 2.2.0 error and image_input_op.h
Summary:
- caffe2 now compiles with gflags 2.2.0 (compiled from source); see issue https://github.com/caffe2/caffe2/issues/491
- fixed an error in image_input_op.h (it did not compile in VS2015)
Closes https://github.com/caffe2/caffe2/pull/559

Differential Revision: D5121555

Pulled By: Yangqing

fbshipit-source-id: 9d2bedadd13d1872bb930a95d67ed20263988d13
2017-05-24 10:09:17 -07:00
bf6f630888 bug fix in CMakeLists.txt (CAFFE2_CPU_FLAGS and CAFFE2_WHITELIST)
Summary:
Fixed a bug in CMakeLists.txt: the option command should not be used to set the initial value (empty string) of CAFFE2_CPU_FLAGS and CAFFE2_WHITELIST, because option can only be used for boolean (ON/OFF) variables. Use the set command instead. The bug can cause compilation errors if CAFFE2_CPU_FLAGS is set to ON, since an invalid 'ON' flag will be added to CXX_FLAGS. (2) Add build_* to .gitignore to allow multiple build directories in the repo.
Closes https://github.com/caffe2/caffe2/pull/611

Differential Revision: D5121545

Pulled By: Yangqing

fbshipit-source-id: 1f57042075356b6bf7138f65565b327be2a6d272
2017-05-24 10:09:17 -07:00
b5a215db0a Added python-pip and python-numpy into build_raspbian.sh
Summary:
Added python-pip and python-numpy to the build_raspbian.sh script
because they are not installed in the minimal Ubuntu/Debian image.
Closes https://github.com/caffe2/caffe2/pull/609

Differential Revision: D5121550

Pulled By: Yangqing

fbshipit-source-id: 14dd1450275fcc2aa9d2a06f0982f460528a1930
2017-05-24 10:09:16 -07:00
43be6456e2 UNUSED_VARIABLE VS compile fail fix
Summary:
Fix the test projects' UNUSED_VARIABLE compile failure on Visual Studio.
Closes https://github.com/caffe2/caffe2/pull/613

Differential Revision: D5121541

Pulled By: Yangqing

fbshipit-source-id: c353e8df4995e732e4d5d64bac15d849464efea2
2017-05-24 10:09:15 -07:00
ff047fdeef Fix the mix-up of height and width on depth-wise convolution 2017-05-24 21:05:08 +08:00
6c511f64cc fix handling of ops with in-place input/output
Summary: Memonger ignores ops with input and output in-place, but did not work correctly if there were also non-in-place inputs, like with Mul. Simple fix to also look at in-placeness during the traversal.

Reviewed By: jhcross

Differential Revision: D5116828

fbshipit-source-id: 52817f1221597986cc09cc65d094417c1923d965
2017-05-23 18:23:33 -07:00
2486a6bbd0 Add missing header file types.h in CMakeLists.txt
Summary: A recently added header file was missing in CMakeLists.txt

Reviewed By: pietern

Differential Revision: D5116962

fbshipit-source-id: 6c3fbd4b49c913f20308c1b057a7e09806e0c2b0
2017-05-23 16:50:41 -07:00
640846b864 Fix race in ibverbs transport
Summary:
In a previous commit where the slot numbering was expanded, I changed
the memory region send/recv path to use a map for the outgoing memory
regions (since they may complete out of order). Before, this was a
fixed size array, which was mutated by both the user thread and device
thread without holding a lock. The map, however, can't be mutated
without a lock. This change adds that lock and a few assertions to
check for this type of problem.

Reviewed By: andrewwdye

Differential Revision: D5108194

fbshipit-source-id: 1908c988112469ecdec6cb6eb9849068d896c409
2017-05-23 15:38:48 -07:00
77b38b915e Checks performance regression for resnet50.
Summary: Guard operator execution times by leveraging ProfDagNet generated statistics.

Reviewed By: akyrola

Differential Revision: D5065462

fbshipit-source-id: b480a5083eb557a09eeb3fdbb5d54ff16ed923c7
2017-05-23 13:34:45 -07:00
64e04e78d2 Remove AddOperator from ModelHelper
Summary:
It looks like AddOperator was never really used (searched across the whole
code-base). In addition, all model_helper functionality is getting
replaced with Brew, so I'd prefer to remove this method to reduce the
amount of code touching model.params.

Reviewed By: rayleichen

Differential Revision: D5110425

fbshipit-source-id: f2a88e4c1ce5149d27e809e03da9a86c0867bc4d
2017-05-23 13:34:45 -07:00
ba56de1150 add coding UTF-8 declaration 2017-05-23 16:02:34 -04:00
6e3e453ad2 Tidy up convs docs (#1602) 2017-05-23 18:32:33 +02:00
2b11adb414 TileOp CUDA fix: number of threads must be hard coded
Summary:
I had "optimized" the number of threads / block, but cub::BlockReduce has a static template parameter for the number of threads, and this must match. Probably tests still passed because typically the initial numbers are zeros.

Also added a stronger test.

Thanks ves for the report.

Differential Revision: D5110901

fbshipit-source-id: c1169b1286e204c202b0727448ddb51b4965eacb
2017-05-23 09:32:19 -07:00
f5d919a685 Generate config.h file with compilation options
Summary:
This file can then be used by downstream code to figure out what Gloo
features it can support (e.g. ibverbs transport or not).
Closes https://github.com/facebookincubator/gloo/pull/36

Differential Revision: D5110769

Pulled By: pietern

fbshipit-source-id: 2c0c07537258048737ae764a4978f2f7fdbd992d
2017-05-23 09:26:03 -07:00
02e4ca9cab fix wrapper 2017-05-23 08:43:13 -07:00
70a774898e Remove superfluous forward declaration
Summary: ContextFactory is no longer mentioned in gloo/context.h.

Reviewed By: romain-intel

Differential Revision: D5110328

fbshipit-source-id: 48dd020dc39d71d0d5f72deebfa5d80122b70c0d
2017-05-23 08:20:55 -07:00
74e964ff0d make data_workers restartable
Summary: Add the ability to restart data workers' data input.

Reviewed By: andrewwdye

Differential Revision: D5108666

fbshipit-source-id: f7f71cd6d4d45d007067814a552fc93cbe3eca42
2017-05-23 01:18:44 -07:00
49befe3fcd Remove commPairs_ member variable from halving/doubling
Summary: TSIA

Reviewed By: wesolwsk

Differential Revision: D5110348

fbshipit-source-id: d3346e2af1a9f13410dc93336c53040a29e22e66
2017-05-22 21:21:42 -07:00
7eac2073b8 Add notification mechanism to ContextFactory
Summary:
This is another example where our unsolicited writes may interfere
across calls to the collective function. In this case, it was possible
for a second call to overwrite a pair's address before it had been
used to connect the pair in the previous iteration.

Thinking out loud, we could avoid this from happening by supporting
this pattern natively in the Buffer classes. For example, we can add a
notification mechanism (opt in) to the Buffer class such that the
receiver may call `ackRecv()` to acknowledge receipt and handling of
the data in the buffer. Then the sender will block on new sends until
acknowledgement from the previous send has been received. Until then,
we have to keep an extra eye out.

Reviewed By: wesolwsk, romain-intel

Differential Revision: D5095430

fbshipit-source-id: 4c100433108fccea7457bba4dc00f651f722e6c9
2017-05-22 19:50:18 -07:00
356c19319f Change repo from bwasti to caffe2.
Summary:
I'm assuming the repo should be caffe2/caffe2.git and not bwasti/caffe2.git. Changed it accordingly.
Closes https://github.com/caffe2/caffe2/pull/572

Differential Revision: D5105328

Pulled By: aaronmarkham

fbshipit-source-id: 4bd3babbd93c79831be79c6d40b81d873fcc3f4c
2017-05-22 15:32:23 -07:00
45524ec33c Fix indices bug in MM.py (#1613) (#1617) 2017-05-22 16:47:51 -04:00
1d8e93536c better TileOp/Gradient CUDA implementation
Summary: ves and jamesr66a had noticed that TileOp for CUDA was very slow, as it started kernels inside double loops. It was my fault not to notice this in the code review. This diff uses 1 kernel for forward and backward passes and is probably much faster. I did not test though, maybe ves or jamesr66a can help?

Reviewed By: jamesr66a

Differential Revision: D5101968

fbshipit-source-id: 64b6ac933785e3710b3c1d8c692a4c48650bca96
2017-05-22 12:17:13 -07:00
5a7f67bd41 Add stack traces on fatal signals
Summary:
When a fatal signal is fired at a task that links against caffe2, this PR adds stacktraces from every thread that's currently running. Only Linux is supported currently. The signals that are currently supported are SIGABRT, SIGINT, SIGILL, SIGFPE, SIGBUS and SIGSEGV (more signals can easily be added, but for now these seemed like the major signals that might be fired - see signal_handler.cc:138 for the table of signals).

I've added tests that verify that each of those signals indeed output the expected number of stacktraces.

We need to add linking against libdl since on linux apparently it's not implicitly always linked in (I'm coming from macOS where I believe it is).

Example output can be found [here](https://gist.github.com/danzimm/814faa1229d9c54f359d23ba038344a6) - note that the signal name changes depending on the signal that was sent (as well as the number in parenthesis that corresponds to the specified signal).
Closes https://github.com/caffe2/caffe2/pull/596

Reviewed By: akyrola

Differential Revision: D5087526

Pulled By: pietern

fbshipit-source-id: ba8d058c9ca1cf06b41667205193f8699f8d6964
2017-05-22 10:34:32 -07:00
193c9289f0 Fix LRN schema for cuDNN op
Summary:
Schema generation was previously broken, leading to invalid gradient op creation.

This was also exhibited in model_device_helper, where invalid schemas were being created on the CPU when kwargs['engine'] == 'CUDNN'.
Closes https://github.com/caffe2/caffe2/pull/617

Reviewed By: asaadaldien

Differential Revision: D5097062

Pulled By: akyrola

fbshipit-source-id: e22181f857deccb7b4395e87271e2cbf1226eb64
2017-05-22 08:33:34 -07:00
f072c74dfd make it effective to transfer a tensor from other devices to device 0 (#1610) 2017-05-22 11:06:57 -04:00
107a0fe9ac Revert "Revert "ClassNLLCriterion supports missing targets"" 2017-05-21 13:48:19 -04:00
2acfb2376a fixes eval mode in InstanceNorm (#1604)
fixes https://github.com/pytorch/pytorch/issues/1541
2017-05-21 13:27:48 -04:00
0c5598c668 Update build status matrix 2017-05-21 12:20:50 +02:00
37834b1343 Change video_input_op to output label in int32 instead of float
Differential Revision: D5101606

fbshipit-source-id: cc3ab4309c521832f776f7770ba469cdf03f8485
2017-05-20 16:47:55 -07:00
92610e78bb CuDNN comparison mode
Summary:
This allows producing nice comparisons against
cuDNN. Currently, on 1 layer I see about a 28% slowdown on
average across the specified setups.

Reviewed By: akyrola

Differential Revision: D4986218

fbshipit-source-id: efb12081f13dbfb92428fd4a85f12fd566eb9522
2017-05-20 15:19:43 -07:00
feaee29bfe Add argmax and argmin to docs 2017-05-20 18:56:20 +02:00
a2c01e830b fix duplicate init blob issue + fix test
Summary:
Address KaimingHe's comments in D5093689 about the same blob being initialized twice, causing the internal consistency check to fail. Also, I noticed that my new test for test_checkpoint_params was completely botched due to an indentation issue (it did not actually execute any test), so this fixes that as well.
Modified the test to add a duplicate param initializer, so that this bug is tested for.

Reviewed By: KaimingHe

Differential Revision: D5101304

fbshipit-source-id: 72f343035c1b4953e7bb9a1a1c171cf05d3ead26
2017-05-20 09:18:29 -07:00
aa603a9083 add test for input order
Summary: Based on jay-mahadeokar's code, add a test for input order consistency to data workers.

Reviewed By: jay-mahadeokar

Differential Revision: D5096887

fbshipit-source-id: efd226343f81e9a0157ec89d4588f1eee8a78549
2017-05-19 23:46:38 -07:00
6384bae29b call save_to_db in CPUContext + fix a typo in data_parallel_model.
Summary:
If Predictor Exporter save_to_db is called in CUDAContext, a failure occurs since the following FeedBlob() tries to store a string (meta data), but for CUDA blobs we assume they are tensors.
  + fix a typo in data_parallel_model that I bumped into.

Reviewed By: asaadaldien

Differential Revision: D5099837

fbshipit-source-id: 69d01b35a9a1816bf083f13d8a6ce88e1f5aecb7
2017-05-19 18:25:00 -07:00
83f6dceaa6 remove forget_bias as argument to AttentionCell constructor
Summary: The argument is unused.

Differential Revision: D5096088

fbshipit-source-id: fcda8a1d2b0d7c85182ab5bc002c86640b443f97
2017-05-19 16:53:40 -07:00
c69ab3d3ad Fix open source build with ffmpeg
Summary: Rename some AVPixelFormat types.

Reviewed By: aaronmarkham

Differential Revision: D5097337

fbshipit-source-id: 8ee9b0fc7284752e56f74c7ada241b3bd421efd1
2017-05-19 16:19:44 -07:00
09bbd0382c ConvNd cuDNN
Summary: Add ConvND cuDNN implementation.

Reviewed By: akyrola

Differential Revision: D4702205

fbshipit-source-id: 65275bcff3970b0d43ac5c168d38bcd075985979
2017-05-19 15:20:33 -07:00
b5721c2d9d Throw timeout exception from StoreHandler::wait() and catch in CreateCommonWorldOp
Summary: Define StoreHandlerTimeoutException() for timeouts in StoreHandler::wait(). Update all StoreHandler implementations. Catch new exception in CreateCommonWorldOp and store failure blob.

Reviewed By: akyrola

Differential Revision: D5095625

fbshipit-source-id: dc6f8351cc129cd1fac72bd4b2c8e6b684b21f31
2017-05-19 15:01:23 -07:00
0af0cba2b7 Refactor data_parallel_model initial sync and checkpointing
Summary:
Major improvements. Before, we only synced the "params" and "computed params" of the model after initialization and after loading a checkpoint. But we actually want to sync all blobs that are generated in the param_init_net. For example, the _momentum blobs were missed by the previous implementation and had to be manually included in checkpoint finalization.

I also added GetCheckpointParams() to data_parallel_model because it is now fully general. Also added a unit test.

Reviewed By: andrewwdye

Differential Revision: D5093689

fbshipit-source-id: 8154ded0c73cd6a0f54ee024dc5f2c6826ed7e42
2017-05-19 12:48:06 -07:00
0aeffa985e make sure mutex is on CPU too
Summary: The mutex is only supported on CPU. We need to make sure the mutex and the following AtomicIter are both on CPU. This is critical for GPU SparseNN training.

Differential Revision: D5093184

fbshipit-source-id: 021e6ba699a3208449fa4761cad6b0ec4544957e
2017-05-19 12:17:17 -07:00
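A minimal sketch of pinning these ops to CPU with an explicit device scope (blob names are illustrative):

```python
import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace

net = core.Net("iter_net")
# Build CreateMutex/AtomicIter under a CPU scope so they stay on CPU even
# if the surrounding model is built under a CUDA device scope.
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CPU)):
    mutex = net.CreateMutex([], ["iteration_mutex"])
    net.AtomicIter([mutex, "iter"], ["iter"])

workspace.FeedBlob("iter", np.array([0], dtype=np.int64))
workspace.RunNetOnce(net)
print(workspace.FetchBlob("iter"))   # [1]
```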
65750349ba deprecate CNNModelHelper in python/operator_test dir
Summary:
deprecate CNNModelHelper in python/operator_test dir

BTW, I found that there are 2 mkl_speed_tests. I am confused...

Reviewed By: salexspb

Differential Revision: D5094122

fbshipit-source-id: f6526f4de334f2245eb4c1f204a8ec9f23750d78
2017-05-19 12:17:17 -07:00
7ce5d0765b GivenTensorIntFill on CUDA
Summary: GivenTensorIntFill on CUDA, useful for GPU training.

Reviewed By: dzhulgakov

Differential Revision: D5093208

fbshipit-source-id: 500338a127a6c4ecdacd732195c5c5cc776f3d4f
2017-05-19 10:19:01 -07:00
32bf7a2c2b Generalize PoolingOp(cuDNN) to compute 2D and 3D pooling.
Reviewed By: akyrola

Differential Revision: D5090689

fbshipit-source-id: f9f11e12adc0ee8db088f3397a8c33aa31eb5deb
2017-05-19 10:19:00 -07:00
7f6cd7c7ea Fix error message in CUDA forked subprocess (#1585)
We need to re-call _lazy_init in _CudaBase.__new__ in the subprocess.
2017-05-19 12:36:08 -04:00
1b7497807f cnnmodelhelper deprecate warning
Summary: We will start our API migration process. Before that, I want to make sure people don't add new CNNModelHelper instances to our open-source code, so I am putting a deprecation warning here in advance.

Reviewed By: salexspb

Differential Revision: D5093556

fbshipit-source-id: 74bf4a7782c2d882f72f202d48c72255d152b68a
2017-05-18 23:35:26 -07:00
625850c2c2 Check cuDNN version at runtime (#1586)
* Check cuDNN version at runtime

This checks that the version from cudnn.h matches the version from
libcudnn.so.

Fixes #1476

* Only check major and minor version numbers
2017-05-19 01:55:09 -04:00
9b3447761a Check for required non-None arguments in C++ autograd functions (#1589) 2017-05-19 01:47:35 -04:00
ed679fc43c disabling fd leakchecker test (#1593) 2017-05-19 01:20:50 -04:00
e6c9509a41 Fix call to Tensor.set_ in rnn.py (#1592) 2017-05-18 20:28:49 -04:00
c57f0530e7 let long_args False for param "size" of set_ (#1568)
* fix #1524, let long_args False for param "size" of set_
2017-05-18 19:31:36 -04:00
8021bb938c Remove slot number limitation from ibverbs transport
Summary:
The pair was still hardcoding limits on the slot numbers. In this
change those limits are lifted.

This also adds back assertions on work completion status in
handleCompletion.

Reviewed By: wesolwsk

Differential Revision: D5090457

fbshipit-source-id: 7bf884e1f31e48e8f1cdfb179a225999e28171b2
2017-05-18 16:20:40 -07:00
1f4317be3f Add support for half-precision floating point operations
Summary: Add support for collectives over vectors of half-precision floating point values.

Reviewed By: pietern

Differential Revision: D5062938

fbshipit-source-id: 0b39fa53370393fec1edf2d852ff7f1d862b9022
2017-05-18 15:09:06 -07:00
77f539174c Update fp16 NCCL ops
Summary:
CUDA_HAS_FP16 -> CAFFE_HAS_CUDA_FP16
Closes https://github.com/caffe2/caffe2/pull/605

Differential Revision: D5090629

Pulled By: pietern

fbshipit-source-id: 3df12c0547f55bdd27be25f59e1e7823ebf8b899
2017-05-18 15:02:53 -07:00
cba46a4869 Assert that we don't do out of bound writes on recv
Summary:
The halving/doubling algorithm had two instances where a receive
buffer was registered with a number of elements instead of a number of
bytes. This change adds the assertion that should have caught this in
the first place.

Reviewed By: wesolwsk

Differential Revision: D5089483

fbshipit-source-id: fd0f0724ef04300236c9297ee88b27e61fb1e5a0
2017-05-18 14:34:39 -07:00
b391f53681 Cache send/recv buffers in ContextFactory
Summary:
The original implementation created temporary buffers on the backing
context. This also meant an ordering problem when using the ibverbs
transport, as a call to send will block until the remote side has
created its receive side buffer. Since all buffers are now created
prior to using them, this is no longer an issue.

Reviewed By: romain-intel

Differential Revision: D5082352

fbshipit-source-id: 4c260f06e8f461c0336e7eec7ca891e07ff41cd3
2017-05-18 10:20:42 -07:00
307459eb62 Fix conv_test for CUDNN dilated convolution in NHWC
Summary:
CUDNN dilated convolution was added to V6. This version of CUDNN does not support NHWC for dilated convolution.

Fix conv_test.py so that it does not test CUDNN for dilated convolution in NHWC format.
Closes https://github.com/caffe2/caffe2/pull/598

Reviewed By: akyrola

Differential Revision: D5084835

Pulled By: asaadaldien

fbshipit-source-id: 3c0c5ed02c5d9232fca567e387ab6260d71e5aaf
2017-05-18 10:07:28 -07:00
9386bc7ca8 Improve elementwise comparison docs
Summary: In response to https://github.com/caffe2/caffe2/issues/581 feedback, add textual "less than", "greater than" etc. to comparison operator docs, instead of just <, <=, ..., which are hard to search for in a browser.

Reviewed By: asaadaldien

Differential Revision: D5085907

fbshipit-source-id: f129d94f03aff1cc919f8da843aa461f157eb144
2017-05-18 10:07:27 -07:00
b61378b4b6 vectorized version of lstm_unit
Summary: Vectorized lstm_unit using Eigen

Reviewed By: ajtulloch

Differential Revision: D5051296

fbshipit-source-id: 1fa39ce474c731772c4169150622943a7eaec8e3
2017-05-17 23:33:22 -07:00
85f1d947dd Vectorize SigmoidOp on CPU
Summary: I noticed that Sigmoid was taking an inordinate amount of time in our NMT benchmark, so I looked at the implementation and it didn't seem optimal. I replaced the implementation with an Eigen version so that when the Eigen update goes through, we will get proper AVX(2) vectorization.

Differential Revision: D5082464

fbshipit-source-id: aa951f7d730fc05198f7dd04076ec58d471b74c8
2017-05-17 20:33:36 -07:00
12edbcb154 Implemented L1Distance Operator for CUDA
Summary: Added L1Distance Operator for CUDA, as well as tests.

Reviewed By: bwasti

Differential Revision: D5071966

fbshipit-source-id: 4c3d862605e9123d955bf091efa67d0731bd816a
2017-05-17 17:32:53 -07:00
85732b52ec fix cuda multiple algorithm test
Summary: Fixing a bug in the multiple algorithm test where threads were spawned repeatedly, causing collisions during rendezvous.

Reviewed By: pietern

Differential Revision: D5082945

fbshipit-source-id: 4adbbc963b1ff652f73a44cd9fd75dcd3325f182
2017-05-17 16:35:25 -07:00
156fe28666 dataloader can now handle growing datasets (#1575) 2017-05-17 19:23:15 -04:00
2f4bf4ab39 Rewrite 'How autograd encodes the history' to accurately describe current setup. (#1580)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-17 19:21:20 -04:00
1f3ff5ced2 Miscellaneous documentation around autograd. (#1577)
* Miscellaneous documentation around autograd.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-17 19:19:24 -04:00
b8b7f879c2 .gitignore updated with editor temporaries (#1574) 2017-05-17 19:16:02 -04:00
1f5cd3582c Add contrib/gloo/common.cc to Caffe2_CPU_SRCS
Summary: Missing common.cc in contrib/gloo/CMakeLists.txt

Reviewed By: pietern

Differential Revision: D5082928

fbshipit-source-id: 4ab5142a168d2f66cc9624c3054eb6e936976c66
2017-05-17 16:10:29 -07:00
a0b83464e4 fix bad conversion to float in cpu_half2float
Summary: When converting from half to float, the bytes to be returned were represented as an unsigned int. When returning, this had the effect of numerically converting the unsigned int into a float. This is incorrect, as we instead want to take the raw bytes and reinterpret them as a float.
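To illustrate the distinction, a minimal Python sketch (the actual fix lives in Caffe2's C++ fp16 conversion code; this just contrasts a numeric cast with a raw-bit reinterpretation):

```python
import struct

def reinterpret_bits_as_float(word):
    # Correct: keep the raw 32 bits and view them as an IEEE-754 float.
    return struct.unpack('<f', struct.pack('<I', word))[0]

def numeric_cast_to_float(word):
    # Buggy behavior: numerically convert the unsigned int,
    # producing a completely different value.
    return float(word)

bits = 0x3F800000  # bit pattern of 1.0f
print(reinterpret_bits_as_float(bits))  # 1.0
print(numeric_cast_to_float(bits))      # 1065353216.0
```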

Reviewed By: pietern, asaadaldien

Differential Revision: D5080335

fbshipit-source-id: 7208efc5799daccf92e1628ee326f7470b867261
2017-05-17 15:57:42 -07:00
7b10b16496 Move ibverbs buffer send logic to pair.cc
Summary:
TSIA

This matches the approach in the TCP transport where all send/recv
logic is contained in the pair code.

Reviewed By: wesolwsk

Differential Revision: D5082503

fbshipit-source-id: b70886ed9aaeb381cdb45fba00704118cff62a23
2017-05-17 15:54:34 -07:00
da86633c7c Additional synchronization in halving/doubling
Summary:
This is necessary to avoid the next iteration of the algorithm
overwriting data in recvBuf_ before it has been consumed by the
receiver of that data. If this does happen, the result of the previous
iteration for the receiving end is corrupted. This can only happen in
async mode on the TCP transport (so all incoming data is unsolicited)
when spinning on the run function.

Reviewed By: wesolwsk

Differential Revision: D5074789

fbshipit-source-id: 66668fbd885888f26266d812e78d61c6d65c2461
2017-05-17 15:21:09 -07:00
bbd7aee9ab Revert D4952993: [Caffe2] fix mkl_sparse and migrate sparsity experiments
Summary: This reverts commit 86c03676ab4e47f04d2d0dd438a4a1c849bbbff0

Differential Revision: D4952993

fbshipit-source-id: 5c213c48ac44ce6aefccacc6d80534648d3c516a
2017-05-17 14:46:56 -07:00
c573d53939 Bug fixes (#1573)
* Fix clang warnings
* Raise errors when unsupported ConvNd configurations are used
* Properly handle Variable indexing with LongTensors
* Support both tensors and variables in Variable.type_as
2017-05-17 15:28:16 -04:00
f27c9eea20 dropout for C2 multilayer
Summary:
Incorporate arbitrary dropout for encoder and decoder layers for Caffe2 NMT models using current configuration. This involves separate output processing (_prepare_output() and _prepare_output_sequence()) for the final layer in a MultiRNNCell.

Switching to using the newly introduced forward_only switch for RNN cells revealed an unrelated bug in our NetGradientChecker test, which urikz is investigating.

Reviewed By: salexspb

Differential Revision: D5031964

fbshipit-source-id: 19b49607d551aa3e2140041ef4e585f128c8f178
2017-05-17 11:32:47 -07:00
f555c6308c Fix NormalizeOp gradient numerical stability
Differential Revision: D5075044

fbshipit-source-id: 8c20b9021020c9ada1f1059e15fafea9bd5674ff
2017-05-17 09:19:00 -07:00
658c337f41 Error status for Gloo ops, and handling in elastic dpm
Summary: Add a RandomFailureOp and handling to elastic data parallel model of the status code

Reviewed By: andrewwdye

Differential Revision: D5065936

fbshipit-source-id: 24224f9ea414ee535c9e90cc28add5189354b0ef
2017-05-17 00:16:52 -07:00
5ced84856a Caffe2: SparseToDenseMask: return key presence
Summary: Caffe2: SparseToDenseMask: return key presence

Reviewed By: matbd

Differential Revision: D5066863

fbshipit-source-id: 4f4dd141f6661829535cb77ff47cc0c230dce5d6
2017-05-16 20:22:03 -07:00
f359d70ae7 fix mkl_sparse and migrate sparsity experiments
Summary:
Migrate the experiments folder to the fb/sparse folder. Keep FunHashOp and SparseFunHashOp because they are now assumed as default ops in depr. What I did:

  1. Migrate FunHashOp and SparseFunHashOp and their unit tests to core caffe2; make sure tests pass.
  2. Migrate the other ops in the experiments folder to the fb/sparse folder; write new TARGETS files for them; make sure tests pass.
  3. Make sure all related tests pass.
  4. Fix the MKL definition along the way; make sure FC_Sparse is not compiled when there is no MKL support.

Reviewed By: salexspb

Differential Revision: D4952993

fbshipit-source-id: 86c03676ab4e47f04d2d0dd438a4a1c849bbbff0
2017-05-16 18:33:51 -07:00
37c06a3ba8 residual connections in multilayer C2 ('add' only)
Summary:
Residual connections for multilayer RNN encoder/decoder for Caffe2 NMT model. Only supporting 'add' connections (the standard approach, which ves's TF experiments concluded was at least as good as other approaches), and also only implementing for residual_level >= 1 (which also fits our use case).

It is the responsibility of the config to ensure dimension compatibility: each level at and beyond residual_level (in both the encoder and decoder) should have the same number of units, with the exception that a bidirectional initial encoder layer should have half the number of units of the succeeding layer if that next layer is a residual layer.
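A schematic sketch of the 'add' residual connection (illustrative Python, not the actual Caffe2 cell wiring):

```python
def apply_layers(x, layers, residual_level):
    # Levels at or beyond residual_level add their input to their output,
    # which is why input and output dimensions must match there.
    for level, layer in enumerate(layers):
        y = layer(x)
        x = x + y if level >= residual_level else y
    return x

layers = [lambda t: t * 2.0, lambda t: t + 1.0]
print(apply_layers(1.0, layers, residual_level=1))  # 2.0 + (2.0 + 1.0) = 5.0
```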

Differential Revision: D5023160

fbshipit-source-id: f38c1b140638fee78cf3ef7d6b4602dd462484ee
2017-05-16 17:04:58 -07:00
a28b01c155 rnn with brew
Summary:
Update rnn_cell.py and char_rnn.py example with new `brew` model.

- Deprecate CNNModelHelper
- Replace all helper functions with brew helper functions
- Use the `model.net.<SingleOp>` format to create bare-bones Operators for better clarity.

Reviewed By: salexspb

Differential Revision: D5062963

fbshipit-source-id: 254f7b9059a29621027d2b09e932f3f81db2e0ce
2017-05-16 13:33:44 -07:00
310f505da7 Remove application-specific comment.
Summary: This comment is not relevant for open-source.

Differential Revision: D5070835

fbshipit-source-id: 8e2dadae85566e7f6684d42f921daf7d345dc065
2017-05-16 12:17:03 -07:00
769e668faf ttsn model fails to set optimizer for FC layer
Summary:
The FC ModelLayer needs an optimizer; it also seems the catch-all
that sets a default for missing optimizers had a bug.

Reviewed By: xianjiec

Differential Revision: D5048302

fbshipit-source-id: cbbf641fb9ee4f4f89c5dbb132f7837ecdbe37a5
2017-05-16 11:26:02 -07:00
cb79c24d0b Added powerpc64le support (#1572) 2017-05-16 08:30:06 -06:00
64d43dbb6e new resnet building with brew
Summary: new resnet building with brew

Reviewed By: akyrola

Differential Revision: D4945418

fbshipit-source-id: d90463834cbba2c35d625053ba8812e192df0adf
2017-05-15 22:47:24 -07:00
af0a412e83 alternating workspace for forward only
Summary: Use alternating workspace for forward_only RNNs.

Reviewed By: jhcross

Differential Revision: D5064930

fbshipit-source-id: d1572b5f90b219fda9dfa31ce6140331672052f2
2017-05-15 21:47:06 -07:00
caa1cdf0ce ClassNLLCriterion ignoreIndex 2017-05-15 22:27:00 -04:00
25fd005dd9 Initial implementation of Blockwise Model Update Filtering (BMUF)
Summary:
A single-machine multi-GPU version of the BMUF algorithm. BMUF is a modification to
model averaging where the update to the global model is implemented as a filter:
param_t = param_(t-1) + delta_t
delta_t = beta * delta_(t-1) + alpha * (average(param_t) - param_(t-1))
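A rough NumPy sketch of one block update under this rule (hypothetical names; the real implementation operates on Caffe2 blobs across GPUs):

```python
import numpy as np

def bmuf_step(global_param, delta, gpu_params, alpha=1.0, beta=0.9):
    # Blend the block momentum with the step toward the averaged model.
    avg = np.mean(gpu_params, axis=0)  # average(param_t) across GPUs
    delta = beta * delta + alpha * (avg - global_param)
    return global_param + delta, delta  # param_t = param_(t-1) + delta_t
```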

Reviewed By: akyrola

Differential Revision: D4995057

fbshipit-source-id: 48176ba66d67eaf3fa4dee16d50d9589825ddba4
2017-05-15 18:18:15 -07:00
57054bd52f use remapped name for param_grads, to enable memonger
Summary: We need to use the remapped names for param_grads to enable memonger.

Differential Revision: D5064198

fbshipit-source-id: ae54407c3362044e9bc2bff929e12da68cd6a332
2017-05-15 17:26:40 -07:00
368ecb47f9 Fix flaxy test_sparse_adagrad (#1562) 2017-05-16 01:03:08 +02:00
e394b60a9c Support un-equal weight training for mtml models
Reviewed By: queqichao

Differential Revision: D5047939

fbshipit-source-id: 857d0d77e0413939e5774fa37d21b92a00d34bf0
2017-05-15 12:56:11 -07:00
ad37840329 fixed document generator for github
Summary: Fixed the generator. Tweaked the output to fit the GitHub markdown template.

Reviewed By: bwasti

Differential Revision: D4569692

fbshipit-source-id: 87f497319cc8b258c6c75dc0837d728c5eda5636
2017-05-15 11:40:46 -07:00
6107d15d14 Twice differentiability of pointwise functions (#1531) 2017-05-15 12:00:59 -06:00
ba885a1a51 expose bitwise operators from C/CUDA (#1556)
* fix issue #1549, expose bitwise and

* expose C bitwise or of Tensor

* expose C bitwise xor of Tensor

* use built-in method for inplace and, or, xor

* expose C bitwise lshift(ilshift) and rshift(irshift) of Tensor
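Sketch of the newly exposed operators in use (assuming integer tensors):

```python
import torch

a = torch.LongTensor([0b1100])
b = torch.LongTensor([0b1010])
print(a & b)  # 8  (0b1000)
print(a | b)  # 14 (0b1110)
print(a ^ b)  # 6  (0b0110)
a <<= 1       # in-place left shift via __ilshift__
```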
2017-05-15 11:36:15 -06:00
ce1a0eb6c9 Merge commit '7afd78d77ffad503357c35f495ae6d4d2b008862' 2017-05-15 11:20:27 -06:00
7afd78d77f Cuda reduce in a consistent direction 2017-05-15 11:18:20 -06:00
6b84dc26f0 Add F.cosine_similarity (#1502) 2017-05-15 11:12:54 -06:00
0f458ee3c4 Fix memory leak in THCSTensor_spcadd. (#1519)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-15 11:11:03 -06:00
8aa011f52a minor typo and style changes to _torch_docs.py (#1559) 2017-05-15 15:32:56 +02:00
2a610c9d13 Revert "Update to ignore zero targets" 2017-05-14 18:15:30 -07:00
ac8b2c0fa3 Revert "ClassNLLCriterion supports missing targets" 2017-05-14 18:14:36 -07:00
0ba20435ce Add high order grad support for Some operator (#1507) 2017-05-14 23:02:04 +02:00
6fc9130052 Adapt documentation to reflect new supported argument (#1548)
Reflect the changes of #1323
2017-05-13 21:09:34 -06:00
28f4f6db2c typo error for torch.addr (#1547)
fix the typo error in the example for torch.addr
2017-05-13 08:53:05 -07:00
9b2de027be SpatialDepthWiseConvolution.cu added 2017-05-12 16:02:14 -04:00
bf4345e2ef ClassNLLCriterion supports missing targets 2017-05-12 15:15:39 -04:00
3eeca5b5e0 arg scope in ModelHelper
Summary: Based on our discussion, we want an arg_map in ModelHelper, with an arg_scope for that model created within brew. This is now implemented.

Reviewed By: salexspb

Differential Revision: D5042983

fbshipit-source-id: ddd2c7e9bca1be2f08a32f7252b44d3b60a57996
2017-05-12 11:18:59 -07:00
029290c5b1 SpatialDepthWiseConvolution 2017-05-12 11:34:27 -04:00
78abf0134d Merge pull request #458 from jnhwkim/master
Update to ignore zero targets
2017-05-12 10:38:18 -04:00
9db7787316 Updating __getitem__ and __len__ for containers (#1544) 2017-05-12 16:17:06 +02:00
5989deb707 adding 3d operator translators
Summary: Adding caffe-to-caffe2 translators for Conv3D, Pooling3D, BatchNorm

Differential Revision: D4945495

fbshipit-source-id: fe3c97547507924a1409b977307b928ce78445f3
2017-05-11 23:01:44 -07:00
b070197e8a cuda unique op
Summary:
CUDA unique op; unit test provided; benchmark against CPU below.

Speedup results for synthetic real data. Input of size 20k, range [1, 10 million]: **~5x** speedup

  CPU 9.05795(ms) Unique
  GPU 1.79434(ms) Unique

Speedup results for 5x synthetic data. Input of size 1 million, range [1, 10 million]: **~13.7x** speedup

  CPU 54.7539(ms) Unique
  GPU 3.99473(ms) Unique

Reviewed By: akyrola

Differential Revision: D5007726

fbshipit-source-id: 0a00c518fd1809d0ae8c6cfcba09b0bd982ffaff
2017-05-11 21:08:10 -07:00
942f53b5a6 gradient impact of task layers on shared is configurable
Reviewed By: chocjy

Differential Revision: D4943948

fbshipit-source-id: 2e26dfb30c6893b60985f693a823646ed3d3e0e3
2017-05-11 20:34:04 -07:00
16de9746bb Fix a bug in 3D SpatialBatchNorm[CPU] gradient and improve the code.
Summary: As in title

Reviewed By: dutran

Differential Revision: D5047102

fbshipit-source-id: 01032270115343ab7eaccb97df11729446f1c463
2017-05-11 19:31:18 -07:00
efa913b1c2 fix uninitialized variable in cmake FindSSE (#1023) 2017-05-11 18:57:34 -07:00
93f1d0ca7c L1 Operator
Summary: Adds the L1 Distance operator to distance_op.

Reviewed By: bwasti

Differential Revision: D5007719

fbshipit-source-id: fd547c6645cf5f87305e9ebfd95ed918779c1d2a
2017-05-11 18:03:10 -07:00
d1a4467682 fix a bug when calling modules
A module that returns a non-standard data structure currently breaks
due to checks for backward hooks. This refactors the code slightly so
it will only break in the event of backward hooks.
2017-05-11 23:00:45 +02:00
0a25b9cb50 fix android build
Summary:
The most recent diff from Andrey had a tiny bug that triggered an error in Android.
Closes https://github.com/caffe2/caffe2/pull/543

Differential Revision: D5040516

Pulled By: Yangqing

fbshipit-source-id: d7b11b509a20b8b5e33db74dd383b55f43608c8f
2017-05-11 11:22:25 -07:00
507ddc4cde Temporary fix for multiple backwards with fused pointwise RNN (#1540) 2017-05-11 11:18:56 -07:00
aba05ce9db Ensuring float tensors call float versions of math functions 2017-05-11 10:39:35 -07:00
8df51a84ac Support 3D&1D SpatialBatchNorm[CPU]
Summary:
Generalize SpatialBatchNorm CPU Op to compute Spatial batch normalization for
1D, 2D & 3D input tensors.

Reviewed By: dutran

Differential Revision: D5043563

fbshipit-source-id: 7fcb933a628dd47f13aa622f63601a87382f09cd
2017-05-11 09:32:54 -07:00
be843eb26b Add unfold to autograd (#1523) 2017-05-11 17:53:16 +02:00
a23b378052 set cuda stream for cub::DeviceReduce in SumReduceLike
Summary: After long and painful debugging of nondeterministic behavior in the Machine Translation team's attention model, I found that in certain cases SumReduceLike uses cub::DeviceReduce, which lacked the stream param.

Reviewed By: jamesr66a, asaadaldien

Differential Revision: D5043347

fbshipit-source-id: bb91aacfc6786cc2b85ebc4e432c67e5f876e235
2017-05-10 23:32:44 -07:00
e16ea46013 Extended ImageInputOp
Summary:
Added several features to the ImageInputOp:
  - bounding box (per image as well as default for the operator). For per-image, it
    only works in Caffe2 format and is passed as the third tensor in the form
    (ymin, xmin, height, width). For the operator, pass bounding_xmin, bounding_ymin,
    bounding_width and bounding_height as parameters.
  - per-channel mean/std. You can use the usual mean/std to pass a single
    value to be used for all channels or also pass mean_per_channel and std_per_channel
    to specify different values per channel. Order of channels is BGR.
  - A minimum size parameter that can be specified instead of the scale parameter.
    The minsize parameter will only scale the image if it is smaller than required.
    This differs from scale, which will scale up as well as down. You can only specify
    one of scale or minsize (see the sketch below).

Added a test case to test some of the features
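The scale/minsize distinction, as a hedged sketch (hypothetical helper; the operator itself is C++):

```python
def resize_factor(short_side, scale=None, minsize=None):
    # 'scale' always rescales the shorter side to the target, up or down;
    # 'minsize' only upscales images that are smaller than required.
    assert (scale is None) != (minsize is None), "specify exactly one"
    if scale is not None:
        return scale / float(short_side)
    return max(1.0, minsize / float(short_side))

print(resize_factor(480, scale=256))    # ~0.53: downscaled
print(resize_factor(480, minsize=256))  # 1.0: left unchanged
print(resize_factor(128, minsize=256))  # 2.0: upscaled to the minimum
```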

Differential Revision: D4874988

fbshipit-source-id: 437191052a46e9916defe8b100d7cc7864373f61
2017-05-10 17:52:01 -07:00
e8c274cf16 Optimize memory usage for MI-LSTM
Summary:
Use ElementwiseLinearOps instead of manual Mul + Sum. That saves intermediate blobs.

For NMT use case

Before: https://our.intern.facebook.com/intern/fblearner/details/18060753
Time per step: 0.072
memory usage (per each of 2 GPUs): 9041MiB

After:https://our.intern.facebook.com/intern/fblearner/details/18107583
Time per step: 0.0715
Memory (per each GPU): 8560MiB

Reviewed By: akyrola

Differential Revision: D5038785

fbshipit-source-id: 4bc8155dbd0c87729e17236d68d62ca530aadb53
2017-05-10 16:53:43 -07:00
967a0ebef4 Revert D5027046: [Caffe2/RNN/opsify] apply links ops
Summary: This reverts commit e6dd59ee843fe1507fc87377b0e1e23218dbc384

Differential Revision: D5027046

fbshipit-source-id: 99ac75dffdc35e9b089ccaaf26f8807db0903d43
2017-05-10 15:02:14 -07:00
4fa6ee8219 clean up code for selecting allreduce algorithm
Summary: Clean up code for initializing allreduce algorithm.

Reviewed By: pietern

Differential Revision: D5033172

fbshipit-source-id: 84b9c2b5b3204766a0211aaaa71ea31b09e55013
2017-05-10 12:46:52 -07:00
362cc296ad apply links ops
Summary: We need to also add links in ops, so that they don't require a sharp timestep boundary. This implements that.

Reviewed By: salexspb

Differential Revision: D5027046

fbshipit-source-id: e6dd59ee843fe1507fc87377b0e1e23218dbc384
2017-05-10 10:46:28 -07:00
11bcdbc3f0 Load Parameters from Model
Summary:
In Dper utility, add a function `load_parameters_from_model_init_options` to
allow init parameters from pretrained models

Reviewed By: xianjiec

Differential Revision: D4926075

fbshipit-source-id: 5ab563140b5b072c9ed076bbba1aca43e71c6ac5
2017-05-10 10:33:04 -07:00
20ae447ce4 Instead of switching workspaces, create explicitly shared blobs
Summary: As part of opsifying the RNN execution, we cannot do the workspace switching anymore, as it happens at the timestep boundary. But we can get the same effect by just creating the blobs explicitly in the shared workspace.

Reviewed By: salexspb

Differential Revision: D5025667

fbshipit-source-id: 921c97cb2f7941f9f9235913a60e34667badc303
2017-05-10 09:38:03 -07:00
c70405271b opsify parameter gradient accumulation
Summary: Instead of explicitly accumulating the gradients in a loop, add corresponding Sum ops to the net. This will allow for better parallelism with multithreaded nets.
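Conceptually (a hedged sketch using the Caffe2 Python net API; blob names are illustrative):

```python
from caffe2.python import core

net = core.Net("accumulate_grads")
# Before: a Python-side loop accumulated per-timestep gradients in place.
# After: the accumulation is an explicit Sum op in the net, so the
# executor can schedule it in parallel with other ops.
net.Sum(["w_grad_t0", "w_grad_t1", "w_grad_t2"], "w_grad")
```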

Reviewed By: salexspb

Differential Revision: D5011177

fbshipit-source-id: 14e2fa2a6905703322d5701c1362054c17c4e796
2017-05-10 07:53:07 -07:00
5bb13485b8 Fix Linear function 2017-05-10 16:43:14 +02:00
a86adf43a1 Fix comparison functions 2017-05-10 16:43:14 +02:00
1c304a9ef6 Expose variable attribute of AccumulateGrad 2017-05-10 16:43:14 +02:00
feef54ec34 Don't modify non-volatile grads in zero_grad 2017-05-10 16:43:14 +02:00
5026209d0c Minor fix in Prod backward 2017-05-10 16:43:14 +02:00
e7220380bc Add new flags to Variable.backward 2017-05-10 16:43:14 +02:00
9fa0e403d6 Replace retain_variables with retain_graph 2017-05-10 16:43:14 +02:00
35cf380ed1 Improve output wrapping logic in autograd 2017-05-10 16:43:14 +02:00
3a7e068439 Remove spurious memo argument in Module.parameters() (#1527) 2017-05-10 13:55:15 +02:00
a66f02b223 Make dataset ops handle empty tensor better
Summary:
`Append` & `UnPackRecords` don't handle empty tensors well. `Append` would erase the shape of an empty tensor, which breaks the invariants of dataset.

`UnPackRecords` leaves the output tensors in an undefined state: if they were initialized, they are not cleared out; if they were not initialized, they remain uninitialized. This diff disables unpacking an empty record if prototype tensors are not provided (since output shapes may be indeterminable if they were not initialized). The interface remains the same if empty record tensors are not used.

Reviewed By: azzolini

Differential Revision: D4956012

fbshipit-source-id: ad80527d78eb7421cd90968edb82322c289cd417
2017-05-09 19:48:36 -07:00
3abd0cb623 Add axis argument to SoftmaxWithLoss
Summary: `axis` argument for SoftmaxWithLoss (it doesn't yet work for the spatial case).

Reviewed By: akyrola

Differential Revision: D5025797

fbshipit-source-id: 9e3cf39223af3f2c8bb357f8d9fe952b7349f913
2017-05-09 19:36:00 -07:00
75bc9f5e77 Relax requirement on token uniqueness
Summary: Relax requirement on token uniqueness since a few use cases broke after the uniqueness requirement was added in a previous diff.

Reviewed By: kittipatv

Differential Revision: D5034132

fbshipit-source-id: 327eb065923e6ea152a360324316f81b7fb9564b
2017-05-09 19:36:00 -07:00
48de1ea165 Drop extra Reshape in attention calculation
Summary: We can avoid this extra Reshape.

Reviewed By: jamesr66a

Differential Revision: D5032874

fbshipit-source-id: 92bd568bc6bec53d7f81a64cfa96d2c610823f8c
2017-05-09 17:16:36 -07:00
862105ec8b Merge commit 'd5e821044aa20d67122f4570a3f1cb7e6e9c2617' 2017-05-09 17:06:25 -07:00
d5e821044a Make torch.cat not synchronize the host and device 2017-05-09 17:05:23 -07:00
8e3ce4bae7 RNN: reduce verbosity of "Use
Summary:
this is still printed a lot during tests. Let's use 1 instead of
0, as most of our RNN code does

Reviewed By: jamesr66a

Differential Revision: D5031460

fbshipit-source-id: bc07990b66c89dfbd97133493cca11929d3138e5
2017-05-09 17:03:42 -07:00
bfc8a3ebba Reference counting documentation. (#1520)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-09 17:02:28 -07:00
6fab62173e Restore examples with keepdim=True default. 2017-05-09 14:49:55 -07:00
c4742fd128 Explicitly pass keepdim=False for tests that require it.
If we change the default to False, reverting this commit is optional.
2017-05-09 14:49:44 -07:00
e124790cb2 Change keepdim default to False. 2017-05-09 14:49:21 -07:00
171638a451 Fix test_normalize NN test. 2017-05-09 14:25:06 -07:00
d95f711501 Add a keepdim test to torch_test. 2017-05-09 14:25:01 -07:00
b9e00dfbb8 Make (non-legacy) nn backwards compatible.
The keepdim change only seems to leak in one place:
when the grad_bias is returned in linear.py.
2017-05-09 14:24:53 -07:00
f6a00fac13 Add autograd tests for keepdim 2017-05-09 14:24:45 -07:00
be5191a00b Add documentation for keepdim. 2017-05-09 14:16:42 -07:00
c9d8e0a43a Change all legacy/nn modules to use keepdim=True (even if tests don't fail).
We shouldn't be introducing changes in legacy modules if we can avoid it.
2017-05-09 14:16:31 -07:00
ae2b2cbbec Make keepdim work with autograd. 2017-05-09 14:15:59 -07:00
ae924be3ac Removing extra Reshapes in MILSTM with new broadcasted ops
Summary: D4873222 introduced SumReduceLike and removed the use_grad_hack ... hack. Remove unnecessary reshapes and kill use_grad_hack parameters.

Reviewed By: jamesr66a

Differential Revision: D4894243

fbshipit-source-id: c4f3f84abf95572d436b58bbdc2b18b21583c2f1
2017-05-09 14:11:04 -07:00
f4cf1d6d18 Merge commit 'af790f86f329364dacef1301fc9b5b292629075c' 2017-05-09 14:04:08 -07:00
c34cff7035 Merge commit '906c550e1079e9762194db59440a202ffca90dca' 2017-05-09 14:03:28 -07:00
194d7408bb Merge commit '5f308b50fb558a620253443ef45f7cf3a91be410' 2017-05-09 14:02:25 -07:00
0d538246fb Merge commit '98dbdc464b0f53ecc89af58cc994c7e8d7617e4e' 2017-05-09 14:01:13 -07:00
7c3cb24485 Add a keepdim parameter for reduction functions over a single dimension.
By default, this parameter is False -- a backwards incompatible change, but
one that follows numpy semantics, e.g. numpy.sum (numpy names the parameter
"keepdims" since you can pass multiple dims to reduction functions).

The old behavior seems desired for normalization type operations
where the tensor will immediately be expanded out again, e.g.:
probs.sum(1).expand_as(probs)
which no longer works because the dimension to expand is missing.
This can be fixed by simply passing True as "keepdim" argument
to the reduction operation, e.g:
probs.sum(1, keepdim=True).expand_as(probs)
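A quick illustration of the new default and the workaround (using the API described above):

```python
import torch

probs = torch.rand(4, 3)
s = probs.sum(1)                # keepdim=False by default: shape (4,)
k = probs.sum(1, keepdim=True)  # shape (4, 1): dimension is retained
normalized = probs / k.expand_as(probs)  # works again with keepdim=True
print(s.size(), k.size())       # torch.Size([4]) torch.Size([4, 1])
```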
2017-05-09 14:01:03 -07:00
add840510f Refactor Optimizer to Allow scale_learning_rate
Summary:
In transfer learning, parameter initialized from pretrained model might require
a different learning rate than otherwise initialized. To this end, here we
implement a python solution where `base_learning_rate` is scaled by `scale`,
which is in turn set by `scale_learning_rate`; Alternatively, we can achieve
same effect by rewriting the LearningRate operator in C++
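The net effect is simple (a sketch; the actual wiring goes through the optimizer and the LearningRate operator):

```python
def effective_lr(base_learning_rate, scale=1.0):
    # Pretrained parameters can pass scale != 1.0 via scale_learning_rate.
    return base_learning_rate * scale

print(effective_lr(0.1))             # 0.1 for freshly initialized params
print(effective_lr(0.1, scale=0.1))  # 0.01 for pretrained params
```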

Reviewed By: kennyhorror

Differential Revision: D4992827

fbshipit-source-id: 8d7e87a61c95b3eb8ef733ec436f4060e865c0ac
2017-05-09 13:16:21 -07:00
20d8de8d51 Parameter cost estimation job
Summary:
Adds a parameter cost estimation step before the actual training starts. The costs are later used in order to better shard the parameters across instances of the parameter server.

Things I needed to modify:
- A few changes to make ModelLayerHelper picklable
- Add support for stopping a distributed job after a number of stats reporting steps.
- Refactored run_dist_job to support collocating the reader with the trainer even when PS are present.
- Option to disable dense updates (when num_dense_servers=0).

Currently there's a huge overhead posed by having to launch a child workflow. I'll try to address this in a subsequent diff.

This is WIP because the other workflows need to be migrated as well.

I can break this down into smaller diffs if reviewers would prefer it.

Reviewed By: kennyhorror

Differential Revision: D4974752

fbshipit-source-id: 04c336acb2945f8f11324a221ffc6967818c0672
2017-05-09 13:02:24 -07:00
af790f86f3 Add a keepdim parameter for reduction functions over a single dimension.
By default, this parameter is False -- a backwards incompatible change, but
one that follows numpy semantics, e.g. numpy.sum (numpy names the parameter
"keepdims" since you can pass multiple dims to reduction functions).

The old behavior seems desired for normalization type operations
where the tensor will immediately be expanded out again, e.g.:
probs.sum(1).expand_as(probs)
which no longer works because the dimension to expand is missing.
This can be fixed by simply passing True as "keepdim" argument
to the reduction operation, e.g:
probs.sum(1, keepdim=True).expand_as(probs)
2017-05-09 11:55:42 -07:00
906c550e10 Add a keepdim parameter for reduction functions over a single dimension.
By default, this parameter is False -- a backwards incompatible change, but
one that follows numpy semantics, e.g. numpy.sum (numpy names the parameter
"keepdims" since you can pass multiple dims to reduction functions).

The old behavior seems desired for normalization type operations
where the tensor will immediately be expanded out again, e.g.:
probs.sum(1).expand_as(probs)
which no longer works because the dimension to expand is missing.
This can be fixed by simply passing True as "keepdim" argument
to the reduction operation, e.g:
probs.sum(1, keepdim=True).expand_as(probs)
2017-05-09 11:55:29 -07:00
5f308b50fb Add a keepdim parameter for reduction functions over a single dimension.
By default, this parameter is False -- a backwards incompatible change, but
one that follows numpy semantics, e.g. numpy.sum (numpy names the parameter
"keepdims" since you can pass multiple dims to reduction functions).

The old behavior seems desired for normalization type operations
where the tensor will immediately be expanded out again, e.g.:
probs.sum(1).expand_as(probs)
which no longer works because the dimension to expand is missing.
This can be fixed by simply passing True as "keepdim" argument
to the reduction operation, e.g:
probs.sum(1, keepdim=True).expand_as(probs)
2017-05-09 11:55:20 -07:00
98dbdc464b Add a keepdim parameter for reduction functions over a single dimension.
By default, this parameter is False -- a backwards incompatible change, but
one that follows numpy semantics, e.g. numpy.sum (numpy names the parameter
"keepdims" since you can pass multiple dims to reduction functions).

The old behavior seems desired for normalization type operations
where the tensor will immediately be expanded out again, e.g.:
probs.sum(1).expand_as(probs)
which no longer works because the dimension to expand is missing.
This can be fixed by simply passing True as "keepdim" argument
to the reduction operation, e.g:
probs.sum(1, keepdim=True).expand_as(probs)
2017-05-09 11:54:58 -07:00
bd8ed6641c Stabilize PythonOp token name
Summary: For distributed jobs, we were relying on the order the PythonOps were registered, which was very fragile.

Reviewed By: dzhulgakov

Differential Revision: D5016847

fbshipit-source-id: f5601467c5b0569d5e8a0efdd76abad0d703c5f5
2017-05-09 11:19:44 -07:00
d48795e699 Use non-local include syntax
Summary:
TSIA

Fixes lint.

Differential Revision: D5024776

fbshipit-source-id: 6ee865c7d9892475d9d349c0ed0b4a57803dcf50
2017-05-09 11:02:41 -07:00
e44bc88c2e Remove command "touch cmake".
Summary:
It is quite normal to rerun cmake to find new files through GLOB commands. However, this external command forces cmake to run on every make invocation, meaning every make command takes longer than necessary. This PR removes the external command `touch CMakeLists.txt`, leaving the decision of when to rerun cmake up to the user and speeding up builds when rerunning cmake is not required.
Closes https://github.com/caffe2/caffe2/pull/453

Reviewed By: Yangqing

Differential Revision: D4978919

Pulled By: bwasti

fbshipit-source-id: 0da4495b276a04f6ce46e1c8ceca0474b7573aa0
2017-05-08 18:06:48 -07:00
65cf2f0117 compile error when build on mac enviroment
Summary:
Building on Mac, the following build error occurred:

[ 71%] Building CXX object caffe2/CMakeFiles/Caffe2_CPU.dir/operators/transpose_op.cc.o
In file included from /Users/pg/DeepLearning/caffe2/caffe2/operators/tile_op.cc:1:
/Users/pg/DeepLearning/caffe2/caffe2/operators/tile_op.h:25:28: error: implicit instantiation of undefined
      template 'std::__1::array<int, 2>'
    std::array<int32_t, 2> temp_params = {{tiles_, axis_}};
                           ^
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/__tuple:114:65: note: template is declared here
template <class _Tp, size_t _Size> struct _LIBCPP_TYPE_VIS_ONLY array;
                                                                ^
In file included from /Users/pg/DeepLearning/caffe2/caffe2/operators/tile_op.cc:1:
/Users/pg/DeepLearning/caffe2/caffe2/operators/tile_op.h:119:28: error: implicit instantiation of undefined
      template 'std::__1::array<int, 2>'
    std::array<int32_t, 2> temp_params = {{tiles_, axis_}};
                           ^
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/__tuple:114:65: note: template is declared here
template <class _Tp, size_t _Size> struct _LIBCPP_TYPE_VIS_ONLY array;
Closes https://github.com/caffe2/caffe2/pull/519

Reviewed By: asaadaldien

Differential Revision: D5020422

Pulled By: bwasti

fbshipit-source-id: 2a5896d0a7aa643dfe3ff688db958434b6e87780
2017-05-08 17:46:28 -07:00
e70164316c Merge commit '91a118c116d15d280a99a39666d298be15c6d592' 2017-05-08 16:58:56 -07:00
33b3968660 add larger tests for qr 2017-05-08 16:58:54 -07:00
91a118c116 Fix bug in magma qr decomposition and add tests for larger matrices 2017-05-08 16:44:15 -07:00
218ea722fb Don't use sync mode by default
Summary:
TSIA

For more information see:
* https://github.com/caffe2/caffe2/issues/360
* https://github.com/facebookincubator/gloo/issues/34#issuecomment-299330222

Reviewed By: andrewwdye

Differential Revision: D5011293

fbshipit-source-id: 2704151f84a46e658bd28dab3bcb9849c8423efc
2017-05-08 16:33:22 -07:00
1d0ba2cfbd New cudnn ops
Summary:
cuDNN versions of dropout and LRN (for native fp16 support), port of Caffe's max pooling algo that uses an explicit mask to store locations (also supports fp16 storage)
Closes https://github.com/caffe2/caffe2/pull/396

Reviewed By: akyrola

Differential Revision: D4990880

Pulled By: asaadaldien

fbshipit-source-id: a716acffb656843e9b31e3e6808bd2d8aa959d03
2017-05-08 16:33:21 -07:00
0764589ed1 Merge commit '008a8c9720183d7bf8b00bf64d8d21c62270089f' 2017-05-08 16:24:14 -07:00
27671c800d Merge commit '105df5844dca21f964d180a918c808489862941f' 2017-05-08 16:23:12 -07:00
d0504aa41d Implement lgamma function. 2017-05-08 16:21:26 -07:00
008a8c9720 Implement lgamma function. 2017-05-08 16:20:52 -07:00
105df5844d Implement lgamma function. 2017-05-08 16:20:39 -07:00
50bf7d5cbc Merge commit '066fbcd014fa4092152b2cd04ad1d92fc8d7bd59' 2017-05-08 16:13:57 -07:00
066fbcd014 use current stream in cat array kernel launch 2017-05-08 16:12:10 -07:00
ecf29f10ad Merge commit '22bbd7ac33ba51469cc913cb01fcd3b70a42e528' 2017-05-08 16:10:00 -07:00
22bbd7ac33 s/IndexType/long 2017-05-08 16:09:02 -07:00
2075abbe30 Gloo: Added a way to create connected contexts from another context
Summary:
Added a context factory that allows you to use an existing context to
create other fully connected contexts much more cheaply (without having
to rely on a store).

Limitations:
  - The backing context needs to be fully connected

Reviewed By: andrewwdye, pietern

Differential Revision: D4985121

fbshipit-source-id: 31ceabccbb679cedb18ec9927b6c166bef5989bb
2017-05-08 16:02:04 -07:00
11052d03aa RNNCell API change: returns states and outputs
Summary:
Incorporating a definition of the cell's output and illustrating its usage by adding dropout to all types of cells.

I think that we should try to get rid of aliases in RecurrentNetwork, so output of applied_over_sequence is also always (state_1_all, state_2_all, ...). This way we can merge get_output_from_single_step, get_output_from_sequence and get_outputs_with_grads into a single method

Let me know what you think!

Reviewed By: jhcross

Differential Revision: D4992913

fbshipit-source-id: 737939be336ad145f84e8733cd255d4f7188ef70
2017-05-08 15:19:48 -07:00
e694db0eeb Raise error when Variable is converted to bool. Fixes #1482. (#1491) 2017-05-08 23:14:11 +02:00
c5ae79fe4e Make clamp twice differentiable (#1514) 2017-05-08 23:12:42 +02:00
b6a8dd1438 don't recompute small blob in attention
Summary: decoder_hidden_encoder_outputs_sum_tmp is tiny after D5010109, no need to recompute it.

Reviewed By: akyrola

Differential Revision: D5014335

fbshipit-source-id: cc9e8f91372889d10bd99c79366018cb3943a435
2017-05-08 13:06:06 -07:00
0892a1428b Add size assertions to SparseAdam/Adagrad
Summary: Add boundary checks to sparse updates for easier debugging.

Differential Revision: D5018121

fbshipit-source-id: c0f18d75adf9adf66f8eb8022231e7e9d274838e
2017-05-08 11:41:22 -07:00
0cb7774445 softplus op
Summary: Added softplus function, f(x) = ln(exp(x) + 1)
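For reference, a NumPy sketch using the standard numerically stable form (the Caffe2 op itself is implemented in C++):

```python
import numpy as np

def softplus(x):
    # f(x) = ln(exp(x) + 1), computed as max(x, 0) + log1p(exp(-|x|))
    # so that large positive x does not overflow exp().
    return np.maximum(x, 0) + np.log1p(np.exp(-np.abs(x)))

print(softplus(np.array([-100.0, 0.0, 100.0])))  # ~[0., 0.693, 100.]
```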

Reviewed By: akyrola

Differential Revision: D5011057

fbshipit-source-id: 5fddb1568fee625f81ea3a86a85d0f400c3ee278
2017-05-08 10:40:25 -07:00
4ad2e155bc Make nn.Sequential more pythonic (#1510)
A minor fix which uses `enumerate` during iteration.
2017-05-08 07:32:07 -07:00
12965a4108 Add Poorman's IOBound ThreadPool for serialization.
Summary:
At the moment serialization can take up to 3x the memory of the largest blob:
the original blob, the BlobProto, and the SerializeAsString version of the blob.
As a result, in certain cases serialization takes more memory than it should,
which hurts utilization/max model size per machine.

This diff is adding IOBound ThreadPool that should set quite strict limitation
on the extra memory overhead per one blob.

Reviewed By: dzhulgakov

Differential Revision: D5012887

fbshipit-source-id: 12dbb9d3efab136411ddeffd519b602cf606661e
2017-05-08 06:43:31 -07:00
8a7f00d61b fix mean pooling
Summary:
Segment-based ops require increasing segment ids without gaps. Length-based ops do not
have this requirement.

Other pooling methods, e.g. LogExpMean, do not have length-based ops available yet.
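A small illustration of the two conventions (hypothetical data; not the op implementations):

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Segment-based ops: ids must be increasing and gap-free.
segment_ids = np.array([0, 0, 1, 1, 2])     # valid
# np.array([0, 0, 2, 2, 3]) would be invalid: id 1 is skipped.

# Length-based ops: the same grouping as per-segment counts.
lengths = np.array([2, 2, 1])
starts = np.cumsum(lengths) - lengths
print([values[s:s + n].sum() for s, n in zip(starts, lengths)])  # [3.0, 7.0, 5.0]
```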

Differential Revision: D5019165

fbshipit-source-id: ab01a220e10d4ed9fa2162939579d346607f905e
2017-05-08 01:09:07 -07:00
ac1c63dda8 Add specialized ResizeNearest implementation for scale=2
Summary:
Specialized implementation of ResizeNearest for width_scale=2 and height_scale=2. This implementation doesn't use divides or calls to std::min, and is unrolled 2x over the width dimension. Also add a correctness test.

About 6x faster.
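What the op computes, as a NumPy sketch (the specialized kernel exploits the fixed 2x factor to avoid per-pixel divides and std::min calls):

```python
import numpy as np

def resize_nearest_2x(img):
    # (H, W) -> (2H, 2W): each source pixel becomes a 2x2 block.
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

print(resize_nearest_2x(np.arange(4).reshape(2, 2)))
```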

Reviewed By: ajtulloch

Differential Revision: D4928579

fbshipit-source-id: 5cc92a52bd688690fee907b4333d9c84b666f9c9
2017-05-07 21:10:11 -07:00
6d693fe413 Add F.normalize (#1467) 2017-05-07 13:54:16 +02:00
23b556ef77 Expose custom attributes from C++ functions (#1430) 2017-05-07 13:49:55 +02:00
e3f41a4962 Add high order gradient support for Sigmoid (#1496) 2017-05-07 13:00:20 +02:00
98cf176baa improve style + a bit of perf for ScatterWeightedSum CUDA
Summary: For perf, it is better to check weight0 inside the kernel and avoid host synchronization when copying to a stack variable. Improved the style a bit (GitHub does not have lint, so contributed code may not conform to our style).

Differential Revision: D5011668

fbshipit-source-id: 1eb85912f6f499acd3190cfcb59e7e39c2220d89
2017-05-07 01:08:42 -07:00
90e9f8a476 Avoid segfault when calling join_with with self as arg (#1493) 2017-05-07 00:35:11 +02:00
5f15a9e0cb Add a note about THPFunction_asFunction 2017-05-06 14:28:32 -07:00
8f692b5642 declare UpdateTimestepBlob inline
Summary: Since this function is declared in a header file, and is not templated and not part of a class, it will produce an ODR error if it is included in more than one file. Adding the `inline` keyword fixes this.

Reviewed By: jhcross, jamesr66a, m3rlin45

Differential Revision: D5011770

fbshipit-source-id: 50266a530da31ebfda59fcca2048355a00fe7758
2017-05-05 21:50:58 -07:00
711ea1d4ac fix enternalinputs handling in AppendNet v2
Summary: External inputs must be computed before updating the _ops_output structure, otherwise if the net to be appended outputs the external input, it is not added correctly

Differential Revision: D5013496

fbshipit-source-id: 6a83d0a6f1c63ef8ae7bec4d862c0ac2a690d47b
2017-05-05 21:50:57 -07:00
033ab9da1b Adding video data layer for caffe2
Summary: Adding a simple video data layer which allows reading video data from frames or videos and outputs a 5D tensor. It also allows multiple labels. The current implementation is based on ffmpeg.

Differential Revision: D4801798

fbshipit-source-id: 46448e9c65fb055c2d71855447383a33ade0e444
2017-05-05 14:16:38 -07:00
a61778a628 fix recompute_blobs_on_backward
Summary: My previous refactoring broke recompute_blobs_on_backward, which was cleared.

Reviewed By: urikz

Differential Revision: D5013351

fbshipit-source-id: 5945778c0cff2ee2c7f5ad7b59b58f4305fa6a05
2017-05-05 14:01:34 -07:00
f2392bb8cb Fix Split documentation
Summary: The Split doc failed to mention important features like specifying the 'split' argument. Two questions the same day in Caffe2 Users were about how to do this.

Reviewed By: azzolini

Differential Revision: D5009503

fbshipit-source-id: 883549be891705a5c83778302d967481419f4dde
2017-05-05 13:46:39 -07:00
5c667ebe4e AttentionCell
Summary:
This diff creates a generalized AttentionCell class, which will allow us to construct attention decoders out of arbitrary RNNCell components (with a particular view to using stacked, multi-layer RNNs).

In order to do this, we introduce a new optional input for RNNCell._apply which allows us to provide an additional input that is not processed by prepare_input(). Note that this is an argument only to _apply, not apply, since it is only meant to be used for additional recurrent connections to "embedded" cells, not for standalone RNNs.

Reviewed By: urikz

Differential Revision: D4998465

fbshipit-source-id: 473009ea4917e86e365f9d23aa2f11a46a94fd65
2017-05-05 12:33:01 -07:00
d7f20c94fd Optimize memory for RNN attention
Summary:
The fix should save us (source_len - 1) * target_len * batch_size * encoder_output_size * 4 bytes for the forward pass. Typically, these values are 100 * 100 * 128 * 512 * 4 = 2.4GB.
Not entirely sure about backward pass.

Reviewed By: akyrola

Differential Revision: D5010109

fbshipit-source-id: 2ed68f3ebfd3b8362916d24af991482f1686e064
2017-05-05 12:18:50 -07:00
0c6099ce25 Add __dir__ so autocomplete in iPython works.
Summary: It is good practice to provide __dir__ whenever __getattr__ is defined so that tooling will work intelligently.  In particular, it is hard to explore the available methods in iPython without tab completion.

Reviewed By: dzhulgakov

Differential Revision: D5006545

fbshipit-source-id: 1a150d91d54637d80b292764513943ff70d971b4
2017-05-05 11:32:06 -07:00
8a2433eacb Add model saving and loading to resnet50_trainer.py
Summary:
Script caffe2/caffe2/python/examples/resnet50_trainer.py can be used to train a ResNet-50 model with Imagenet data (or similar).

However, currently the script does not actually save the model, so it is kind of useless.

Task 1:  After each Epoch, save the model in a file "<filename>_X.mdl' where X is the epoch number and <filename> is given as a command line parameter. By default, use "resnet50_model" as filename.

Task 2: Add a functionality to restore the model from a previous file:
 - add a command line parameter "load_model", which user can use to specify a filename.
 - if this parameter is set, load the model parameters from the previous file

Reviewed By: prigoyal

Differential Revision: D4984340

fbshipit-source-id: 333e92679ba52a7effe9917fdfc2d55d652b868f
2017-05-05 10:08:37 -07:00
5c52392229 opsify AccumulateInputGradients
Summary:
Part of project to make all gradient accumulation business ops in RecurrentNetworkGradientOp, this makes the accumulateInputGradients ops.

Also added way to mark operators private so they don't appear in docs.

Reviewed By: salexspb

Differential Revision: D5006698

fbshipit-source-id: 226d7afb473290c8d0f936d2cc87640be3e06615
2017-05-05 09:13:39 -07:00
aa5e771042 Added tiles and axis as input parameters to Tile Operator
Summary:
Added the possibility to add 'tiles' and 'axis' as input
as opposed to arguments for the Tile Operator. If provided, the input
values will override the argument values. Now with proper CUDA code

Differential Revision: D4930347

fbshipit-source-id: b44b032b327c7d7bddfce63abf4e3289d7e74bfb
2017-05-04 23:46:51 -07:00
0d32ab4a45 Refactor FTRL optimizer to allow sending Alpha as input blob
Summary: Split from parent diff

Reviewed By: xianjiec

Differential Revision: D4992993

fbshipit-source-id: 9f8a79023b0c581e84bd5e82e2e730c9e1a86e1e
2017-05-04 22:57:00 -07:00
211eae127c LastNWindowCollector
Summary: Layer for LastNWindowCollector op. We need this since it's an in-place operator.

Reviewed By: chocjy

Differential Revision: D4981772

fbshipit-source-id: ec85dbf247d0944db422ad396771fa9308650883
2017-05-04 17:32:09 -07:00
b229b7ff11 Fix typo 'convlution'
Summary: Closes https://github.com/caffe2/caffe2/pull/470

Differential Revision: D5003850

Pulled By: pietern

fbshipit-source-id: 62ba13f58dae9f19a434f2075ff3ac143d34feb5
2017-05-04 17:02:35 -07:00
d312dcc881 lstm_benchmark use rnn_cell.LSTM multicell + assertion
Summary:
Use the rnn_cell's multi-cell for the LSTM benchmark. While doing this, I had not changed the initial_states and got an inconsistent result from rnn_cell, so I added an assertion to check that the initial states length is 2 * num layers.

+ fix a division-by-zero error

Reviewed By: salexspb

Differential Revision: D5003177

fbshipit-source-id: a8250b825394c352428a0f067098dfcd7516ab2a
2017-05-04 17:02:32 -07:00
c34d5a838f Generalize LastNWindowCollector
Summary: Use `CopyItems` so that it accepts any type of tensor. Also, move the cursor to input blob so that it's checkpoint friendly. Output is now also part of input so that inference can work correctly.

Reviewed By: xianjiec

Differential Revision: D4920987

fbshipit-source-id: da532736225ec27f409ff763ff69a0629235151c
2017-05-04 16:05:15 -07:00
4662781099 Include hint about run ID in store handler assertion
Summary:
TSIA

Also see https://github.com/caffe2/caffe2/issues/476

Differential Revision: D5002728

fbshipit-source-id: 2c301cacc395cfed4eec11dffedc3dba0e180e72
2017-05-04 15:21:12 -07:00
348e0af6e1 Remove unused binary fb_run_plan_mpi
Summary:
TSIA

This caused a compilation problem on gcc-6, see
https://github.com/caffe2/caffe2/issues/456.

Differential Revision: D5002823

fbshipit-source-id: 764aae1eaf78ee9918455b95a12e982597b85fdc
2017-05-04 15:21:11 -07:00
ff0ff33a11 Fix docs for InstanceNorm (#1477) 2017-05-04 18:11:15 -04:00
eb2c6ea874 set deviceId_ to -1 when CudaDevicePointer and CudaStream do not have valid data
Summary: Set deviceId_ to -1 when CudaDevicePointer and CudaStream do not have valid data

Reviewed By: andrewwdye

Differential Revision: D4881374

fbshipit-source-id: e973a70e2e6e4519f5fdc2ad4e76f232d9593751
2017-05-04 15:05:27 -07:00
e64b2e1cd7 add documentation for cwrap plugins (#1474) 2017-05-04 17:50:58 -04:00
dbe7654062 Always use halving/doubling allreduce
Summary:
Gloo added support for non-power-of-2 number of nodes in the recursive
halving/doubling allreduce algorithm by implementing the binary blocks
extension. This means we no longer have to fall back to using the ring
algorithm when the number of nodes is not a power of 2.

Reviewed By: prigoyal

Differential Revision: D4992536

fbshipit-source-id: f231aecbb46296ae3441ab818e058eb7ad6d8d64
2017-05-04 14:48:26 -07:00
004c740b6d Update gloo dependency
Summary: Closes https://github.com/caffe2/caffe2/pull/485

Differential Revision: D4998648

Pulled By: pietern

fbshipit-source-id: efe862fdf6195f9dfd5d0b98fb12e2e2f48bb894
2017-05-04 14:18:30 -07:00
395a80f757 Check GCC version if compiling with CUDA support
Summary:
Otherwise compilation fails pretty far into the build, which is inconvenient.

The error reported when trying to compile with GCC 6:

    CUDA 8.0 is not compatible with GCC version >= 6. Use the following
    options to configure GCC version 5:

      -DCMAKE_CXX_COMPILER=/usr/bin/g++-5
      -DCMAKE_C_COMPILER=/usr/bin/gcc-5
      -DCUDA_HOST_COMPILER:FILEPATH=/usr/bin/gcc-5
Closes https://github.com/caffe2/caffe2/pull/504

Reviewed By: akyrola

Differential Revision: D5004299

Pulled By: pietern

fbshipit-source-id: 185cd2f846f291a48e1d41ce0d87ca69e7f2c593
2017-05-04 12:19:17 -07:00
c8f444237f net_drawer: --input is required
Summary:
Before:
```
$ python -m caffe2.python.net_drawer
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/data/caffe2/install/caffe2/python/net_drawer.py", line 403, in <module>
    main()
  File "/data/caffe2/install/caffe2/python/net_drawer.py", line 365, in main
    with open(args.input, 'r') as fid:
TypeError: coercing to Unicode: need string or buffer, NoneType found
```
After:
```
$ python -m caffe2.python.net_drawer
usage: net_drawer.py [-h] --input INPUT [--output_prefix OUTPUT_PREFIX]
                     [--minimal] [--minimal_dependency] [--append_output]
                     [--rankdir RANKDIR]
net_drawer.py: error: argument --input is required
```
Closes https://github.com/caffe2/caffe2/pull/479

Differential Revision: D5003898

Pulled By: pietern

fbshipit-source-id: d121c331411ba4bbded81f9658ec787fa2fd3dc1
2017-05-04 11:45:57 -07:00
f220282ddd Set optimal number of DAG workers for predictor beam search step-net
Summary: Allow RecurrentNetwork to accept dag as a step-net

Differential Revision: D4985747

fbshipit-source-id: ff39e0386c8f3a7364801a3011558f322d8ea669
2017-05-04 10:16:03 -07:00
7d40140bfb Document squeeze behavior on 1-dimensional tensors of size 1. (#1470) 2017-05-04 16:54:22 +02:00
e50c7daaf9 Use Qr factorization to get orthogonal matrix in orthogonal init (#1453) 2017-05-04 07:11:59 -04:00
600f366a13 Merge commit 'a6876a4783ce3d1bb3c6ba69f54c31983097ed17' 2017-05-04 06:51:10 -04:00
a6876a4783 fix corner-case in MaxPooling 2017-05-04 06:50:15 -04:00
4e18d89791 added twice differentiation for a bunch of ops (#1426) 2017-05-04 06:47:14 -04:00
de9845588d Merge commit 'c061ed5bda238e1276601593343c10428d01eaae' 2017-05-03 23:14:26 -04:00
c061ed5bda handle beta=0 for gemv with transpose 2017-05-03 23:05:41 -04:00
e9d648c5e7 Fix memory leak introduced by 72e8190 (#1464) 2017-05-03 18:38:56 -04:00
57e51bd72a make all tensor.h enforces pass the caller
Summary: When I added the CAFFE_ENFORCE_WITH_CALLER typedef to tag the tensor-pointer into enforce-exceptions, I only changed the most common callsites. This changes all enforces in tensor.h.

Reviewed By: salexspb

Differential Revision: D4995773

fbshipit-source-id: 90f2d277aeeb1354e72f92b2b9a75601fcbea609
2017-05-03 15:31:29 -07:00
47c1418816 Add caffe2 operators to mobile: Log, StumpFunc, Div, Sub
Summary: Add the above operators to fbobjc and fbandroid by splitting them out to separate files and including these on the build. We are using these on mobile as part of Scout (Messenger).

Reviewed By: bwasti

Differential Revision: D4958660

fbshipit-source-id: f5cb105b4d7186a7eef705023382ec1383b6ec21
2017-05-03 15:10:34 -07:00
80c0a8776b Fix #1447: sparse_mask doesn't make sense with uncoalesced tensors (#1458)
* Make sparseMask error if mask is uncoalesced.

Fixes #1447.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add test for sparse adagrad.

Previously, the sparse codepath was not exercised at all; this commit
adds a very simple test case "sparse Rosenbrock"; the idea is to do
Rosenbrock but then knock out one of the dimensions so that the
tensor is sparse.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-03 17:53:45 -04:00
4ec0435b39 Report overall size of sparse tensors. (#1461)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-03 17:51:56 -04:00
1a831ce8f2 Add direct enqueuing to enable RNN input, allow specify batch columns
Summary:
Add a parameter dont_rebatch to data_workers. This disables rebatching of input from the fetcher into equal-size chunks, which is not desired with RNNs, where longer sequence lengths may call for smaller batches.

For some reason the graceful-shutdown test interfered with other tests, so I removed it.

Reviewed By: jay-mahadeokar

Differential Revision: D4988549

fbshipit-source-id: cbab46d77c948f2e293e79e6eb538dde17d800ee
2017-05-03 14:49:44 -07:00
f8be3a20d3 Fix scatter_ documentation typo. (#1463) 2017-05-03 17:31:04 -04:00
7b21b0b6d7 Retry on write EINTR in sync mode
Summary:
We weren't handling an edge case where write(2) would return EINTR
when in sync mode. The Pair::write function would return false
indicating it didn't complete the write whereas the send function
expects it to complete when in sync mode. With this change we now
advance the cursor and retry the write when fewer than expected bytes
were written.
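The idea is the classic partial-write retry loop; a Python sketch of it (Gloo's version is C++ and tracks a cursor on the pair):

```python
import os

def write_all(fd, data):
    # Advance past partial writes and retry on EINTR until all bytes go out.
    view = memoryview(data)
    while view:
        try:
            n = os.write(fd, view)
        except InterruptedError:  # EINTR (Python 3.5+ mostly retries for us)
            continue
        view = view[n:]
```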

Also see https://github.com/facebookincubator/gloo/issues/34

Reviewed By: andrewwdye

Differential Revision: D4996949

fbshipit-source-id: 3bad4fa3d0a01517f20b64904aa71410641fa60f
2017-05-03 14:26:26 -07:00
16821bc45d Add ScatterWeightedSum for GPU.
Summary:
- Adding ScatterWeightedSumOp for CUDA.
- This version does not support input weight (weight0). In other words, the input weight has to be 1.0, otherwise the op exits.
- To check the value of weight0, we copy its value from device to host at: https://github.com/caffe2/caffe2/pull/443/files#diff-2a77f80797072e8443f4867cb709fb40R244
Closes https://github.com/caffe2/caffe2/pull/443

Reviewed By: akyrola

Differential Revision: D4971910

Pulled By: asaadaldien

fbshipit-source-id: 2282e968f95364f0b3b8126502b053fe7a32ba20
2017-05-03 13:40:48 -07:00
0910e0ac90 Fix memory leak in coalesce. (#1460)
Fixes #1449.

For future reference, we should have a doc explaining our ref-counting
conventions; it looks like this bug slipped by because we assumed that
newTensor was taking ownership of the pointers it was passed in.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-03 13:29:39 -04:00
93094294ba function backward attempted to multiply tuple by variables (#1459)
One line fix--changed it to multiple the grad_variables by the
len(variables) when grad_variables is None.
2017-05-03 13:12:21 -04:00
ddc4d101ad MultiRNNCell (Caffe2)
Summary: Add Python support for arbitrary (unidirectional) recurrent networks with MultiRNNCell abstraction. Since the combined step net for all layers is created at one time (in method _apply), this may be optimizable as-is. LSTM() function is extended to accept a list of numbers of units for the dim_out argument, producing a multi-layer LSTM in that case.

Reviewed By: salexspb

Differential Revision: D4965001

fbshipit-source-id: 39c069468d5b40bf803503cf62046a479ca83cbb
2017-05-03 10:02:31 -07:00
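A hedged usage sketch of the extended LSTM() helper (keyword names and return values are approximations, not taken from this diff):

```python
from caffe2.python import model_helper, rnn_cell

model = model_helper.ModelHelper(name="stacked_lstm")

# Passing a list for dim_out now yields a multi-layer LSTM built on
# the MultiRNNCell abstraction: here two layers of 128 and 64 units.
lstm_outputs = rnn_cell.LSTM(
    model, input_blob="input", seq_lengths="seq_lengths",
    initial_states=None, dim_in=40, dim_out=[128, 64],
    scope="stacked_lstm",
)
```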
ff1330192c auto -> return type for C++11 support
Summary: Builds are breaking https://travis-ci.org/caffe2/caffe2/jobs/228149040

Reviewed By: Yangqing

Differential Revision: D4992774

fbshipit-source-id: bea4132db9c2bf24342887a2bc4cbd6225a5ce9a
2017-05-03 09:08:50 -07:00
743e4894d2 Prefix values/indices/sparse_mask/nnz with underscore (#1457)
As discussed in #1441.

I also added some docs giving clear guidance about coalescing
in sparse tensors.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-03 11:14:10 -04:00
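An example of why the raw accessors deserve an underscore: before coalescing, a sparse tensor may carry duplicate index entries.

```python
import torch

i = torch.LongTensor([[0, 0, 1],
                      [0, 0, 2]])
v = torch.FloatTensor([3, 4, 5])
s = torch.sparse.FloatTensor(i, v, torch.Size([2, 3]))
print(s._nnz())      # 3 stored entries; (0, 0) appears twice
s = s.coalesce()
print(s._nnz())      # 2 entries; the duplicates were summed to 7
print(s._values())   # safe to inspect once coalesced
```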
f273377d19 add device asserts in scatter/gather kernels 2017-05-03 11:12:26 -04:00
836332e0a1 Merge commit 'f1591fade5c8df5272b79ab1bd8b0b261bb5606a' 2017-05-03 11:11:43 -04:00
f1591fade5 add device asserts in scatter/gather kernels 2017-05-03 11:10:31 -04:00
2e7635b929 Add flexible bilinear upsampling aspect ratio redux (#1317) 2017-05-03 08:46:28 -04:00
e9953c4595 A number of post-merge fixes for test_sparse (#1444)
* Simplify _gen_sparse

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Randomly generate an uncoalesced tensor and test with it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Simpler implementation of cpu_only suggested by @apaszke

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Better implementation of randn, suggested by @soumith

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Lint fix.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Fix CUDA type error.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-03 08:43:03 -04:00
72e8190994 Use at most one shared_ptr block at a time to manage THPFunctions (#1454)
* Fix failing ln in build_all.sh

* Use at most one shared_ptr block at a time to manage THPFunctions
2017-05-03 08:15:36 -04:00
e1278d4ee2 Fix typo in autograd docs 2017-05-03 03:11:55 -07:00
aadad971e4 Fix pybind11 module name for MPI helpers
Summary: TSIA

Reviewed By: dzhulgakov

Differential Revision: D4981136

fbshipit-source-id: 62d0df8dccea0a3ecb6da150eea4752b100c04a8
2017-05-02 23:18:50 -07:00
3ca0de25da Prevent false overwriting of a field
Summary: The code snippet in the added unit test is invalid, but it may or may not cause an exception. Disable the syntax so people don't accidentally use it.

Reviewed By: dzhulgakov

Differential Revision: D4985030

fbshipit-source-id: ffa2b26f7b29128b196aba1b1001a97c87e381cf
2017-05-02 23:18:49 -07:00
31643d5ecb Inference code for seq2seq model
Summary: Beam search implementation

Differential Revision: D4975939

fbshipit-source-id: 67d8b73390221583f36b4367f23626a2aa80f4b4
2017-05-02 22:47:28 -07:00
3504e1d836 cuda (sparse) lengths sum
Reviewed By: azzolini

Differential Revision: D4961327

fbshipit-source-id: 4ee61dcdd907c044876cb0de671ceee953c15129
2017-05-02 22:21:42 -07:00
379ac514b8 lstm_benchmark: add warm-up stage, support layers
Summary:
We need a warm-up stage because otherwise the first iteration
spends too much time doing all the allocations

Reviewed By: akyrola

Differential Revision: D4986201

fbshipit-source-id: f60a75520988ff3f1540bb157cdc69634f307db4
2017-05-02 20:34:00 -07:00
22d4eaeb9e JoinContext
Summary:
Layer to allow a model to follow different paths for each instantiation context and join later. Together with the tagging system cleanup (a separate issue), this should reduce the need to write a layer just to differentiate between contexts.

Re: tagging system cleanup, we should make exclusion more explicit: EXCLUDE_FROM_<CONTEXT>. This would simplify instantiation code. TRAIN_ONLY should become the set of all EXCLUDE_FROM_*, except EXCLUDE_FROM_TRAIN.

Reviewed By: kennyhorror

Differential Revision: D4964949

fbshipit-source-id: ba6453b0deb92d1989404efb9d86e1ed25297202
2017-05-02 17:32:26 -07:00
66bd200de0 bug fix - add previous slot offset to calculated slot value in halving-doubling algorithms
Summary: The previous slot offset was not added to the calculated value for the slot to be used in halving-doubling algorithms. If multiple instances were running, slot values could collide.

Reviewed By: pietern

Differential Revision: D4986618

fbshipit-source-id: 56b9220c91f31cc016d37e82907221460de70657
2017-05-02 16:19:55 -07:00
282298dd1c Data parallel model: Disable NCCL by default to hopefully reduce deadlocks
Summary: Make NCCL optional in data_parallel_model due to continuing reliability (deadlock) issues.

Reviewed By: pietern

Differential Revision: D4988950

fbshipit-source-id: 8a2192f01b5f3c0e847137cd37aefc69e553a56f
2017-05-02 16:09:17 -07:00
ee7b3c9b2b caffe2: rebatching queue for MultiTask
Summary:
RFC. This is a naive implementation of the rebatching queue for the MultiTask
effort. Full disclaimer: I'm very new to Caffe/machine learning and I'm doing
dodgy science here (under Dmytro's supervision), so please be extra tough on
this review so I can learn best practices :)

Differential Revision: D4871970

fbshipit-source-id: 924820ef0fce45b5e2bdabeec9885cbafa23a880
2017-05-02 15:22:46 -07:00
222b781f76 Ensure sparse_gradients feed to CPU
Summary: Ensure sparse gradients tensors are copied to CPU

Reviewed By: dzhulgakov

Differential Revision: D4987701

fbshipit-source-id: 81f93c4f9d4b9bc5855cd4e9683d1a887b27e0cf
2017-05-02 15:01:26 -07:00
574cfe3cf3 Improve kthvalue documentation. (#1448)
1) Fix "kth" attr specification -- I can't get sphinx to generate `k`th,
but `k` th works with a space, unlike now where the highlighting continues
until the next attr.
2) Specify the size of the return tensors.
3) Add an example of the return tensor sizes with more than 1 dimension.
2017-05-02 17:22:02 -04:00
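A small example of the documented return sizes (2017-era behavior, where the reduced dimension is kept with size 1):

```python
import torch

x = torch.Tensor([[1, 5, 3],
                  [4, 2, 6]])
values, indices = torch.kthvalue(x, 2, 1)  # 2nd smallest along dim 1
print(values.size())    # 2x1
print(indices.size())   # 2x1
```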
699755e04f Convert contiguous() call in adagrad to out-of-place coalesce. (#1446)
We missed this one in f2903332c7dce1fbb7d7d9f18dcfba8e853581df!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-02 16:51:54 -04:00
fb07914c0c Recommendations for workflow when modifying C files. (#1443)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-02 15:46:45 -04:00
aa2ee86375 pytorch/thpp ~= facebook/thpp (#1445) 2017-05-02 15:46:10 -04:00
ecd51f8510 docs fixes 2017-05-02 15:42:33 -04:00
e8e36945cf make debug message more explicit & verbose
Summary: I ran into this earlier and the debug messages were not helpful enough

Reviewed By: kennyhorror

Differential Revision: D4985754

fbshipit-source-id: b3d12b5e2cfa1b54fca9126768c84c902664ef28
2017-05-02 12:39:14 -07:00
5aa1f769d3 Fix torch.dist documentation: function returns a float. (#1440) 2017-05-02 14:38:48 -04:00
1f3c7f8080 Handle net.external_inputs correctly in AppendNet
Summary:
When appending net A to net B, an external input of net A should not be added as
an external input of net B if net B is outputting that blob.

Reviewed By: dzhulgakov

Differential Revision: D4975921

fbshipit-source-id: a5c0ada7b96d851e57d345244d322dd93c7be8e4
2017-05-02 11:20:26 -07:00
da338ca821 Fix Caffe2 LoadOp docs
Summary: Better documentation for the add_prefix argument.

Reviewed By: ender-wieczorek

Differential Revision: D4973963

fbshipit-source-id: 7c238bed05d04195a9d188548a07859a2095fab9
2017-05-02 11:04:21 -07:00
e8e93066e7 add workflow for user complicated embedding
Summary: Correctly propagate the request_only tag to all layers.

Reviewed By: kennyhorror

Differential Revision: D4751496

fbshipit-source-id: e65fd8cfe56d2989213d44e684a528ede691d316
2017-05-02 10:46:52 -07:00
eecc807a75 Keep track of number of in-flight send operations
Summary:
This helps guard against programming errors where waitSend is called
before send is called. It uses a std::atomic to keep overhead low.

Reviewed By: andrewwdye

Differential Revision: D4984604

fbshipit-source-id: 04a63b1ba088e3bcba0abff40771af666deb15e5
2017-05-02 09:35:46 -07:00
a458aa4b2a Fix tags to be based on EXCLUDE_FROM_{CONTEXT}
Summary: Cleaning up the tagging system. Introducing tags EXCLUDE_FROM_{CONTEXT}.

Reviewed By: kennyhorror

Differential Revision: D4974842

fbshipit-source-id: b0fa6772299bb70afa2192c39e45191c9f41336a
2017-05-02 09:32:27 -07:00
5386012164 Check return value of ibv_reg_mr for error
Summary:
This returns EFAULT when passing a GPU memory pointer (for GPUDirect)
and the ibverbs driver can't map the GPU's memory. Since the error is
pretty cryptic, crash with a more useful message.

```
terminate called after throwing an instance of 'gloo::EnforceNotMet'
  what(): [enforce fail at gloo/transport/ibverbs/buffer.cc:46] mr_ !=
  nullptr. ibv_reg_mr: Bad address (kernel module 'nv_peer_mem' not
  loaded; did you specify a GPU pointer?)
```

Reviewed By: andrewwdye

Differential Revision: D4982966

fbshipit-source-id: 72c220fe22a3bc59396cfff992ad5f0f9c5bf83a
2017-05-02 09:11:15 -07:00
4bf813e068 Document cdata non-NULL invariant, and consequence Python side. (#1435)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-02 11:17:20 -04:00
3b4bc721ef fix osx build and suppress clang warnings (#1432) 2017-05-02 09:33:24 -04:00
7d6d67119f Allow LayerModelHelper to keep input blobs from schema
Summary: In certain situations, like in D4907916 where we insert an additional step in the middle of a model, it's necessary to keep the blob names constant across model helpers so that the communication schema doesn't break.

Reviewed By: kennyhorror

Differential Revision: D4981527

fbshipit-source-id: 6b8d6d240279dd48f801cfacbaa1d320ba54d694
2017-05-01 21:31:36 -07:00
58bc830660 Integrate CRF in DeepText + New caffe2 operator for viterbi decode
Summary: Integration of the CRF layer in DeepText word models + implementation of the Viterbi decode operator in C++ instead of Python so that the CRF models can be deployed in production.

Differential Revision: D4912196

fbshipit-source-id: 64f499a1bd47e811e7a96dde839904dcd05cacb3
2017-05-01 20:39:41 -07:00
38d3bfa5d4 Warn on setting blob on Scalar
Summary: Calling `set()` or `set_value()` on Scalar is dangerous as something might be holding a reference to it. This is especially true with `LayerModel`, where instantiation is delayed. The code may still run but it will produce unexpected results, i.e., values may be written to the wrong blob.

Reviewed By: kennyhorror

Differential Revision: D4955366

fbshipit-source-id: f5e8694a9a411ee319ca9f39a0fed632d180b8a5
2017-05-01 20:18:30 -07:00
c86610b738 special executor class for RecurrentNetworks (just single threaded now)
Summary:
This is a preamble for the "diagonal executor". Instead of creating a Net for each timestep, we have a single executor for the RecurrentNetworkOp that manages ops per timestep.
This will be used if net_type='rnn', so one can still use the old way with a net type of 'simple' or 'dag' (an effective kill-switch if there are issues with this).

Did this only for the forward model. The gradient op will follow later, but it is basically similar, just in reverse order.

Reviewed By: salexspb

Differential Revision: D4979933

fbshipit-source-id: bda77918ec518cb6b29d7021ee036d59eb2dd303
2017-05-01 19:06:25 -07:00
dca208b525 Refactor test_sparse to reduce boilerplate. (#1421)
* Refactor test_sparse to reduce boilerplate.

Instead of manually creating a helper function, threading an is_cuda
parameter around, and creating a test method for CUDA and non-CUDA
variants, we take a different approach:

- There are now some new member variables initialized in setUp which
  control how we carry out the test; at the moment, it's just
  whether or not we are using CUDA.  This means you don't have to
  pass is_cuda around, or do a conditional to get the triplet of
  constructors you need.

  I'll note that I am not a big fan of member variables in test
  objects, but these are (intended to be) immutable so I think
  it should be OK.

- Instead of manually defining test_foo and test_foo_cuda, we now
  have a new TestCudaSparse class which overrides setUp (from above)
  to swap in the CUDA implementation.  Way less boilerplate, and NO
  metaprogramming needed.

  If you need to opt out of CUDA testing, there is a new cpu_only
  decorator you can use.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-01 21:52:58 -04:00
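A condensed sketch of the pattern described above (decorator and class names taken from the commit message; the bodies are illustrative):

```python
import unittest
import torch

def cpu_only(fn):
    # Opt a test out of the CUDA variant.
    def wrapped(self, *args, **kwargs):
        if self.is_cuda:
            raise unittest.SkipTest("CPU-only test")
        return fn(self, *args, **kwargs)
    return wrapped

class TestSparse(unittest.TestCase):
    def setUp(self):
        # Immutable per-run configuration; no is_cuda threading needed.
        self.is_cuda = False
        self.IndexTensor = torch.LongTensor
        self.ValueTensor = torch.FloatTensor

class TestCudaSparse(TestSparse):
    def setUp(self):
        # Same tests, CUDA constructors; no metaprogramming.
        self.is_cuda = True
        self.IndexTensor = torch.cuda.LongTensor
        self.ValueTensor = torch.cuda.FloatTensor
```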
181cb15c72 Fix formatting error in docs.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-01 21:47:22 -04:00
7df8fbb64f Generalize halving-doubling to support non-power-of-two cases using binary blocks algorithm
Summary: A generalized version of halving-doubling that supports non-power-of-two number of processes by breaking up execution into blocks that are powers of two and communicating interblock after the intrablock reduce-scatter. Non-power-of-two cases will have some degree of load imbalance compared to power-of-two, but cases with few large blocks (e.g. 8 + 4 or 16 + 8) should still perform relatively well.

Reviewed By: pietern

Differential Revision: D4955947

fbshipit-source-id: af4f218fedb6adf475530c38386978b81f4f2b74
2017-05-01 16:05:22 -07:00
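A worked example of the block decomposition (plain Python, conceptual only):

```python
# A non-power-of-two process count splits into power-of-two blocks,
# e.g. 12 = 8 + 4; each block runs the intrablock reduce-scatter
# before the interblock exchange.
def binary_blocks(n):
    blocks, bit = [], 1
    while n:
        if n & 1:
            blocks.append(bit)
        n >>= 1
        bit <<= 1
    return sorted(blocks, reverse=True)

print(binary_blocks(12))  # [8, 4]
print(binary_blocks(24))  # [16, 8]
```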
f0dd96c116 brew fc test fix for packed_fc
Summary:
It turned out that we cannot run PackedFC on a machine that does not have AVX2 right now, as there is a known issue with MKL 2017.0.098 that produces wrong results on non-AVX2 machines.
I just moved this test out of here because this is not the purpose of this test

Reviewed By: salexspb

Differential Revision: D4974021

fbshipit-source-id: c5b82a41021defc9946a8219f59b28abb13d3beb
2017-05-01 14:33:05 -07:00
5c7453447f Fix bugs, rename differentiate to grad, make it more flexible 2017-05-01 16:44:56 -04:00
87164f554d Bug fixes 2017-05-01 16:44:56 -04:00
267e7c0431 Fix memory issues with Conv and BatchNorm 2017-05-01 16:44:56 -04:00
e5db8f98be Add torch.autograd.differentiate 2017-05-01 16:44:56 -04:00
20aa5b066f Convert some of the functions to new format
Also, fix a lot of issues that appeared after the previous commits.
2017-05-01 16:44:56 -04:00
de9998e198 Add support for the new Function format 2017-05-01 16:44:56 -04:00
702a2e3bc5 Make Variables not subclass Function anymore
Because of this Variables can no longer appear in the graph.
Every usage of a leaf Variable will leave an AccumulateGrad
function that has no outputs, but modifies var.grad as a side
effect.
2017-05-01 16:44:56 -04:00
2ca787fcf4 Refactor attribute names in autograd 2017-05-01 16:44:56 -04:00
2ec629bef9 Set SO_REUSEADDR to try and prevent bind errors
Summary:
After running the test suite many times we end up with a zillion
connections in TIME_WAIT state. Setting SO_REUSEADDR seems like it
should help binding to ports regardless of the TIME_WAIT state.

Reviewed By: andrewwdye

Differential Revision: D4979606

fbshipit-source-id: b611f9c9e11aba858dc192f6bca3d64e10100b52
2017-05-01 13:36:14 -07:00
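A minimal illustration of the change (a Python stand-in for the C++ socket setup):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Allow binding even while old connections linger in TIME_WAIT.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("127.0.0.1", 0))  # 0 picks any free port; real code binds a fixed one
```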
2197e4c766 version bump 2017-05-01 15:54:52 -04:00
2a28283680 Fix pair destructor if in CONNECTING state
Summary:
It can happen that a pair is destructed while in CONNECTING
state when some unrelated code throws an exception after the connect
function has been called. The most likely place for this to happen is
when connecting pair A is in progress while connecting pair B throws
an exception. The exception will force destruction of all references
to pair A, even if it is in the CONNECTING state.

Also see https://github.com/facebookincubator/gloo/issues/33

Reviewed By: andrewwdye

Differential Revision: D4979557

fbshipit-source-id: 0cddddd3f478106f1694603fe7f2efe15a2d9aa1
2017-05-01 12:41:07 -07:00
ffc6bad116 Concat axis=0
Summary: Previously, the code below would go out of bounds.

Reviewed By: xianjiec

Differential Revision: D4968037

fbshipit-source-id: 3760e2cddc919c45d85ac644ac3fabf72dbaf666
2017-05-01 12:19:34 -07:00
1040b5f91c Enable bitcode for iOS builds
Summary:
build_ios.sh now has `-fembed-bitcode` flags for cmake and passes these flags to build_host_protoc.sh (which now accepts the optional argument `--other-flags`). That allows using the output libs (libCaffe2_CPU.a, libCAFFE2_NNPACK.a, libCAFFE2_PTHREADPOOL.a and libprotobuf-lite.a, libprotobuf.a respectively) in Xcode projects with bitcode enabled.

Bitcode has been enabled by default in all projects since Xcode 7, is crucial for slicing, and is mandatory for watchOS targets. Enabling bitcode for a target requires bitcode to be enabled for all its dependencies as well, so a Caffe2 built without bitcode forces developers to switch off bitcode for the whole app.
Closes https://github.com/caffe2/caffe2/pull/457

Reviewed By: bwasti

Differential Revision: D4978644

Pulled By: Yangqing

fbshipit-source-id: 5165abb507fb91bc8c38f7348d6836bccf8fcc22
2017-05-01 10:32:11 -07:00
561255218a NormalizeOp CUDA implementation
Summary:
Implement NormalizeOp for GPU using CUDA, and rewrite the gradient to be a function of the output
so it's more efficient, especially for the CUDA implementation.

Reviewed By: akyrola

Differential Revision: D4971300

fbshipit-source-id: e0ab66462000988aaf1f26010ea550533d107167
2017-05-01 09:25:30 -07:00
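The identity being exploited, checked numerically (a sketch; for y = x / ||x||, the input gradient can be written purely in terms of the output y):

```python
import numpy as np

x = np.random.randn(5)
g = np.random.randn(5)          # upstream gradient dL/dy
norm = np.linalg.norm(x)
y = x / norm

# dL/dx expressed through the output: (g - y * (y . g)) / ||x||
grad_x = (g - y * y.dot(g)) / norm

# Finite-difference check of the first coordinate.
eps = 1e-6
x_eps = x.copy()
x_eps[0] += eps
numeric = g.dot((x_eps / np.linalg.norm(x_eps) - y) / eps)
print(np.isclose(numeric, grad_x[0], atol=1e-4))  # True
```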
4624278b1d Make sparse documentation title consistent with others. (#1420)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-01 11:48:00 -04:00
79d4ac670c Add map_location to load_url (#1418) 2017-05-01 10:21:30 -04:00
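A usage sketch of the new argument (the URL is a placeholder):

```python
import torch.utils.model_zoo as model_zoo

# Remap storages saved on GPU onto the CPU while loading a checkpoint.
state_dict = model_zoo.load_url(
    "https://example.com/checkpoints/model.pth",
    map_location=lambda storage, location: storage,
)
```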
4ebf3ff46d Add base for CUDA allReduce and broadcast in DataChannelGloo 2017-05-01 01:49:10 -07:00
ac3ba9a2ad Rebase fixes 2017-05-01 01:49:10 -07:00
14e1bfddbc Change warning message in MPI 2017-05-01 01:49:10 -07:00
c19fbd3364 Update comments; Add inline accessors for value_type tuple in GlooCache 2017-05-01 01:49:10 -07:00
a17d96d571 Add multiple thread support for DataChannels
Previously, when using same data channel in multiple thread environment,
one didn't have any guarantee that there won't be any deadlocks
or even errors.
2017-05-01 01:49:10 -07:00
b7dcc29430 Forward declare GlooCache key_type 2017-05-01 01:49:10 -07:00
18b4dcd28b Remove unused variable in macro 2017-05-01 01:49:10 -07:00
be81304d27 Moved GlooCache to new file; Functions renames; Minor fixes 2017-05-01 01:49:10 -07:00
f07f13c6e9 Change Store exception handling 2017-05-01 01:49:10 -07:00
310d08c37b Fix store and all operations 2017-05-01 01:49:10 -07:00
234df2138a Fix compilation errors 2017-05-01 01:49:10 -07:00
2b340e7d50 Add python tests; Remove broken prefix store creation 2017-05-01 01:49:09 -07:00
6888c61fa8 Fix DataChannelGloo compilation 2017-05-01 01:49:09 -07:00
ba3328b365 Add DataChannelGloo tests 2017-05-01 01:49:09 -07:00
3b4fe5dfc4 Add isend/irecv; Add all types generator for template functions; Minor refactor 2017-05-01 01:49:09 -07:00
ce42761628 Add groups 2017-05-01 01:49:09 -07:00
df4791d6c0 Implement DataChannelGloo 2017-05-01 01:49:09 -07:00
7e8830c3d5 Initial gloo bindings 2017-05-01 01:49:09 -07:00
b91cec7f66 Fix THD library build for CUDA 2017-05-01 01:49:09 -07:00
765aeb1a08 Fix nonzero bug 2017-05-01 01:49:09 -07:00
280e2a94e5 Worker init clarification; Inform on error thread notification failure 2017-05-01 01:49:09 -07:00
e7f453b5de Add barrier to test; Minor changes; 2017-05-01 01:49:09 -07:00
8030aa0f1b Refactor error thread 2017-05-01 01:49:09 -07:00
40ad2cde62 Remove unnecessary nonzeroElems function 2017-05-01 01:49:09 -07:00
af4a978c44 Move error thread to CommandChannel; Minor fixes; 2017-05-01 01:49:09 -07:00
fe5fc6723f Remove unnecessary code 2017-05-01 01:49:09 -07:00
6e6179633b Minor fixes in THDMasterWorkerInit 2017-05-01 01:49:09 -07:00
c97e60c45d Add actual error reporting in Master 2017-05-01 01:49:09 -07:00
2cdb368f97 Add error handling in MasterWorker mode 2017-05-01 01:49:09 -07:00
a5b2f3461a Review fixes 2017-05-01 01:49:09 -07:00
d3e60599d2 Add benchmark scripts (#66) 2017-05-01 01:49:09 -07:00
98d8e0b040 Lapack functions implementation #2 + fixes after review 2017-05-01 01:49:09 -07:00
fe2c360eda Lapack function implementation #1 2017-05-01 01:49:08 -07:00
59ae109bbb Implement functions from set 1 (except Lapack) 2017-05-01 01:49:08 -07:00
8623076654 Add convertToRank to do bound checking 2017-05-01 01:49:08 -07:00
a362b4f367 Add support for unsigned char aka byte to MPI 2017-05-01 01:49:08 -07:00
ef724e355c Change rank type: int -> std::uint32_t; Minor fixes 2017-05-01 01:49:08 -07:00
e863d27393 Tweaks, fixes, cleanup in DataChannelTCP 2017-05-01 01:49:08 -07:00
4c388f9398 Revert structure changes; Minor fixes 2017-05-01 01:49:08 -07:00
6740d1d904 Rewrite CommandChannel 2017-05-01 01:49:08 -07:00
f891d9b1bf Don't build tests by default 2017-05-01 01:49:08 -07:00
a81f330854 Rename construct -> new; Minor fixes 2017-05-01 01:49:08 -07:00
c02241edbd Minor code refactor 2017-05-01 01:49:08 -07:00
f30a92fa17 Fix invalid socket initialization 2017-05-01 01:49:08 -07:00
1391ff99f4 Use TCP_NODELAY for data sockets 2017-05-01 01:49:08 -07:00
43019bd88a Always loop over all possible addresses in worker 2017-05-01 01:49:08 -07:00
d6380910f5 Removed unnecessary code; Minor fixes 2017-05-01 01:49:08 -07:00
04491e84e4 Fix build with CUDA 2017-05-01 01:49:08 -07:00
e247249a5f Implement TH_API functions from the set 4 2017-05-01 01:49:08 -07:00
2c59f017e6 Port Xray OC workflow to elastic_data_parallel_model
Summary: As in the title + added scuba logging of the results.

Reviewed By: andrewwdye

Differential Revision: D4974261

fbshipit-source-id: 3e05b97133be95ffe37c8bcafd8a5a6bf3e7da93
2017-05-01 00:32:47 -07:00
0160438eb9 added logical not operator for ByteTensor (#1403) 2017-04-30 08:47:24 -04:00
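An example of the new operator, assuming it is exposed through Python's `~` (the wiring is not shown in the entry above):

```python
import torch

mask = torch.ByteTensor([0, 1, 1, 0])
print(~mask)   # 1, 0, 0, 1
```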
7dd8571bc6 fix avg_pool docs in nn.functional 2017-04-30 08:44:43 -04:00
48a7869b23 Doc fixes (#1409) 2017-04-30 08:28:19 -04:00
582fd3db7d fix osx build 2017-04-29 09:29:57 -04:00
9169f60a84 Parallelize TensorMethods.cpp builds (#1400) 2017-04-29 09:07:21 -04:00
457d78a7d9 Use THCUNN backward kernels for Tanh and Sigmoid in Autograd (#1399) 2017-04-29 09:07:03 -04:00
a071ccbea6 fix NCCL makefile for CUDA 7.5 (#1401) 2017-04-29 09:04:01 -04:00
db1eb66456 corrected docstring for Dropout (#1404) 2017-04-29 13:40:47 +02:00
6e1333fe92 CUDA operators for DotProduct and DotProductGradient
Summary: Only CPU impl is available at the moment. Wrote simple cuda kernels.

Reviewed By: akyrola

Differential Revision: D4577736

fbshipit-source-id: c2540aa9d332fcdeac46cc7f89aab164d107d7a8
2017-04-28 19:47:00 -07:00
d223d71703 Add shape inference function for RoiPool.
Summary: As the title.

Reviewed By: akyrola

Differential Revision: D4960241

fbshipit-source-id: d5f7d7c2eea72a75f810aa2f532965fff48f8388
2017-04-28 17:03:29 -07:00
45020a74cd remove inplace pow and fix contiguous -> coalesce (#1398) 2017-04-28 18:26:29 -04:00
9c01f5d6b2 Document hybrid sparse tensors.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-04-28 23:53:01 +02:00
2b0dbad3df Support fp16 output from ImageInputOp
Summary:
Refactors out option -> datatype cast
Allows native fp16 output from GPU transformation path
Expanded CastOp to allow fp16 in/out on GPU
Closes https://github.com/caffe2/caffe2/pull/400

Reviewed By: akyrola

Differential Revision: D4948721

Pulled By: asaadaldien

fbshipit-source-id: f3b0b6d545e58a8f204857e5bdb41086aac731b8
2017-04-28 14:50:47 -07:00
cbb9f08b71 Add new init methods gain, eye and dirac (#1172) 2017-04-28 17:16:40 -04:00
f75ab857b8 Add safeCoalesce() to tests 2017-04-28 17:11:05 -04:00
f2903332c7 Make coalesce() out of place 2017-04-28 17:11:05 -04:00
9643be76f9 speed up accumulation 2017-04-28 17:11:05 -04:00
4f09461d24 Rename sparse tensor contiguous() to coalesce() 2017-04-28 17:11:05 -04:00
bafb2e5cc2 Implement sparse pow. (#1387) 2017-04-28 23:06:09 +02:00
28a7fbbdf5 Documentation fix for torch.gather 2017-04-28 22:45:14 +02:00
4c1cdb6148 Refactor Python string utility function 2017-04-28 21:25:26 +02:00
ed05c28bc6 Speedup SquaredL2Distance CUDA
Summary: Both SquaredL2Distance and SquaredL2DistanceGradient had bad CUDA implementations. Use proper reductions and batched kernels.

Reviewed By: asaadaldien

Differential Revision: D4968527

fbshipit-source-id: f7cf82072d38bc127c757c5751863a9439aca8b5
2017-04-28 11:55:59 -07:00
775481ed56 re-enable dilated convolutions on Kepler (#1394) 2017-04-28 14:42:19 -04:00
5b2aac7c73 Merge commit '224f5eabf5cfb3a19abc1819f7dac230500b6bdb' 2017-04-28 13:48:06 -04:00
224f5eabf5 half<->float conversion cleanup (#680) 2017-04-28 19:46:42 +02:00
fd490c6490 Merge commit 'd6a31c68a0f39656257322a55c9e04dd579de828' 2017-04-28 13:42:23 -04:00
d6a31c68a0 Add option to disable ppc64le's VSX support
Set environment variable TH_NO_VSX=1 to disable VSX.
2017-04-28 13:41:03 -04:00
96a281dfab Add one more missing self.dilation parameter. (#1392)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-04-28 19:16:32 +02:00
94b147fd41 Allows dicts batches in dataloader. (#1354)
* Allow dicts in Dataloader

* use collections.Sequence instead of collections.Iterable in dataloader
2017-04-28 19:14:52 +02:00
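A sketch of what the change enables: dict samples are collated into a dict of batched tensors.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class DictDataset(Dataset):
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return {"features": torch.randn(3), "label": idx}

loader = DataLoader(DictDataset(), batch_size=4)
batch = next(iter(loader))
print(batch["features"].size())  # 4x3, one dict key per batched field
```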
6bb43ee41e leaky relu gradient op
Summary: Implement CPU and GPU gradient for Leaky ReLU op.

Differential Revision: D4943905

fbshipit-source-id: 541f13cd5f274a18b69ecf1362722b1bc0105ad9
2017-04-28 10:06:23 -07:00
c26f6877a0 guard topk for half (#759) 2017-04-28 11:57:15 -04:00
8908000262 function -> lambda in test 2017-04-28 10:31:40 -04:00
8b1d5727d8 fix minor docs 2017-04-28 10:13:52 -04:00
75f1989bec Add nn.Bilinear and tests 2017-04-28 10:11:30 -04:00
e221536ad8 Merge commit 'a44317fea88adddded91e068088415de1e66fd4b' 2017-04-28 08:04:39 -04:00
a44317fea8 Change magma_sgesvd to magma_sgesdd which is significantly faster 2017-04-28 08:03:39 -04:00
24e5a9057e Revert "Parallelize TensorMethods.cpp builds (#1364)" (#1390)
This reverts commit 060048bcd808893ba3113d09273a42642904078a.
2017-04-28 07:59:40 -04:00
060048bcd8 Parallelize TensorMethods.cpp builds (#1364) 2017-04-28 07:45:21 -04:00
77035d151e make topk test unique 2017-04-28 07:30:25 -04:00
50c9c23525 enable topk for all cuda 2017-04-28 07:14:21 -04:00
3f81803b09 Merge commit '69574a6dc4036b0113c512a1b2d74e23682c8a3b' 2017-04-28 07:08:43 -04:00
d421c473a9 Merge commit '928f6516c16ff91c0a789d0a653551041d1bafd0' 2017-04-28 07:07:24 -04:00
48f9e526ea implement expand/expandAs in CPU/GPU code 2017-04-28 07:06:25 -04:00
69574a6dc4 implement expand/expandAs in CPU/GPU code 2017-04-28 07:04:08 -04:00
928f6516c1 implement expand/expandAs in CPU/GPU code 2017-04-28 07:03:51 -04:00
b93b525a1c Enable specifying of margin in HingeEmbeddingLoss (#1378)
Previously it was not possible to set a value for the margin of the HingeEmbeddingLoss in the constructor. This patch fixes the issue and makes the loss behave as described in the docs.

A discussion of this issue can be viewed here:
https://discuss.pytorch.org/t/issue-with-setting-margin-for-hingeembeddingloss/2088
2017-04-28 06:58:48 -04:00
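Usage after the fix (a minimal sketch with random inputs):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

loss_fn = nn.HingeEmbeddingLoss(margin=0.5)  # margin is now honored
x = Variable(torch.randn(4))
y = Variable(torch.Tensor([1, -1, 1, -1]))   # targets in {1, -1}
print(loss_fn(x, y))
```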
482ffccd76 Make instance norm grad test less flaky
Summary:
Instance norm failed grad check in some cases that needed a smaller step size.  Decreased step size, but also increased threshold slightly.

Related diff: D4627379

Reviewed By: kennyhorror

Differential Revision: D4941827

fbshipit-source-id: d6f565340da92af40bfee90627960a3356c69412
2017-04-27 22:35:10 -07:00
726ded4758 add box cox transform op
Summary: as desc

Reviewed By: kittipatv

Differential Revision: D4949042

fbshipit-source-id: 06b8828d8fbe2a88f6798c5d19a702ebaf6def70
2017-04-27 22:06:43 -07:00
bf50599c70 Layered LSTM (naive version)
Summary:
This is a naive layering approach till we have a better one. It could be C++-based and support diagonal execution. Not integrating into the main LSTM API yet as this might be revised a bit. Would like to land this so we can compare the current implementation in the benchmark and also use it as an example of how LSTMs can be combined (as some folks are doing similar things with some variations).

Later we can have LSTM() support the API of layered_LSTM() and also change it under the hood so it stacks cells into a bigger cell instead. This way, if we make the RNN op use a kind of DAG net, the RNN op can provide more parallelism in stacked cells.

Reviewed By: urikz

Differential Revision: D4936015

fbshipit-source-id: b1e25f12d985dda582f0c67d9a02508027e5497f
2017-04-27 19:16:58 -07:00
8db2cf6182 temp fix for transposed dilated convolution (#1388) 2017-04-28 02:53:37 +02:00
aa5a46b848 fix LRN order
Summary: fix LRN helper's order

Reviewed By: salexspb

Differential Revision: D4949902

fbshipit-source-id: 88b1aa985546d36aa66c0677c617979ff186d78a
2017-04-27 16:46:47 -07:00
bc3ec13dae change topk operator to use a priority queue
Summary: Use a priority queue instead of std::partial_sort to identify the top k elements. This reduces memory usage and improves performance.

Differential Revision: D4963931

fbshipit-source-id: 02e75b17ffaf24a4f63c7136626bf0991ee47495
2017-04-27 15:07:31 -07:00
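The idea in Python terms (conceptual; the real change is in the C++ operator):

```python
import heapq

def top_k(values, k):
    # Keep a size-k min-heap of the best values seen so far: O(k)
    # memory instead of partially sorting the whole input.
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)

print(top_k([5, 1, 9, 3, 7, 8], 3))  # [9, 8, 7]
```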
7e8ef0e22a Actually pass dilation to the underlying operators. (#1386)
No tests for now; we'll need some sort of shape DSL to concisely
represent them.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-04-27 23:38:01 +02:00
12a024241a Move BeamSearchForwardOnly to OSS
Summary: Step 1 for inference code in OSS

Differential Revision: D4960547

fbshipit-source-id: 4c3121e5cb3c2402be08947c1e1afa0dd6eb921a
2017-04-27 13:35:53 -07:00
27990fee54 Use fully qualified name as tp_name for tensors and storages (#1379) 2017-04-27 16:26:44 -04:00
1aadf4324b Add row-wise broadcasting to "Where" operator
Summary: Add row-wise mode to `Where` (D4901402), similar to `RowMul`.

Reviewed By: ender-wieczorek

Differential Revision: D4928221

fbshipit-source-id: 3443e559cd366e48c2f6a3f379aeefb7921264ee
2017-04-27 12:31:54 -07:00
ad6204eb0b LSTM: support dropping hidden / cell states when sequence
Summary:
This is useful when data has standalone sequences which are
not connected to each other by any meaningful context

Reviewed By: yqwangustc

Differential Revision: D4835164

fbshipit-source-id: f95626acc26acc3eba3bca7efb08ed1dbdb36c83
2017-04-27 11:47:29 -07:00
c4ce118393 fix curand odd-sized workaround
Summary: Ran into illegal memory access errors when running MSRAFill on an odd-sized tensor.  curand only supports even-sized fills.  To work around this limitation, we fill the last entry of the tensor manually and use curand for what remains.  In this line, the intent is to get the (n-1)th element of the tensor.  r is already a T*, so we should not be multiplying by sizeof(T) to get the (n-1)th element.

Differential Revision: D4961306

fbshipit-source-id: 587f2945abf025e28f573482a4828c09e6ae771b
2017-04-27 01:18:19 -07:00
e9d5863860 Allow Load operator to load into overriden names
Summary:
A new argument `blob_name_overrides` is added, which specifies the
destination names of loaded blobs (to allow them to have different names
than those in the saved file/db).

This will be used for parameter initialization from a pretrained model
in Dper 2. When loading a blob, we need to avoid name collisions by assigning the
loaded blob a new (temporary) name.

Reviewed By: xianjiec

Differential Revision: D4952485

fbshipit-source-id: 4ce79bf40223314bb94981c22cbe537ae3f3d27c
2017-04-27 01:18:12 -07:00
2ef7331007 Update sparse.py 2017-04-27 02:25:00 +02:00
beb7573e5c workflow support for training regression/weighted logistic regression model.
Summary: workflow support for training regression/weighted logistic regression model.

Reviewed By: xianjiec

Differential Revision: D4830130

fbshipit-source-id: ccd4fc47a0d4b7c4ffb5948766c4a00ac34f929b
2017-04-26 17:02:05 -07:00
c2cfa4cf5b Add THGenerate*Type.h for all types (#1014) 2017-04-27 01:11:56 +02:00
c915f8ddbf Signal error on connection error instead of asserting
Summary: No need to assert on connection errors.

Reviewed By: andrewwdye

Differential Revision: D4957698

fbshipit-source-id: b47f6f0f098dbf7d212701c5cb68e34b2c1c9522
2017-04-26 16:07:13 -07:00
6a1ef687f6 Free scratch blobs when data workers exits, add utility function to reset blobs
Summary:
Free scratch blobs at data workers exit. Also add utility function that you can use to reset gradient blobs easily:

    from caffe2.python import utils
    grad_blobs = [b for b in workspace.Blobs() if b.endswith("_grad") or b.endswith("_shared")]
    utils.ResetBlobs(grad_blobs)

Reviewed By: rpenggithub

Differential Revision: D4955531

fbshipit-source-id: d33b2bb2b5247dd2c4cff51c82b1257c871a4179
2017-04-26 13:40:13 -07:00
795dc1c326 Remove loss ops from eval net
Summary: Current eval nets contain loss operators (see example: https://fburl.com/6otbe0n7), which is unnecessary. This diff removes them from the eval net.

Differential Revision: D4934589

fbshipit-source-id: 1ba96c20a3a7ef720414acb4124002fb54cabfc7
2017-04-26 12:46:25 -07:00
b39a2f2cbb Documentation for sparse tensors. (#1366) 2017-04-26 21:43:05 +02:00
d9f01397b3 s/NOCUDA/NO_CUDA/
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-04-26 21:42:09 +02:00
8ca7bf2ab3 Check argument types in 'checkTypes' (#1363)
Fixes #1357
2017-04-26 15:00:41 -04:00
9215afef7d Allow stopping of specific data workers + specify c2 queue size
Summary: Now you can call coordinator.stop_coordinator("train") to stop the train model's data input and release its memory.

Reviewed By: rpenggithub

Differential Revision: D4955014

fbshipit-source-id: c1bc3ec67337b94aff8ea9b306c3b4158eeef42c
2017-04-26 11:18:40 -07:00
13bdd4ec05 Replaces the non-existing _param_init_net net by raising an exception.
Summary:
The _param_init_net does not exist. All the other places reference
param_init_net instead. So far no one has encountered any problem
because all the passed params are BlobReferences. This diff makes
this assumption explicit.

Reviewed By: azzolini

Differential Revision: D4922930

fbshipit-source-id: e6dbd7a29ea640b7e62fcfec7ced3cc7d149f872
2017-04-26 10:35:45 -07:00
9f9a2da1a1 Revert D4920719: [dper2][operator] ScaleGradientOp
Summary: This reverts commit 0e1e0888f79594be874fdbdda5ccef7389064c50

Differential Revision: D4920719

fbshipit-source-id: 1ca9dc329eaffeb2932267d631506bb124d4e7ae
2017-04-26 09:34:47 -07:00
c387704030 improve softmax-with-loss kernels for prob-mode
Summary: Yet another diff to improve softmax CUDA kernels. 1) Use CUB for reduction ProbCrossEntropyKernel (was sequential loop); 2) remove unnecessary inner for-loops for two other kernels.

Reviewed By: wickedfoo

Differential Revision: D4953099

fbshipit-source-id: 4a5806d450021eff84e3d7fb0e7020cb5013fd69
2017-04-26 09:05:56 -07:00
d8e7093857 Reimplement RowMaxKernel using CUB block reduction.
Summary:
My first CUDA kernel ever!

The general strategy:
1. Create a block per row, up to CAFFE_MAXIMUM_NUM_BLOCKS
2. Create CAFFE_CUDA_NUM_THREADS threads to sum in parallel
3. Sequentially compute the max of all inputs for a thread
4. Use CUB parallel reduce to compute the overall max.

The new version of the code is way faster than the old kernel (20x). This is
actually quite suspicious; with the assistance of ntv, we discovered that
RowMaxKernelLargeD was performing slowly on lstm because it was only ever being
parallelized over a single block (see Test Plan below for a sample trace).
It will be good to investigate this further.

Differential Revision: D4948557

fbshipit-source-id: 7f8d5c04667b948881468adb37f8ebc5c903c8da
2017-04-26 09:05:56 -07:00
8950f41da3 Install CUDA headers.
Summary:
This PR makes cmake installs the gloo CUDA headers if USE_CUDA is enabled.
Closes https://github.com/facebookincubator/gloo/pull/29

Differential Revision: D4946856

Pulled By: pietern

fbshipit-source-id: a688c3794c4a5e34b664e7bdeb4e1148f6504419
2017-04-25 22:42:12 -07:00
e42c14e819 ScaleGradientOp
Summary:
ScaleGradient is a helper operator that does no actual numerical computation;
in the gradient computation phase, it scales the gradient flowing back
through it.

Differential Revision: D4920719

fbshipit-source-id: 0e1e0888f79594be874fdbdda5ccef7389064c50
2017-04-25 21:46:45 -07:00
deb1327b6e Re-apply #266
Summary: Closes https://github.com/caffe2/caffe2/pull/404

Differential Revision: D4943280

Pulled By: Yangqing

fbshipit-source-id: c0988598d8ccb8329feac88382686324b90d4d46
2017-04-25 21:17:04 -07:00
b905166362 RNN: fix bug for parameter gradient in a case when SumOp is
Summary:
The issue is that AliasOp doesn't work well with the swaps that we do for
param.grad and param.accGrad. Tensors become the same if there is no
reallocation of the gradient tensor inside the backward cell net's
local workspace.

bug explanation from  akyrola:

```
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad: tensor A

on each timestap back to 0, we Alias
gpu_0/decoder/weighted_encoder_outputs_grad,
so then also

gpu_0/decoder/weighted_encoder_outputs_grad: tensor A

It's acc is:
gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor B

Now after timesteps, we swap (line 626) with _acc to get

gpu_0/decoder/weighted_encoder_outputs_grad: tensor B

gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor A

OPTION A -- batch size is same as before or smaller:
Then on next iteration, we do again the Alias to
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad, so now

gpu_0/decoder/weighted_encoder_outputs_grad: tensor A

and also

gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor A

swapping them does nothing and they are the same

OPTION B -- batch size increases
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad is reallocated,
becomes tensor C

gpu_0/decoder/weighted_encoder_outputs_grad becomes tensor C with
Alias

gpu_0/decoder/weighted_encoder_outputs_grad_acc: is tensor A

```

Reviewed By: urikz

Differential Revision:
D4946730

Tags: rnn, caffe2

fbshipit-source-id: b52d63cb238b81d2ad40e05e70deb32a81336f47
2017-04-25 20:46:59 -07:00
a4554bb705 elementwise ops + error handling
Summary:
The new memonger (D4393909) has an option to use shape inference. When trying this on some models, I encountered a couple of issues, fixed here:
 - elementwise ops Add, Div, Mul did not have shape inference, leading to errors
 - if a shape inference function throws an error, it crashes the whole thing. It is better to catch the error, log it, and continue; shape inference is not required, only an optimization.
 - additional checks in the conv/pool shape inference function, which was segfaulting in certain cases.

Reviewed By: asaadaldien

Differential Revision: D4949994

fbshipit-source-id: d4c571e1bb20f8feeade95c49412771bb3e7bed0
2017-04-25 18:52:33 -07:00
da567dcb38 Add __syncthreads() between CUB reductions for elementwise linear gradient kernel
Summary: Thanks to ezyang, now I know that if a CUB tempstorage is reused, a thread sync is needed. So added this to the elementwise linear gradient kernel.

Reviewed By: wickedfoo, ezyang

Differential Revision: D4949250

fbshipit-source-id: fbbbd336a962a51be43784207105cadd391a8ef2
2017-04-25 17:32:18 -07:00
ef2701a57e MapToRange layer
Summary: A layer that takes raw ids as inputs and outputs the indices which can be used as labels. The mapping will be stored with the model.

Reviewed By: kittipatv

Differential Revision: D4902556

fbshipit-source-id: 647db47b0362142cdba997effa2ef7a5294c84ee
2017-04-25 16:03:58 -07:00
2c8b41e3f3 Adding add_weight_decay and image_input to brew module
Summary:
Adding add_weight_decay and image_input to brew module & remove `getWeights` and `getBias` from CNNModelHelper

With fbgs `useWeights`, the results show that no one but add_weight_decay is using this function. I checked with the Oculus people; their getWeights is a different function.

kennyhorror Please notice whether this is going to affect you :)

Reviewed By: salexspb

Differential Revision: D4945392

fbshipit-source-id: 4ef350fd81dd40a91847e9f3ebc5421eb564df32
2017-04-25 16:03:58 -07:00
885f906e67 resnet train print loss and accuracy
Summary: printing resnet training loss and accuracy for each batch so that people will have a better idea of what is going on

Reviewed By: pietern

Differential Revision: D4945390

fbshipit-source-id: 0fcd60f4735e81641355aba6e6cbf0e57e886e38
2017-04-25 16:03:58 -07:00
5692969e8f add gradient for LengthsTileOp
Summary:
LengthsTile fans one row out into multiple rows; the gradient op is simply the reverse,
adding the fanned-out rows of gradients back together into one

Reviewed By: kittipatv

Differential Revision: D4943375

fbshipit-source-id: deae9984e849974a0d484a10b94efdb1d30941cc
2017-04-25 14:31:15 -07:00
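A conceptual version of the gradient in numpy (a sketch, not the operator code):

```python
import numpy as np

def lengths_tile_gradient(grad_output, lengths):
    # LengthsTile repeats row i lengths[i] times, so the backward pass
    # sums each fanned-out group of gradient rows back into one row.
    grad_input, offset = [], 0
    for n in lengths:
        grad_input.append(grad_output[offset:offset + n].sum(axis=0))
        offset += n
    return np.stack(grad_input)

grad_out = np.ones((5, 2), dtype=np.float32)
print(lengths_tile_gradient(grad_out, [2, 3]))  # rows [2, 2] and [3, 3]
```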
f82a510be6 share forward activation blobs + pass unused free blobs down all branches + use shape inference
Summary:
Added optional support for sharing activation blobs as well. This change revealed a non-optimal implementation in the blob sharing: we should reuse free blobs by preferring those that are already shared by many other blobs; otherwise memory usage can increase as the pool of 'free blobs' grows.

Also, my first version only passed "free blobs" (i.e., blobs in the recycling pool) down the first branch when operators forked. Now we pass the blobs that were not used by the first branch down the second branch, and so on.

Also added support for blob size information in the heuristic. This uses the shape inference mechanism.

I had to also do some small tweaks:
- use Sum() operator as a way to match shapes of blobs that had otherwise unknown shapes. This is related to the Sum() operator that is added to combine multiple incoming gradient inputs (with _autosplit gradients).
- a couple of random shape inference fixes

This reduces the Resnet-50 memory usage on 64 batch from 9.45 Gig to 8.5 Gig.
For a 32 batch, the memory usage is 4330 MiB, down from 4800 MB, compared to Torch's 6856MiB (thanks prigoyal  for checking this for me).

This is unfortunately quite a bunch to review...

Reviewed By: asaadaldien

Differential Revision: D4393909

fbshipit-source-id: 9c7c94125f96512bea80463ebcb63c215ef95ff9
2017-04-25 14:23:25 -07:00
fc77ae1736 remove some experimental files from the open-source repo
Differential Revision: D4948835

fbshipit-source-id: 1115914a19d70ae214557132f24e4c302470f47e
2017-04-25 13:31:50 -07:00
aaafcfc529 Improving usability of schema
Summary:
This diff contains the following changes:

- implementing __repr__ on Field types; this makes it a little easier to see what broke in the unit tests
- preserving the shape of ndarray input to schema; previously, empty and scalar arrays lost their shape, while others kept it.
- type-checking ndarray input; this ensures the basic integrity of the schema

Reviewed By: xianjiec

Differential Revision: D4913030

fbshipit-source-id: bd0f6b8722d95bfe800edf98ba05029c5b99d2af
2017-04-25 10:32:08 -07:00
afd01164f8 Install missing headers.
Summary:
This PR installs missing include headers.
Closes https://github.com/facebookincubator/gloo/pull/30

Differential Revision: D4946478

Pulled By: pietern

fbshipit-source-id: da2d532afc43cf9e5e7fc764dc7821e2dfca6b37
2017-04-25 09:42:21 -07:00
a123247240 Move SIGPIPE initializer to test main
Summary:
It should be up to the program including Gloo to ignore SIGPIPE.
We have seen a case where the EPIPE errno is not properly handled in
an unrelated piece of code. Having SIGPIPE fire means we can get a
core and debug this further.

Reviewed By: andrewwdye

Differential Revision: D4896727

fbshipit-source-id: f6fe2d3f8dc68a9e6c2c457639b45f8aee2d7b20
2017-04-25 09:08:27 -07:00
41705ce7d5 Add zero padding module (#1326) 2017-04-25 16:58:51 +02:00
88fc1d39ff Generic TopK implementation (#744)
* move TopK to generic

* partial genericization of kernel code

* introduce TopKTypeConfig, specialize radix type and conversion for floats

* implement topk for byte tensor

* implement for char tensor

* implement for int tensor, extend test to check indices as well

* works for longs too

* make bitfield set/get a struct, add support for 64-bit types

* extend to double tensor

* implement for half tensor

* asserts; test fix
2017-04-25 16:39:20 +02:00
b3b66e3d00 MKL-related files with review comments incorporated
Summary:
This PR is based on commit "977c6b3" as this version allows MKL to use all the cores available.
All MKL-related files are added here after incorporating review comments; major changes include

1. usage of Clang-format (linter) with --style=Google
2. usage of macros for checking input and filter dimension in the mkl operators
3. merged Max and Average pooling functions
4. created a new folder for mkl related python scripts in Python folder and moved them there
5. there is no mkl_alexnet_test.py as that was redundant while convnet_benchmark.py does the same thing
Closes https://github.com/caffe2/caffe2/pull/270

Differential Revision: D4905219

Pulled By: Yangqing

fbshipit-source-id: e5f5b189714a835b93b9ebda24c52e09572dfca7
2017-04-25 00:31:29 -07:00
7153594d7b Fix corruption of NameScope when exception is thrown
Summary:
If an exception is thrown inside the namescope, the scope won't be reset to
its previous value. This diff changes this behavior to the expected one.

Reviewed By: kittipatv

Differential Revision: D4928621

fbshipit-source-id: 1d3579f2093ca60901b0d37ae3f2108deb2333ea
2017-04-24 22:46:27 -07:00
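The behavior after the fix, sketched (using core.ScopedName to read back the scoped name; that helper is an assumption here):

```python
from caffe2.python import core

try:
    with core.NameScope("layer1"):
        raise RuntimeError("boom")
except RuntimeError:
    pass

# The scope was restored despite the exception, so later blobs are
# not silently prefixed with "layer1/".
assert core.ScopedName("w") == "w"
```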
2533671a97 Support 3D&1D SpatialBatchNorm in cuDNN
Differential Revision: D4941087

fbshipit-source-id: 4adbf1f8990c7356f8effd8b0e1ae286fce6558c
2017-04-24 22:16:19 -07:00
2a098fc20e string -> std::string in common_rtc.h
Summary:
In its current form, common_rtc.h can only be included in a file where
```
using namespace std;
```
comes before the include
Closes https://github.com/caffe2/caffe2/pull/398

Differential Revision: D4943125

Pulled By: Yangqing

fbshipit-source-id: 3ef15c9353e6dd7326fc5f60322049c9f594ee6c
2017-04-24 22:06:31 -07:00
795a8a603b guard against apple platforms
Summary:
Mac does not support thread_local, and Caffe2 supports Mac, so we will have to
temporarily disable this on Mac.

(Note: this ignores all push blocking failures!)

Reviewed By: marksantaniello

Differential Revision: D4945019

fbshipit-source-id: 6d1d828a96459a85e1ae4fb5394eabdd9e610723
2017-04-24 21:19:30 -07:00
d16e8ec8f3 fix thread_local bug
Summary:
TSIA
Closes https://github.com/caffe2/caffe2/pull/405

Differential Revision: D4944669

Pulled By: Yangqing

fbshipit-source-id: dd38d2fb06b1d7b36bbb5ffb10070d1932070e21
2017-04-24 20:03:11 -07:00
5521fa35a5 use CUB to optimize ElementwiseLinearGradientKernel
Summary: Use a proper reduction in the gradient kernel. This gives about 25% speedup with the n, D I tried (see P57333872), but with larger N, the improvement can be much more sizeable.

Reviewed By: stephenyan1231

Differential Revision: D4941218

fbshipit-source-id: 627eaf26fc20a81f1ef449f39eda0d2191b8c746
2017-04-24 19:31:56 -07:00
4c08d6ae3b Allow cpu-only grad update in Parallelize_GPU.
Summary: Instead of requiring gradient updates on GPU, this change allows loss computation to happen on GPU while all gradient updates happen on CPU.

Reviewed By: jhcross

Differential Revision: D4943996

fbshipit-source-id: 1f2144c4277dfdb865877e0d0216ca1ac7dd7309
2017-04-24 18:47:36 -07:00
081001a176 "IsMemberOf" operator
Summary:
Add a pointwise `IsMemberOf` operator to Caffe2.

The original idea was `In` but I think this is not so clear.

I used `UnaryElementwiseWithArgsOp` at some point, but it made the code a bit more difficult to read without adding any functionality.

Reviewed By: ender-wieczorek

Differential Revision: D4912655

fbshipit-source-id: 716b66bb51468dd59db5f76f23d78cda85961b58
2017-04-24 18:18:49 -07:00
24ff90ee6b "Where" operator
Summary: Adding a pointwise `Where(condition, left, right)` operator to Caffe2.

Reviewed By: ender-wieczorek

Differential Revision: D4901402

fbshipit-source-id: a33682e77b2e7367050a94eeb4e10b7e5de9f955
2017-04-24 18:18:48 -07:00
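A usage sketch of the new operator:

```python
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("cond", np.array([True, False, True]))
workspace.FeedBlob("left", np.array([1., 2., 3.], dtype=np.float32))
workspace.FeedBlob("right", np.array([9., 8., 7.], dtype=np.float32))
# Pointwise select: out[i] = left[i] if cond[i] else right[i].
workspace.RunOperatorOnce(
    core.CreateOperator("Where", ["cond", "left", "right"], ["out"]))
print(workspace.FetchBlob("out"))  # [1. 8. 3.]
```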
437a670ce8 Enable building Gloo only on 64-bit systems
Summary:
Cannot guarantee Gloo will build on 32-bit systems as we don't run continuous build/test for this.

Verified this works by changing 8 to 7 and observing USE_GLOO defaulting to OFF.
Closes https://github.com/caffe2/caffe2/pull/401

Differential Revision: D4943135

Pulled By: pietern

fbshipit-source-id: 1972658afe819951e24ffbec76eb615c36ab0cc2
2017-04-24 17:40:31 -07:00
2994dd6377 Fix python support problems caused by building script errors.
Summary:
When trying to build caffe2 with the Python provided by Homebrew, I found some errors in the build scripts. The "get_python_cmake_flags.py" script is supposed to find the correct Python library and header file locations. However, due to these errors, this script does not function correctly. After building, caffe2 is linked against the default Python library provided by Apple, which causes a crash when trying to validate whether or not the installation is successful:
```shell
python -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
```
The fix is as simple as follows:

- Add "shell" so that command substitution could work under Makefile.

- Add blank spaces between -D options so that they are treated as options not makefile targets.

- Print the "flags" variable without the newline character so that they could be utilized by command substitution correctly.
Closes https://github.com/caffe2/caffe2/pull/391

Differential Revision: D4943212

Pulled By: Yangqing

fbshipit-source-id: 04d3595fa2d89fe57aed5b6a7a91a95114a82a1b
2017-04-24 17:17:21 -07:00
902409be56 caffe2: datasets pack/unpack
Summary:
Two new operators to pack and unpack a dataset. This is so that we can
re-use other operators that do not understand the schema format. The immediate
use-case is to use it with a partition operator.

Packing works by splitting the input into separate tensors, putting them in a
vector and wrapping in a shared_ptr (as opposed to a unique_ptr, so we can
copy).

Unpack takes the packed input and concatenates it back to the original.

I also had a hard time understanding the iteration, so I created a TreeWalker
that hides the complexity of operating on all the arrays behind short,
purpose-built functions that (at least for me) are easier to
understand.

Reviewed By: dzhulgakov

Differential Revision: D4918002

fbshipit-source-id: ecbf9196ed25e886a94383961176b8c84dde2d2f
2017-04-24 16:09:39 -07:00
9cb901caf0 Forward-only rnns
Summary:
Added a forward_only option to recurrent_net and the RNNCells. If this is set, the backward_step_net is not passed to the operator.
When backward_step_net is not available, the operator knows it is in forward-only mode and, instead of creating workspaces for each step, cycles
through only one private workspace.

Note: we could avoid doing a lot of work in recurrent.py:recurrent_network call when backward step is not needed, but doing that nicely requires
more refactoring that I did not want to do now. Thus, we create the backward step nets etc, but just don't pass it to the op.

This can be used to create more efficient inference models. You can also sanitize existing inference nets and remove the backward_step_net argument to
get the benefits.

Reviewed By: salexspb

Differential Revision: D4916482

fbshipit-source-id: c99b93c9cb897c32b0f449253f7f6d6a942618ad
2017-04-24 15:52:27 -07:00
7440cd5ef4 Add python_func_type to PythonOp
Summary:
This is needed to have a stateful PythonOp (such as the PyTorch op in the following diff) where computing f will produce a state (not tensors) that's consumed by grad_f.
python_func_type is a type that is constructed as python_func_type(f) and provides forward/backward methods (delegated to f and f_grad). We construct this object at Op registration time to have it as thread-local.

Differential Revision: D4900963

fbshipit-source-id: 00a6a55fa372e2244048921914e22e710d11f7ce
2017-04-24 15:52:26 -07:00
eb1130803f caffe2: smart_tensor_printer
Summary:
As per request, moving this elsewhere and using the Dispatcher. The reason
I didn't put it into tensor.h is that the dispatcher lives in operator.h
and operator.h includes tensor.h. I also didn't want to do any codemods. If
this turns out to be useful it can be changed. Also the name is not super great,
but TensorPrinter is already taken so that's what first came to mind.

Reviewed By: dzhulgakov

Differential Revision: D4893325

fbshipit-source-id: 7d4e56c4e57164c3cd3748f4a705a4ffe6b932d9
2017-04-24 15:52:26 -07:00
0bb558716a rename model_helpers to brew and lowercase all helper functions
Summary:
rename model_helpers to brew. This is a big diff now. I did these things:

1. replace model_helpers with brew:

  find . -type f -exec sed -i 's/model_helpers/brew/g' {} +

2. rename model_helpers.py and model_helpers_test.py
3. rename ModelHelpersTest to BrewTest
4. lowercase all the helper functions to distinguish them from single op
5. run my unittests
6. run converge tests

Reviewed By: salexspb

Differential Revision: D4930465

fbshipit-source-id: f420a1b03238df1cbe9f4426e0b9c43a12119661
2017-04-24 15:52:26 -07:00
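What the renamed API looks like (a small sketch; blob names illustrative):

```python
from caffe2.python import brew, model_helper

model = model_helper.ModelHelper(name="example")
# Lowercase helpers on `brew`, visually distinct from single ops.
fc1 = brew.fc(model, "data", "fc1", dim_in=64, dim_out=32)
relu1 = brew.relu(model, fc1, "relu1")
```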
bef6e45f8b rename ModelHelperBase
Summary:
rename ModelHelperBase to ModelHelper.

This is the result of running:

  find . -type f -exec sed -i 's/ModelHelperBase/ModelHelper/g' {} +

We had 19 results when fbgs'ing ModelHelperBase. There are 20 instances here because I added 1 test in model_helpers_test.py

Reviewed By: salexspb

Differential Revision: D4928337

fbshipit-source-id: bc4c12b60b90c167e717de50ea9fe17521e142e3
2017-04-24 15:52:26 -07:00
f407078d38 ReduceFrontSumOp: striped Axpby
Summary: Instead of calling math::Axpby in a loop, we can do it in one kernel much more efficiently.

Reviewed By: asaadaldien, jamesr66a

Differential Revision: D4935893

fbshipit-source-id: 33497784604d1779723d578ea5400e87803851f0
2017-04-24 15:52:26 -07:00
2e74129f0e ReduceDimsGradientOp: replace multiple Scale calls with a batched/striped one
Summary: jamesr66a noticed that the ScaleKernelAlphaDevice kernel was showing up in a profiler a lot. This was because it is called in a loop in ReduceFrontSumGradientOp. This was easy to replace by one kernel that scales in a "striped" manner.

Reviewed By: asaadaldien, jamesr66a

Differential Revision: D4935888

fbshipit-source-id: bc7bfd8c94988074ace6fbf3fdfb85905027f272
2017-04-24 15:52:26 -07:00
bf20e4e9b0 Remove MiLSTM from recurrent.py left over after refactoring
Summary: its not used

Reviewed By: urikz

Differential Revision: D4936008

fbshipit-source-id: cc044bbdac0d17503ce9376b98e4bf79a4dc959c
2017-04-24 15:52:26 -07:00
4f77a49ddd refactor LSTM test to avoid copy pasta, improve speed 1.5x and provide better coverage
Summary:
This is getting too messy again, so cleaning it up even more. One thing I added here: not calling random to generate the input sequence. Ideally we'd do this for all other inputs; this was reported to be an issue when hypothesis finds bad examples, as it can make the test run very long.

Also I tuned the ranges a bit so the test finishes faster. On my devgpu the whole test took 600 seconds before and now takes 39 seconds.

One more important thing: we want to test all combinations of the things in the for loop, while the inputs provided by hypothesis are just random tensors.

Differential Revision: D4902956

fbshipit-source-id: ceb02d6761406b3192101d3b255abe90b2866770
2017-04-24 15:52:26 -07:00
684607a793 Add a friendly error message for unzipped mnist file.
Summary: Closes https://github.com/caffe2/caffe2/pull/370

Differential Revision: D4933662

Pulled By: Yangqing

fbshipit-source-id: a5a8a07ccd49325d2ab493abf695abd99e49bd35
2017-04-24 15:52:25 -07:00
41f4198344 CUDA version of PRelu/Gradient + Fix Gradient for dW
Summary:
CUDA version of PRelu and its gradient. The forward pass is straightforward; the backward pass requires a reduction over the weights.

tsaizhenling, please patch this and test.

Differential Revision: D4931630

fbshipit-source-id: 1238e7d536e41480713865ced91aaef88f4feef5
2017-04-24 15:52:25 -07:00
3b0069a014 Expose operators execution statistics to python frontend.
Summary: To expose operator execution statistics in Python, the profiling measurements collected in the ProfDAGNet class are leveraged. In the current implementation, a new operator is defined that outputs the statistics in a protobuf message. On the frontend, OperatorStatsContainer works as a wrapper to print ProfDAGNet statistics.

Differential Revision: D4923009

fbshipit-source-id: 18a6d76a405ef277a3fca7a312609051cf943207
2017-04-24 15:52:25 -07:00
09bb91022a Fix tests for ops without a CUDA backend
Summary:
*See https://github.com/caffe2/caffe2/pull/227*

* Logit
* ReplaceNaN
* BatchOneHot
Closes https://github.com/caffe2/caffe2/pull/277

Differential Revision: D4915268

Pulled By: Yangqing

fbshipit-source-id: 77ccb2e7d03e6953e8ca60646987a02868d0ef5b
2017-04-24 15:52:25 -07:00
8387bc4680 Added Python_ADDITIONAL_VERSIONS to cmake so python 2 is default.
Summary:
When installing on systems such as Arch Linux, where the default Python version is 3, the build will fail. To fix this, instead of changing the python symlink in the shell, it is more efficient to set the default Python version allowed by CMake.
Closes https://github.com/caffe2/caffe2/pull/361

Differential Revision: D4932214

Pulled By: Yangqing

fbshipit-source-id: 06997d2df68b8e4037d72fd49813f6f74ca7591b
2017-04-24 15:52:25 -07:00
b82f9e9ea7 FindOp
Summary:
Simple FindOp for CPU and GPU which searches for a list of unordered needles in an unordered index. The CPU version might be faster if the index/needles were sorted first, but we can get back to that later.

The CUDA op is also kind of brutish, but pretty parallel. Since the index and the queries are smallish, at least in the use case currently in mind (the Machine Translation team's word candidate search), I think this is a sufficient start.

Note that this is much simpler than the Index-class of ops which allow modifying the index etc. Since CUDA ops are more complex to implement for the full Index functionality, I decided to make a separate op with this very simple functionality.

Differential Revision: D4910131

fbshipit-source-id: 6df35c9e3c71d5392a500d5b98fd708ab0c8e587
2017-04-24 15:52:25 -07:00
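
A rough Python sketch of the Find semantics; the missing-value convention here is an assumption, not the actual operator contract:

```python
import numpy as np

def find(index, needles, missing_value=-1):
    # For each needle, return its position in `index`, or `missing_value`
    # when it is absent. A hash map stands in for the brute-force
    # parallel search the CUDA op performs.
    positions = {v: i for i, v in enumerate(index)}
    return np.array([positions.get(n, missing_value) for n in needles])

print(find(index=[7, 3, 9, 1], needles=[9, 2, 7]))  # [ 2 -1  0]
```
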
f07ec699ee Add rendezvous timeout parameter and defaults to StoreHandler::wait()
Summary: Add default rendezvous timeout for RedisStoreHandler and FileStoreHandler.

Reviewed By: pietern

Differential Revision: D4911678

fbshipit-source-id: e69dd03d96214449944d583b20941540cc0b6643
2017-04-24 15:52:25 -07:00
fa261cdafb arg_scope for model_helper
Summary:
arg_scope module for model_helpers.

Some coding example with it:

  with model_helpers.arg_scope([model_helpers.FC], kwargs):
      model_helpers.FC(model, "x", "out_1", n, n)

  with model_helpers.arg_scope([myhelper], n=-3):
      with model_helpers.arg_scope([myhelper], n=-2):
          with model_helpers.arg_scope([myhelper], n=n):
              res = model_helpers.myhelper(None)

  with model_helpers.arg_scope([myhelper], n=-3), \
          model_helpers.arg_scope([myhelper], n=-2), \
          model_helpers.arg_scope([myhelper], n=n):
      res = model_helpers.myhelper(None)

Reviewed By: salexspb

Differential Revision: D4837180

fbshipit-source-id: 2cbd81681779d6cd1e61ee189edcc1cf3bb07d15
2017-04-24 15:52:25 -07:00
199a09c7dd XCode -> Xcode
Summary: Insufferable Apple fanboys have burned this into my brain.

Reviewed By: Yangqing

Differential Revision: D4913772

fbshipit-source-id: 486c20e9c921
2017-04-24 15:52:24 -07:00
a48062b1a2 temporarily fix sync script bugs changes by reverting partially https://github.com/caffe2/caffe2/pull/266/files 2017-04-24 15:49:22 -07:00
9899512401 Remove common.h from root
Summary: This file was left over after a recent refactoring but is not used.

Reviewed By: andrewwdye

Differential Revision: D4940265

fbshipit-source-id: 01f8c5fbc73dd0ca0a92306dbfef22ff28133750
2017-04-24 13:51:15 -07:00
d95feb3feb Only build on 64-bit systems
Summary:
While it is theoretically possible to make Gloo work on 32-bit systems, it's unlikely anybody would ever use it there. This removes the expectation that it should work...

Fixes #28
Closes https://github.com/facebookincubator/gloo/pull/31

Differential Revision: D4939073

Pulled By: pietern

fbshipit-source-id: 8c60804f7ae5cf835332871a424aefa2c498e8a4
2017-04-24 10:38:45 -07:00
3ab074b3c5 Fix torch.stack() with Variable inputs (#1345) 2017-04-24 12:20:51 -04:00
6a69f7007b Revert "add keyword out for autograd function Concat to match torch.cat (#1336)" (#1340)
This reverts commit 71b9dea6ecc2278511ba6c2531437d27d9a2b8c8.
2017-04-23 19:19:27 +02:00
71b9dea6ec add keyword out for autograd function Concat to match torch.cat (#1336) 2017-04-23 15:36:24 +02:00
fa4f363b93 Instance norm (#1283)
* instance norm

* fix whitespaces

* whitespaces

* docs

* "C" letter was cyrillic in docs, fixed

* remove force_eval, fix non contiguous case
2017-04-23 14:49:15 +02:00
aab30d4ea2 Fix errors when no CUDA devices are available (#1334)
Fixes #1267

This fixes a number of issues when PyTorch was compiled with CUDA
support but run on a machine without any GPUs. Now, we treat all errors
from cudaGetDeviceCount() as if the machine has no devices.
2017-04-23 14:45:27 +02:00
2b56711c24 Indexing fix for fused GRU/LSTM kernels when all tensors are not contiguous. (#1325) 2017-04-22 04:22:32 -04:00
2fa3365f94 Merge commit '5224fc56b03b6468cb85ccf39034b8ab0d76d04e' 2017-04-22 01:14:34 -07:00
5224fc56b0 fix typo 2017-04-22 10:14:09 +02:00
4373580e6b Merge commit 'e80a3a7f7b8d0e179c1481e0744f08e9385b31f3' 2017-04-22 01:11:10 -07:00
d9406a8a1a Merge commit '10387a3f35573462e18219c321ff550757ce9b09' 2017-04-22 01:10:53 -07:00
e80a3a7f7b Indexing fix for fused GRU/LSTM kernels when all tensors are not contiguous. 2017-04-22 01:09:46 -07:00
5b83fe6781 add contiguous checks 2017-04-22 09:57:36 +02:00
24d92b5d9f Concatenate directly into shared memory when constructing batches (#1323)
This saves an extra memory copy, which speeds up data loading a bit
(5-10% with accimage).

As part of this change:

 * torch.cat accepts keyword argument out
 * specifying out=None is treated like not specifying out
2017-04-22 03:40:30 -04:00
1375694853 Document torchvision members 2017-04-21 12:50:36 -07:00
be5e399d46 Add a simple README for torch/lib. (#1322)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-04-21 15:06:12 -04:00
884690adb3 build_ios.sh comments fixes
Summary:
Changed _Android_ to _iOS_ in the comments in scripts/build_ios.sh.
Closes https://github.com/caffe2/caffe2/pull/364

Differential Revision: D4930101

Pulled By: Yangqing

fbshipit-source-id: 8f0a6aa1b43fd57c2f71f1c667c61d1f69b1e061
2017-04-21 10:52:29 -07:00
57b51db8d7 Add a guard function to check Caffe2 linking setup.
Summary:
This helps diagnosing issues like #346
Closes https://github.com/caffe2/caffe2/pull/354

Differential Revision: D4928347

Pulled By: Yangqing

fbshipit-source-id: b45685f1da18cbc49be293260b1fc2268fe5cd4c
2017-04-21 03:38:37 -07:00
4dafb608e7 Fix char_rnn LSTM import
Summary:
Fix for char_rnn.py with latest LSTM changes in rFBS779c69758cee8caca6f36bc507e3ea0566f7652a.
Fixed some linting issues.

Reviewed By: salexspb

Differential Revision: D4927018

fbshipit-source-id: cda760a170056b8bc237b4c565cc34800992c8e0
2017-04-20 22:46:19 -07:00
01c76bf830 Optimize TransposeOp by using strided access pattern, bulk memory transfer, and other profile-guided optimizations
Summary: Work in progress for improving the performance of the TransposeOp on CPU. This is used extensively for inference in several neural MT systems, so optimizing this function is worthwhile and will reduce request latency.

Differential Revision: D4913075

fbshipit-source-id: fa2742829291d91f3eba00fdfe7d6c0dae83e206
2017-04-20 18:31:40 -07:00
9f86de2dc7 Support WatchOS build
Summary:
To build, run
`IOS_PLATFORM=WATCHOS scripts/build_ios.sh`
Closes https://github.com/caffe2/caffe2/pull/321

Reviewed By: Yangqing

Differential Revision: D4923400

Pulled By: salexspb

fbshipit-source-id: 3a87f068562a01e972ea915c9be32f0667e8ea19
2017-04-20 18:15:47 -07:00
10387a3f35 fix gradBias checks 2017-04-20 19:21:50 -04:00
627921d01d Use CUDA standard tanh for lstm
Summary: Better to use the standard library tanh(), because otherwise there can be numerical differences relative to other systems.

Reviewed By: urikz

Differential Revision: D4910421

fbshipit-source-id: 3a1e63cd20a6b8e3720a1deafea227652b38205e
2017-04-20 16:19:57 -07:00
a782a6231f Merge commit 'e788ea40de0f7ef393f1b602098a6775a95d8976' 2017-04-20 19:00:45 -04:00
e788ea40de fix typo in TH_APPLY for _dimOffset 2017-04-20 18:59:12 -04:00
6089900011 grammar/typo: "There's 3" -> "There are three"
Summary: Closes https://github.com/facebookincubator/gloo/pull/27

Differential Revision: D4919746

Pulled By: pietern

fbshipit-source-id: 35733b75fc169d2ccff8b10df013eed8c279dfd5
2017-04-20 15:19:56 -07:00
81345306c8 Merge commit '8236d38e81396ac48697ac289c0476cff18a8e08' 2017-04-20 15:03:48 -07:00
6ed36c37e6 fix CUDNN layer weight size calculation for multiple layers
Summary: CuDNN LSTM weights were incorrectly sized for layers > 0: there was an assumption that the input size to the middle layers is the same as for the first layer, but a middle layer actually gets its input from the layer below, which has dimension equal to the output dimension (the hidden dimension). This worked fine when input_dim and hidden_dim were equal, as they are in the default params for lstm_benchmark. See the sketch after this entry.

Reviewed By: salexspb

Differential Revision: D4922824

fbshipit-source-id: 3ed05529dcb0a4e66ad440084a55df1c5932fd33
2017-04-20 15:02:48 -07:00
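
A minimal sketch of the corrected sizing logic; the parameter names are illustrative:

```python
def lstm_layer_input_dims(input_dim, hidden_dim, num_layers):
    # Only layer 0 reads the raw input; every later layer reads the
    # hidden state of the layer below. The bug used input_dim for all
    # layers, which went unnoticed whenever input_dim == hidden_dim.
    return [input_dim if layer == 0 else hidden_dim
            for layer in range(num_layers)]

print(lstm_layer_input_dims(input_dim=128, hidden_dim=40, num_layers=3))
# [128, 40, 40]
```
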
f0a19e2617 Merge commit '331219c5506b26bf0906b7acdafb4823e07a924e' 2017-04-20 15:01:22 -07:00
8236d38e81 add cusparse link dependency 2017-04-20 14:31:30 -07:00
8adf8fe2ed create and expose handles for cusparse 2017-04-20 14:30:14 -07:00
4f2531fbaa syncing fbandroid/objc to fbcode
Reviewed By: ajtulloch

fbshipit-source-id: 5be2fd2636bc6106be1f7be49cddd439dfe8d28a
2017-04-20 12:53:24 -07:00
d2472d1ab5 Disable cudnn dilated convolutions for kepler. (#1308) 2017-04-20 15:31:45 -04:00
f768233a1c Fix a data_workers test
Summary:
This is a global variable which can be incremented by other tests.

Before:
```
$ pytest -v caffe2/python/data_workers_test.py
...
caffe2/python/data_workers_test.py::DataWorkersTest::testGracefulShutdown PASSED
caffe2/python/data_workers_test.py::DataWorkersTest::testNonParallelModel FAILED

============================================= FAILURES ==============================================
_______________________________ DataWorkersTest.testNonParallelModel ________________________________

self = <data_workers_test.DataWorkersTest testMethod=testNonParallelModel>

    def testNonParallelModel(self):
        model = cnn.CNNModelHelper(name="test")
        coordinator = data_workers.init_data_input_workers(
            model,
            ["data", "label"],
            dummy_fetcher,
            32,
            2,
        )
>       self.assertEqual(coordinator._fetcher_id_seq, 2)
E       AssertionError: 4 != 2

caffe2/python/data_workers_test.py:38: AssertionError
-----------------
Closes https://github.com/caffe2/caffe2/pull/211

Differential Revision: D4916591

Pulled By: Yangqing

fbshipit-source-id: 281f12d7f02dbd0ce0932024cf1f16cd12130112
2017-04-20 11:38:11 -07:00
41b7217898 Fix url to original Caffe external resource in README. (#317) 2017-04-20 11:31:50 -07:00
95f123a83e fix download progress bar's percentage exceed 100%
Summary:
downloaded_size needs to be incremented by the length of the returned data_chunk. Otherwise, when the last block's size is less than the chunk size, the percentage can exceed 100% (see the sketch after this entry).
Closes https://github.com/caffe2/caffe2/pull/329

Differential Revision: D4922227

Pulled By: Yangqing

fbshipit-source-id: 7d05d9bbf2dad0a9d330be96b60e658908185a46
2017-04-20 10:41:06 -07:00
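
The gist of the fix as a minimal Python sketch; the surrounding download loop is assumed, not copied from the actual script:

```python
def report_progress(chunks, total_size):
    downloaded_size = 0
    for data_chunk in chunks:
        # The fix: accumulate by the actual chunk length. Adding a fixed
        # chunk size would overshoot on the (shorter) final chunk and
        # push the percentage past 100%.
        downloaded_size += len(data_chunk)
        print("\r{:.1f}%".format(100.0 * downloaded_size / total_size),
              end="")

report_progress([b"x" * 400, b"x" * 400, b"x" * 200], total_size=1000)
```
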
51033f19d7 unbreak test_seq2seq_caffe2_model_cnn_one_stack_encoder
Summary: Fixes unit test test_seq2seq_caffe2_model_cnn_one_stack_encoder, broken by D4905003. (Also some commas.)

Differential Revision: D4920699

fbshipit-source-id: 2fe501095e3e26a475d666afcae8e48c953f2eef
2017-04-20 10:06:25 -07:00
331219c550 define abs for short too 2017-04-20 09:55:17 -07:00
a790256537 Add option to control the size of lengths tensor
Summary: This would allow us to pin the size of the lengths tensor to the batch size. I'll use this in a follow-up diff.

Reviewed By: kennyhorror

Differential Revision: D4906634

fbshipit-source-id: 8d3d151f33fd99547d9940e7c663779810283eb6
2017-04-20 09:53:22 -07:00
249dc01f0d Set cuDNN pooling mode to match CPU&CUDA implementations
Summary: Set the pooling mode to exclude padding values and match the CPU & CUDA implementations.

Differential Revision: D4920476

fbshipit-source-id: 26ce656cc792061f706e2acb37e72cec46ac77c8
2017-04-20 09:22:00 -07:00
5a856ce03e disable dropout completely when not used
Summary: salexspb recognized that my diff fixing num_layers>1 cudnn lstm made it run much slower. It turns out this was caused by adding the dropout states to the gradient op (which it was missing; that was a bug). But since we use dropout=1.0, we don't need to initialize the dropout states, and it turns out this improves the perf of CuDNN LSTM very significantly, at least when hidden_dim is small (a 2.5x increase with hidden_dim=40). With a large hidden_dim, the improvement is more modest.

Reviewed By: salexspb

Differential Revision: D4920543

fbshipit-source-id: 860c9d4c61793252f658dc5e3390bab571476be5
2017-04-20 08:40:25 -07:00
5b6fb047aa Fix parallel build support in makefile
Summary:
The top-level makefile had `make` hardcoded, resulting in a slow build and the following message when following the installation instructions:

    warning: jobserver unavailable: using -j1. Add `+' to parent make rule.

Replacing this recursive make command with the variable MAKE fixes the issue.
Closes https://github.com/caffe2/caffe2/pull/324

Differential Revision: D4920978

Pulled By: Yangqing

fbshipit-source-id: 1e75ab41786e52d1b7abcc2c46ad1088880d8c1d
2017-04-20 01:35:03 -07:00
e34c5dc1c3 macOS build issue with set_affinity() in net_gpu.cc
Summary:
see: https://github.com/caffe2/caffe2/issues/306
Closes https://github.com/caffe2/caffe2/pull/308

Differential Revision: D4915436

Pulled By: Yangqing

fbshipit-source-id: d9186792e31d137ba506d83c3b8bb04dc78b956f
2017-04-20 01:16:57 -07:00
fd9185ab21 fix getting empty struct
Summary: `not field` calls `__len__()`, causing the field to appear to be missing even when it's not (see the example after this entry)

Differential Revision: D4910587

fbshipit-source-id: bc2b2fadab96571ae43c4af97b30e50c084437af
2017-04-19 22:36:05 -07:00
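
A small standalone example of the pitfall; the Field class here is hypothetical:

```python
class Field:
    """A schema field that reports its length."""
    def __init__(self, items):
        self.items = items
    def __len__(self):
        return len(self.items)

field = Field([])      # the field exists, it is merely empty
print(not field)       # True  -- `not` falls back to __len__() == 0
print(field is None)   # False -- the correct "is it missing?" test
```
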
47ce345699 Limit the maximum memory for keep_on_shrink for predictor
Summary: We had to disable the keep_on_shrink flag for inference and some training workloads; this change limits the memory allowed to be kept around when allocating a smaller blob after a bigger one.

Differential Revision: D4889366

fbshipit-source-id: 87412cc1c0bf2c43ea1f3f19e31afc178bc1b9db
2017-04-19 22:16:26 -07:00
8f43e3fe36 update ios-cmake 2017-04-19 22:10:31 -07:00
e5e3ec1498 fix unit test
Summary: CUDA is not implemented

Reviewed By: xianjiec

Differential Revision: D4917368

fbshipit-source-id: dc41a76cf569018896cf457c0e3358ce840e198e
2017-04-19 17:22:00 -07:00
7805ac9098 Base Store::wait() should ignore timeout for back compat
Summary: PrefixStore::wait() uses a default timeout if unspecified. This is incompatible when using PrefixStore to wrap a Store implementation that does not support timeout. Instead the base Store::wait(keys, timeout) implementation is called, throwing an exception. This change modifies the base implementation to ignore the timeout.

Differential Revision: D4916517

fbshipit-source-id: 3cdd83bd209bf938b58442d82f3fc245e68019ad
2017-04-19 16:49:44 -07:00
4bc40d0658 reset environment after every example
Summary: Hypothesis tests only call `setUp()` once per test. It's annoying to reset manually.

Reviewed By: xianjiec

Differential Revision: D4911862

fbshipit-source-id: 6b1c11daf002d51c8a0d532261506bcb20429438
2017-04-19 16:46:16 -07:00
001598a59b add net gradient check
Summary:
1. add net gradient check to dper2 model unittest framework
2. add net gradient check to mtml model
3. refactor the code setting defaults to namedtuple.

Reviewed By: kittipatv

Differential Revision: D4897169

fbshipit-source-id: 4f17dd06ee169aa1158f12f5156614d45d7d97c1
2017-04-19 15:19:55 -07:00
4ad3a4fc8b Revert D4794432: Added tiles and axis as input parameters to Tile Operator
Summary: This reverts commit a7e38f4f925a4cedf530924bd426c3bb08b5aad8

Differential Revision: D4794432

fbshipit-source-id: 05b2b0d101ebd917527e94ef8a74e63ab40942a4
2017-04-19 14:17:25 -07:00
5f65ee9ca0 Add more newContiguous calls and checks 2017-04-19 14:01:31 -07:00
f750a2d2df fix a few typos
Summary:
fix typo: Dimention, probablity
Closes https://github.com/caffe2/caffe2/pull/310

Differential Revision: D4915798

Pulled By: Yangqing

fbshipit-source-id: 3a16d3adc469c9930ce0dad8584c4678b3c3b5c0
2017-04-19 13:31:33 -07:00
883ff96f74 Allow UniformIntFill to produce empty tensor
Summary: This is needed for the completeness of random negative sampling. When the pool size is 0, we want to generate an empty indices tensor.

Reviewed By: xianjiec

Differential Revision: D4906866

fbshipit-source-id: 75d66a92d15d60bb37bcd1075d324f28069c4fa0
2017-04-19 13:03:23 -07:00
b294aadc66 fp16 support for FullyConnected op(Fixed)
Summary: This diff resolves some issues from the reverted PR 246.

Differential Revision: D4911821

fbshipit-source-id: 0a6fa47f4c2405475697e40fb926758c534f8ef7
2017-04-19 12:49:12 -07:00
f9149b1f2e Fix halving-doubling corner cases
Summary: Fixes for corner cases with small element counts. Fixed problems include (1) calling range on out of bounds pointers, (2) failing to allocate send or receive buffers in cases where they correspond to out of bounds indices for reduce-scatter, but are needed in the allgather, (3) not allocating enough receive buffer space (more than count_ bytes may be needed in some cases)

Reviewed By: pietern

Differential Revision: D4912656

fbshipit-source-id: 0409d01894ff9c93ef1a1fdf8021c9ecf62f9b57
2017-04-19 12:20:28 -07:00
8b5782ed5c Weighted sampling dequeue operator
Summary:
Similar to SafeDequeueBlobsOp, but adds weight-based sampling for reading from multiple input BlobsQueues.

WeightedSampleDequeueBlobsOp takes a vector of weights (each weight is mapped to one input blob queue). Based on these probabilities, we choose which BlobsQueue to fetch from (see the sketch after this entry). WeightedSampleDequeueBlobsOp stops when any input BlobsQueue is empty.

Reviewed By: dzhulgakov

Differential Revision: D4905160

fbshipit-source-id: 5b1551e2250569f933a6c01ed04442843c5e0cb6
2017-04-19 12:02:06 -07:00
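
A Python sketch of the sampling rule, with plain lists standing in for BlobsQueues:

```python
import random

def weighted_sample_dequeue(queues, weights):
    # Stop as soon as any input queue is empty, mirroring the op's
    # stopping rule; otherwise pick a queue with probability
    # proportional to its weight and dequeue from it.
    if any(len(q) == 0 for q in queues):
        return None
    chosen = random.choices(range(len(queues)), weights=weights, k=1)[0]
    return queues[chosen].pop(0)

queues = [[1, 2, 3], [10, 20]]
print(weighted_sample_dequeue(queues, weights=[0.7, 0.3]))
```
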
d47c1362c5 changed doxygen config to new docs path (#311)
* updated ubuntu instructions

* updated ubuntu notes and troubleshooting

* updated tutorials using local files

* added doxygen python blocks for docs generation

* doxygen related files for generating docs

* removing Mac and Windows build status while those are in beta

* inference lookup is local now

* launch updates

* moved to docs folder, updating paths
2017-04-19 11:49:59 -07:00
d58141ec4c launch updates (#309)
* updated ubuntu instructions

* updated ubuntu notes and troubleshooting

* updated tutorials using local files

* added doxygen python blocks for docs generation

* doxygen related files for generating docs

* removing Mac and Windows build status while those are in beta

* inference lookup is local now

* launch updates
2017-04-19 11:40:51 -07:00
6cae3fa896 Typo in Build version of ubuntu (#294) 2017-04-19 11:25:59 -07:00
9ef30b337e Add six to Tegra X1 install script
Summary:
When compiling Caffe2 on a Jetson TX2 using JetPack 3.0, the compilation with the Tegra X1 build script runs through perfectly fine. However, when running

    from caffe2.python import workspace

the following error shows up:

> ImportError: No module named six

After installing `six` manually using

    sudo pip install six

this works fine. I thus added the `six` module to the install script.

I assume this will also be required for the `build_raspbian.sh` script; however, as I couldn't test this, I didn't add it (yet).
Closes https://github.com/caffe2/caffe2/pull/293

Differential Revision: D4914121

Pulled By: Yangqing

fbshipit-source-id: 75947e8c295e1f5ad3f480a025fe8518dd91a957
2017-04-19 11:02:23 -07:00
b89688658c Missing CUDA_NVCC_FLAGS & CUDA_HOST_COMPILER flags at GPU arch detection.
Summary:
This tiny patch fixes the missing ```CUDA_NVCC_FLAGS``` & ```CUDA_HOST_COMPILER``` in ```caffe_detect_installed_gpus()```.

-----------------

People may want to define custom flags or compilers that are more CUDA-compatible. Automatic GPU arch detection ignores these flags and fails. Example of such custom flags:

```
cmake . \
-DCUDA_ARCH_NAME="Auto" \
-DCUDA_HOST_COMPILER="/usr/bin/gcc5"
```

* The autodetection step fails regardless of the proper compiler flags being passed, due to the system gcc 7.0 that doesn't work with CUDA, so all archs will be enabled:
```
-- The C compiler identification is GNU 7.0.1
-- The CXX compiler identification is GNU 7.0.1
...//\\...
-- CUDA detected: 8.0
...//\\...
-- Automatic GPU detection failed. Building for all known architectures.
-- Added CUDA NVCC flags for: sm_20 sm_21 sm_30 sm_35 sm_50 sm_60 sm_61
```
* With the patch, autodetection works as expected:
```
$ cmake ../ -DCUDA_NVCC_FLAGS="-Xcompiler=-std=c++03 -I/usr/include/cuda/"
-- The C compiler identification is
Closes https://github.com/caffe2/caffe2/pull/288

Differential Revision: D4914215

Pulled By: Yangqing

fbshipit-source-id: c407a750e03cb163f9d57f9f6403042704046014
2017-04-19 11:02:23 -07:00
ea493c6fda build error in context_gpu_test.cc
Summary:
caffe2/caffe2/core/context_gpu_test.cc:97:31: error: implicit instantiation of undefined template 'std::__1::array<CUstream_st *, 2>' std::array<cudaStream_t, 2> temp = {0};

(fixes build issue on macOS 10.11.6)
Closes https://github.com/caffe2/caffe2/pull/296

Differential Revision: D4914191

Pulled By: Yangqing

fbshipit-source-id: 5a2c218eef0f04e0dbfcaf951dd4749424b8cfaa
2017-04-19 11:02:23 -07:00
94ee2f3662 update gloo to master to address #286 2017-04-19 10:57:39 -07:00
a8e6610e3d Fix argument typo in pad_packed_sequence docstring (#1300) 2017-04-19 13:50:59 -04:00
bef5720b76 Flag to report total memory in GPUs + op and python func to retrieve
Summary:
If the command line flag caffe2_gpu_memory_tracking is enabled, CUDAContext will keep track of the total memory allocated on each GPU. This requires keeping track of the sizes of the pointers, so it might add some overhead, and is thus optional. In practice the overhead is minimal, though, since we usually don't do allocations after the first iterations.

Added an op, GetGPUMemoryUsage(), to fetch this data programmatically, and a Python utility function, GetGPUMemoryUsageStats(), to call this op and package the results. Modified the LSTM benchmark to report these stats.

This tracking is only for the GPU for now. CPU allocations are less organized.

Reviewed By: asaadaldien

Differential Revision: D4877451

fbshipit-source-id: 857798fe499d8c78cc590783052cbb2d4db56ea0
2017-04-19 10:49:11 -07:00
56cc1e219b Fix include in mpi/context.cc
Summary:
memcpy comes from cstring

See https://github.com/caffe2/caffe2/issues/286

Reviewed By: Yangqing

Differential Revision: D4914228

fbshipit-source-id: de60c2a98feb4228546a8f1fe237a090101f50e4
2017-04-19 10:19:55 -07:00
1607042bf4 Add timeout parameter and default to rendezvous Store::wait()
Summary: TSIA. Defaulting to 30s.

Reviewed By: pietern

Differential Revision: D4909202

fbshipit-source-id: 7f86f390077a19e559c90a1aa3aa768e273325d1
2017-04-19 10:11:56 -07:00
41620f86c9 Update IntelComposerXE to 2017.2.274
Summary:
Due to the massive dependencies I did not update the version number; under the same major version number (2017) the API is compatible, so there is no need to rebuild all the dependencies.

This will unblock the Caffe2 Intel pull request on MKLDNN.

Differential Revision: D4906463

fbshipit-source-id: 0f74436ac3a05605e35b8b649c3e8b5c1c69b500
2017-04-19 10:07:09 -07:00
8a47857ef1 group_conv fix
Summary: group conv bug fix: calling conv without a model.

Differential Revision: D4911690

fbshipit-source-id: fc7dd7d1b7056dd2a4a02f97ad037ee29c4d8c24
2017-04-19 10:07:09 -07:00
7d023cda6c Add timeout to RedisStore::wait()
Summary: Add a default 60s timeout to RedisStore::wait() to avoid blocking indefinitely when peer machines are unavailable.

Reviewed By: pietern

Differential Revision: D4908699

fbshipit-source-id: 39de9066633e8b0c8d1ee198b6bf3f70d3961196
2017-04-19 09:58:05 -07:00
9e8b4ef075 Include THCNumerics.cuh in THCAtomics.cuh. (#752) 2017-04-19 12:08:22 -04:00
a35f507532 Update functional.py (#1298) 2017-04-19 11:07:12 -04:00
6aa22beb86 Fix loss.py docs (#1296) 2017-04-19 11:03:15 -04:00
4c70612320 small change to schema
Summary:
as desc.

small fix in the feature_proc layer for the case when we only have one preproc type

Reviewed By: chocjy

Differential Revision: D4908933

fbshipit-source-id: 1338048fc395f85c3724721a9996ad1ee51f0f20
2017-04-19 01:17:22 -07:00
f950a1b70f create bucket-based calibration - model manipulation
Summary: added a new context to layers.py

Reviewed By: kennyhorror

Differential Revision: D4817124

fbshipit-source-id: 36f08964b86092e81df24c1b9d4b167293a7ffb8
2017-04-18 22:01:23 -07:00
8492c411e8 Caffe2 unit test for unmask
Summary: unit test using hypothesis for unmask operator

Reviewed By: ender-wieczorek

Differential Revision: D4904075

fbshipit-source-id: 874d3756ec703ab2cc82f24f7160b4254bf791f1
2017-04-18 21:06:18 -07:00
b7be2016aa Fix typos in memonger.py
Summary:
Found while browsing the code. Cool stuff in here!
Closes https://github.com/caffe2/caffe2/pull/276

Differential Revision: D4911421

Pulled By: Yangqing

fbshipit-source-id: 3bef10a4001a6b4d4527c054519d69131799a0e2
2017-04-18 20:52:41 -07:00
71bf8fb55b Clean up fd from destructor when in listening state
Summary:
It's possible the pair is in the listening state when it is
destructed. The fd will not have been cleaned up in that case, so we
shouldn't assert that it has been.

Reviewed By: andrewwdye

Differential Revision: D4909964

fbshipit-source-id: 7103d74910e3bcf5de9f4658d8f1f682b6c8a70c
2017-04-18 17:51:49 -07:00
580e192151 Revert D4870606: caffe2: datasets pack/unpack
Summary: This reverts commit dc29428de5c96cc3039af2885d9e4b026d9f482d

Differential Revision: D4870606

fbshipit-source-id: 1d05912b1a9e35e84b0c163c7b018db125ce060f
2017-04-18 16:47:05 -07:00
23230215a9 Add run_train_net_forward_only() to LayersTestCase
Summary: Make it convenient to test a model where we don't care about the backward pass, e.g., when the backward pass won't be run anyway.

Reviewed By: xianjiec

Differential Revision: D4906890

fbshipit-source-id: 9da51a9de4422474ce780e180b1ca95d6bc8c46d
2017-04-18 16:47:05 -07:00
ad6b53e401 allow to specify output dtypes for functional layers
Summary:
Currently, the functional layer infers the output types and shapes by running the operator once.
But in cases where special input data is needed to run the operator, the inference may fail.
This diff allows the caller to manually specify the output types and shapes when automatic inference may fail.

Reviewed By: kennyhorror

Differential Revision: D4864003

fbshipit-source-id: ba242586ea384f76d745b29a450497135717bdcc
2017-04-18 16:34:52 -07:00
c7d83a16f6 Update README.md 2017-04-18 19:05:18 -04:00
934816c01c Change the default algo for cuDNN conv forward to PRECOMP_GEMM (#1290) 2017-04-18 19:01:47 -04:00
5a0510934f Merge commit 'fcf4deac7d215f134ea25cd3def8b564b58b033c' 2017-04-18 15:21:20 -07:00
009bbc9983 Allow UniformFill/UniformIntFill to take parameters from input blobs
Summary: This will be used to generate random indices input to `Gather`

Reviewed By: xianjiec

Differential Revision: D4904591

fbshipit-source-id: 8d858631e3d640be2cec12f1566cbf195e6aad4b
2017-04-18 14:31:03 -07:00
fc19473501 Corrections in legacy modules. (#1286) 2017-04-18 17:13:53 -04:00
34546f022a Expose dilated convolutions.
Fixes #1225.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-04-18 17:13:02 -04:00
ab77742f6e Add some missing documentation for arguments.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-04-18 17:13:02 -04:00
34269a6fda caffe2: datasets pack/unpack
Summary:
Two new operators to pack and unpack a dataset. This is so that we can
re-use other operators that do not understand the schema format. The immediate
use-case is to use it with a partition operator.

Packing works by splitting the input into separate tensors, putting them in a
vector and wrapping in a shared_ptr (as opposed to a unique_ptr, so we can
copy).

Unpack takes the packed input and concatenates it back to the original.

I also had a hard time understanding the iteration, so I created a TreeWalker
that hides the complexity of operating on all the arrays and provides
short, purpose-specific functions that, at least for me, are easier to
understand.

Reviewed By: dzhulgakov

Differential Revision: D4870606

fbshipit-source-id: dc29428de5c96cc3039af2885d9e4b026d9f482d
2017-04-18 13:31:10 -07:00
701e63107f speed improvements, fix tests 2017-04-18 12:46:54 -07:00
655c22569e CPU hspmm + more efficient reorder 2017-04-18 12:46:54 -07:00
cd3bbc9dfd more operations and optimizations (hspmm, reorder, ...) 2017-04-18 12:46:54 -07:00
1018b238ac make gradients contiguous in adagrad 2017-04-18 12:46:54 -07:00
e27bd4ce7a faster cadd 2017-04-18 12:46:54 -07:00
b2acc33c73 contiguousValues method 2017-04-18 12:46:54 -07:00
40804830b8 mark_contiguous operation 2017-04-18 12:46:54 -07:00
01d84c5f9d revert sparse cuda index type change 2017-04-18 12:46:54 -07:00
88b42324e7 spcadd, sparseMask, cadd, csub, cmul + tests 2017-04-18 12:46:54 -07:00
ec260fe8e9 add test for dsmm 2017-04-18 12:46:54 -07:00
328b416068 THCS contiguous + to_dense 2017-04-18 12:46:54 -07:00
4bde9efbd7 Update CONTRIBUTING.md 2017-04-18 15:39:58 -04:00
ff781ed059 Update CONTRIBUTING.md 2017-04-18 15:39:26 -04:00
8f9a1af253 Merge commit 'fcf4deac7d215f134ea25cd3def8b564b58b033c' 2017-04-18 12:22:44 -07:00
31900b6bae Merge commit '1feb120d938d47c01900f656322f16bc41d08af3' 2017-04-18 12:22:27 -07:00
ebb5cc4cdb Make Gather work on empty DATA tensor
Summary: Gather should work when both DATA and INDICES are empty

Reviewed By: xianjiec

Differential Revision: D4906878

fbshipit-source-id: 23585afbe618656d7f5831c56d360a03e3cb2584
2017-04-18 12:21:08 -07:00
46cf6ff5fb fix batchnorm docs (#1284) 2017-04-18 15:12:38 -04:00
c153b1ca75 fix softmax ops dimension, add explicit rowmax buffer for simplicity
Summary: The scale_ tensor was resized incorrectly in the SoftmaxOp CUDA version. For some reason this has not triggered more crashes. I was using rowmax_ in-place with scale_, which was then also incorrectly sized. Usually D > N, so this was not an issue, but perhaps there were cases with attention where this did not hold. The problem is also order-sensitive: if we once had an input with a large D, the buffer was of the correct size.

Reviewed By: jamesr66a

Differential Revision: D4904989

fbshipit-source-id: 244b6d308d1fc08be885c641440cbacad3b0dbce
2017-04-18 11:49:55 -07:00
fcf4deac7d Fused RNN kernel remove explicit instantiation, isn't needed. 2017-04-18 11:07:58 -07:00
1feb120d93 Mark input as optional for gradInput in Tanh and Sigmoid 2017-04-18 10:33:33 -07:00
2ca071d730 Remove double precision math from LogSigmoid too 2017-04-18 10:28:13 -07:00
8a901c510d Update ops for Sigmoid and Tanh 2017-04-18 09:55:11 -07:00
ed60fe0ed6 Gloo benchmarking and script updates
Summary: Add AllgatherRing and CudaBroadcastOneToAll to benchmark. Add host info and algorithm sweep to chronos script.

Reviewed By: pietern

Differential Revision: D4901111

fbshipit-source-id: 1421025d39b914b14e857f21c43eac30c9c9dd2f
2017-04-18 09:06:34 -07:00
6595545843 fix CuDNN RecurrentOp Gradient init
Summary: CuDNN RecurrentNet GradientOp did not pass the DROPOUT information to the initializer, causing an incorrect scratch space size to be estimated. We have an assertion enforcing that the scratch space is the same for the forward and backward ops, so this failed the assertion. We currently hard-code dropout to 1.0, so this has had no effect on correctness in our tests. For some reason there wasn't an issue with num_layers=1, but with num_layers>=2 the scratch space size was different.

Reviewed By: salexspb

Differential Revision: D4904715

fbshipit-source-id: 780266c5ecf1f7a32387edcb6fc498a13ac782ac
2017-04-18 08:36:18 -07:00
2d28087529 Update mac build to ease the rpath issues
Summary:
TSIA - for rationale, see comments.
Closes https://github.com/caffe2/caffe2/pull/272

Differential Revision: D4905583

Pulled By: Yangqing

fbshipit-source-id: f6cdbc6b51512da03a4aec3f53de720d35c948b6
2017-04-18 01:17:38 -07:00
4bf559eddb RNNCell, LSTMCell, LSTMWithAttentionCell
Summary: This is a nice way to re-use RNN layers for both training and inference.

Reviewed By: salexspb

Differential Revision: D4825894

fbshipit-source-id: 779c69758cee8caca6f36bc507e3ea0566f7652a
2017-04-18 00:47:20 -07:00
e0a904011b Use gradient name for allreduce op name
Summary: This may help tell different allreduce operations apart during debugging/tracing.

Reviewed By: prigoyal

Differential Revision: D4897921

fbshipit-source-id: bbb2ce02a3e1f467ad54f8a3aed6a4e2b26a9fe4
2017-04-17 23:31:27 -07:00
ed1e342860 Reuse common world for allreduce/broadcast
Summary:
The common worlds can be reused without performance impact as long as
there is a guarantee that no two algorithm instances are using the
same one at any given time. Since we know the ordering and the maximum
parallelism, we can cycle through common worlds and reuse them
accordingly.

Differential Revision: D4896779

fbshipit-source-id: 164e1727692eab904fa6879a9f91a3e8332a2e30
2017-04-17 23:31:26 -07:00
cf317d1106 create_net: explicitly specify if one wants to overwrite the network.
Summary:
This is from discussion with dzhulgakov : as a step towards revisiting the
core.Net autonaming, we will first guard against accidental overwrites of
existing networks in the workspace.

ajtulloch since we are doing Predictors in mobile, this should be safe right?

azzolini - I assume this would be safe, but would love to get your approval.

akyrola - would this hurt xray?

Reviewed By: dzhulgakov

Differential Revision: D4897725

fbshipit-source-id: aa41271927ad6671f07a53b9505283623f8c49e5
2017-04-17 21:46:53 -07:00
9ab077dc9d Revert D4871248: [caffe2][PR] fp16 support for FullyConnected op
Summary: This reverts commit 6a991c2c993dcf0b1e18aa3f2ffbe19e693dbadd

Differential Revision: D4871248

fbshipit-source-id: b6d812d09a00c83e363432e84742c503abfed65b
2017-04-17 21:31:20 -07:00
391fd14115 Serializes a std::unique_ptr<std::mutex> object.
Reviewed By: xianjiec

Differential Revision: D4901097

fbshipit-source-id: 067d6fe3e2b201818eb6967a02b0ac0289fe8327
2017-04-17 19:46:16 -07:00
0a726af42e Coerce input of FunctionalLayer to record
Summary: Having to pack the input to schema doesn't make much sense since the structure is not recognized by operators anyway.

Differential Revision: D4895686

fbshipit-source-id: df78884ed331f7bd0c69db4f86c682c52829ec76
2017-04-17 19:26:06 -07:00
753201f40a Merge pull request #271 from Yangqing/cmake
Update gloo to new master
2017-04-17 18:44:37 -07:00
3a9daeda8c Update gloo to new master 2017-04-17 16:47:48 -07:00
2f07e77218 update NNPACK related submodules 2017-04-17 16:47:09 -07:00
0a4c5756df Logitzy SoftmaxWithLoss
Summary:
The MT team, with urikz, found out that their convergence discrepancy with another version of the model was caused by numerical stability issues in softmax. These were caused by our implementation missing the optimization that avoids computing exp(log(x)) for softmax cross-entropy. This diff fixes that (see the sketch after this entry).

This does not require any changes to the current models, since the output of SoftmaxWithLoss is still the exponentiated items.

I also did a little bit of cleanup on the code; for some reason we were passing tensors to SoftmaxCPU() instead of pointers.

Reviewed By: urikz

Differential Revision: D4901888

fbshipit-source-id: 62e785ecdd87e33742292b191e91b4f43912e4c0
2017-04-17 16:40:20 -07:00
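
The numerical issue is easy to reproduce. A minimal numpy sketch of softmax cross-entropy computed the naive way versus directly in log space; this illustrates the technique, not the Caffe2 kernel itself:

```python
import numpy as np

def naive_xent(logits, label):
    # softmax then log: the exp() overflows for large logits, and the
    # loss effectively round-trips through exp(log(x)).
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[label])

def stable_xent(logits, label):
    # log softmax(x)_i = x_i - max(x) - log(sum(exp(x - max(x)))),
    # so no probability is exponentiated and then logged again.
    shifted = logits - logits.max()
    return np.log(np.exp(shifted).sum()) - shifted[label]

logits = np.array([1000.0, 0.0, -1000.0])
print(stable_xent(logits, 0))  # ~0.0
print(naive_xent(logits, 0))   # nan, from exp() overflow
```
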
20330fe3f4 Added tiles and axis as input parameters to Tile Operator
Summary:
Added the possibility to pass 'tiles' and 'axis' as inputs,
as opposed to arguments, to the Tile Operator. If provided, the input
values override the argument values.

Differential Revision: D4794432

fbshipit-source-id: a7e38f4f925a4cedf530924bd426c3bb08b5aad8
2017-04-17 15:31:20 -07:00
c3a4468af6 Add conv helpers and proxy to CNN
Summary:
Add conv helpers. The migration of these functions assumes that people should not do

  cnn_model = CNNModelHelper(use_cudnn=True)
  cnn_model.Conv(..., use_cudnn=False, ...)

Reviewed By: salexspb

Differential Revision: D4884974

fbshipit-source-id: 12af6e2a5863eba789232cd4a4771f95d05f9227
2017-04-17 15:03:05 -07:00
2043b3c114 train and algebra helpers
Summary: Adding train and algebra helpers

Reviewed By: salexspb

Differential Revision: D4884951

fbshipit-source-id: 7a18eb986a7356977a6c3d7a62a996ddce0c793e
2017-04-17 15:03:05 -07:00
277b4eca97 array helpers (concat)
Summary: Adding array helpers

Reviewed By: salexspb

Differential Revision: D4884933

fbshipit-source-id: 2ec3dd37b243c8c717e299876eef7650a08d3f2b
2017-04-17 15:03:04 -07:00
ed3f0ac5e9 nonlinearity helpers
Summary: adding nonlinearity helpers

Reviewed By: salexspb

Differential Revision: D4884894

fbshipit-source-id: fe180df23daabb62175d5a6ae7b46ccb5f7d0123
2017-04-17 15:03:04 -07:00
3623c241c4 normalization helpers
Summary: Add normalization helpers

Reviewed By: salexspb

Differential Revision: D4884786

fbshipit-source-id: 529e678bae133e85d981310014c15d551d39d48b
2017-04-17 15:03:04 -07:00
e881c4c590 removing __all__ in fc, dropout, pooling
Summary: removing __all__ in fc, dropout, pooling

Reviewed By: salexspb

Differential Revision: D4884742

fbshipit-source-id: 4c5cedc9205851b0f3aac6832cebd3d48d0c1e74
2017-04-17 15:03:04 -07:00
54d42af413 Fix a workspace test
Summary:
A workspace may add a suffix such as "_1" to the net name if other nets
have been added to the workspace with the same name. This is true even
if the previous nets have been removed or if the workspace has been
reset.
Closes https://github.com/caffe2/caffe2/pull/213

Differential Revision: D4899877

Pulled By: Yangqing

fbshipit-source-id: b89b196df815dceff49a3ec76d7f658cdc4b0a38
2017-04-17 15:03:04 -07:00
25035e8b3b ElementwiseLinearOp
Summary:
Implement a new op, ElementwiseLinear.
Given inputs X of size (N x D), a of size D, and b of size D,
the op computes Y of size (N x D) where Y_{nd} = X_{nd} * a_d + b_d (see the sketch after this entry).
Typically, this op is followed by the SigmoidCrossEntropyWithLogits op for multi-label classification problems.

Differential Revision: D4892220

fbshipit-source-id: 77bffc5fbe03d48b3d83ab785f7c24a71c952aec
2017-04-17 14:18:27 -07:00
ac7663b18c layer_model_instantiator: filter layers by tags
Summary: This diff allows to export a model partially, filtering layers by tags.

Reviewed By: kittipatv

Differential Revision: D4885610

fbshipit-source-id: 65394c5c9119d57a4d0703aa67ad8e79e4370e3b
2017-04-17 14:18:27 -07:00
f67ab32d34 Output peer address on network failures
Summary: Output peer address on network failures. This change will help in root causing network failures.

Differential Revision: D4899129

fbshipit-source-id: 60a762c6551a726081d5335ab478da8dd7f6dad7
2017-04-17 13:50:24 -07:00
9150e33765 Add support for creating docsets. (#1276)
Docsets are an offline documentation format introduced by Dash.app and
supported by Zeal and some other open-source clones.
2017-04-17 16:35:02 -04:00
e4478804ce Fix patched_make_field for newer Sphinx versions. (#1275)
Not sure since which version that change is needed, but using v1.5.5 here.
2017-04-17 16:17:58 -04:00
1082db600e fp16 support for FullyConnected op
Summary:
Includes math lib support, removal of double-precision.
Closes https://github.com/caffe2/caffe2/pull/246

Reviewed By: Yangqing

Differential Revision: D4871248

Pulled By: asaadaldien

fbshipit-source-id: 6a991c2c993dcf0b1e18aa3f2ffbe19e693dbadd
2017-04-17 12:07:57 -07:00
a220f2c3aa Fix group-convolution w/o biases on CPU. (#1273)
* Fix group-convolution w/o biases on CPU.

Not having this guard will cause a crash further down in the `cat`
function when it uses the first element in the passed list to create a
new tensor. (And even after that, cat doesn't handle nulls well.)

* Added test for groupconv w/o bias on CPU.
2017-04-17 14:53:28 -04:00
5311fd3d6a Conv no dx
Summary:
Based on a discussion with Yangqing, optionally disables the calculation of dX for a convolution op (e.g., conv1 in AlexNet) where the data gradient is not needed.
Closes https://github.com/caffe2/caffe2/pull/242

Differential Revision: D4844013

Pulled By: bwasti

fbshipit-source-id: 202d2410ed6c66671e83e8e49a1383883c6ab29e
2017-04-17 11:51:44 -07:00
7270471ed6 Returns auxiliary parameters in the optimizers.
Summary:
1. Adds a function to return the auxiliary parameters of each optimizer. This function can be used to serialize the optimizers so that they can be recovered.
2. Fixes a bug where the iteration blob was not incremented by one per iteration. Suppose there are k parameters using the Adam learning rate optimizer; with the original implementation, the iteration blob was incremented by k (see the sketch after this entry).

Reviewed By: azzolini

Differential Revision: D4872397

fbshipit-source-id: d86711feedda2ba83af5f2a18141b06a6a473733
2017-04-17 10:16:32 -07:00
7568a99fee Fix bugs in tensor-init-function
Summary:
These two init-functions could result in an incorrect memory operation when not in CPUContext:
```c++
  template <typename T>
  Tensor(const vector<TIndex>& dims, const vector<T>& values, Context* context)
```
```c++
  template <typename T,
            typename = typename std::enable_if<std::is_scalar<T>::value>::type>
  Tensor(const T& value, Context* context)
```
Closes https://github.com/caffe2/caffe2/pull/252

Differential Revision: D4892633

Pulled By: Yangqing

fbshipit-source-id: 5979fc2170881d30f5260361489dffc5d6fdd1cd
2017-04-16 18:07:15 -07:00
22f3825d8f Cmake mobile build improvements
Summary:
(1) integrate gcc compatible nnpack
(2) speed up the ios travis ci.
Closes https://github.com/caffe2/caffe2/pull/268

Differential Revision: D4897576

Pulled By: Yangqing

fbshipit-source-id: 729fa2e4b5be6f1d0b8d55305f047116969ff61f
2017-04-16 16:46:58 -07:00
dd923cf052 Unmask operator in Caffe2
Summary:
A CPU implementation of the unmask operator in caffe2.
There was also a small bug in the mask operator; this fixes it as well.

Reviewed By: ender-wieczorek

Differential Revision: D4896351

fbshipit-source-id: 887d1beb66fe93ea2da1c4e165fce2e026907726
2017-04-16 11:23:19 -07:00
dd80310681 inference lookup in now local for tutorial (#267)
* updated ubuntu instructions

* updated ubuntu notes and troubleshooting

* updated tutorials using local files

* added doxygen python blocks for docs generation

* doxygen related files for generating docs

* removing Mac and Windows build status while those are in beta

* inference lookup is local now
2017-04-16 10:06:56 -07:00
15267ac009 fix typo 2017-04-15 13:08:58 -04:00
3c0dc06ac8 Add __builtin_cpu_supports function def in windows
Summary: Closes https://github.com/caffe2/caffe2/pull/253

Differential Revision: D4892628

Pulled By: Yangqing

fbshipit-source-id: 45d49121027454d9259c4a753438d8f0771cf042
2017-04-14 19:46:19 -07:00
ca0c8e5b25 remove import_array() help and use import_array1
Summary:
TSIA. See

https://github.com/numpy/numpy/blob/master/numpy/core/code_generators/generate_numpy_api.py

Reviewed By: jamorton

Differential Revision: D4893002

fbshipit-source-id: 4b6bee1bdf8ae905e4c0952a3e8bbbacd4129a50
2017-04-14 19:46:19 -07:00
b93a7b134a doxygen configs and updated python files to inc. doxygen tags (#266)
* updated ubuntu instructions

* updated ubuntu notes and troubleshooting

* updated tutorials using local files

* added doxygen python blocks for docs generation

* doxygen related files for generating docs
2017-04-14 16:30:33 -07:00
4db7bec686 CUDA version of SigmoidCrossEntropyWithLogits
Summary: CUDA versions of SigmoidCrossEntropyWithLogits/Gradient.

Reviewed By: jay-mahadeokar

Differential Revision: D4891254

fbshipit-source-id: cabad908026e30d9a0721cad092ba948659ab917
2017-04-14 16:07:33 -07:00
fc8bb523e8 Update gloo dependency 2017-04-14 22:25:45 +00:00
0cb60e7d5a Retrieve ethernet interface link speed
Summary: Retrieve ethernet interface link speed

Reviewed By: pietern

Differential Revision: D4880290

fbshipit-source-id: 91f1555d9bb35ff41dc731e082365a9002bb1661
2017-04-14 14:41:01 -07:00
182e2d348e Use halving/doubling allreduce if context is power of two
Summary:
The halving/doubling algorithm is faster than both ring and chunked
ring up to 5M elements, but it only works with power-of-two contexts
right now. So use it whenever the context size is a power of two.

Differential Revision: D4890065

fbshipit-source-id: 09ff82b375cbd3d0626e0255dcf9b9f4873fff54
2017-04-14 14:32:46 -07:00
a207aa4dbc Fix backward compatibility bug for cnn model helper arguments
Summary:
Newly trained models pass kernels=2*[kernel]; using old code for
inference will not work, because the (kernels) argument isn't supported
there and we are no longer passing kernel.

Reviewed By: salexspb

Differential Revision: D4888795

fbshipit-source-id: 1649b073c4e1da1d59da9cea581b4dcab6dbaf5c
2017-04-14 09:47:48 -07:00
475eff5281 Allow peer access only in groups of 8
Summary:
This is the hardware limit set by NVIDIA. Basically, on Amazon P2 machines that
have 16 GPUs, the previous setting would trigger an error. This fixes the issue
but is pending verification from Amazon.

Differential Revision: D4888402

fbshipit-source-id: 8d26a24d6e0476f895b9afdb979144eb8e6b9321
2017-04-14 09:47:48 -07:00
3c9dfe4736 dag-compatible forward memonger
Summary: Memonger's inference optimization is very efficient, but does not work if a multi-threaded DAG net is used. So I added this alternative, which shares code with the gradient memonger and does the blob recycling by traversing the DAG and ensuring that blobs do not cross parallel branches.

Reviewed By: viswanathgs

Differential Revision: D4884303

fbshipit-source-id: dfd0a6ecdb91f4edbb0b743729c92f4cd015602e
2017-04-13 22:08:09 -07:00
d65892b7f2 Change back the function signature of relu gradient to only use
Summary:
This allows us to do in-place relu and also corrects the previous error of
inconsistency between the cudnn impl and the non-cudnn impl.

This implementation butchers the cudnn interface, in the sense that we pass
in the output instead of the input for the gradient pass. We do have a
gradient checker to guard this situation, so we should be safe.

Reviewed By: asaadaldien

Differential Revision: D4889426

fbshipit-source-id: 081f8fe06de78413b5786086bfd5ae6c8128cd6e
2017-04-13 22:08:09 -07:00
e8cc5563fe Add an optional forget bias argument to LSTMUnit
Summary: Add an option to bias the forget gate one way or another by adding in some float value before the sigmoid is applied.

Differential Revision: D4880712

fbshipit-source-id: 1306a97c29fb31630838b2f96597a46e952d940a
2017-04-13 21:49:17 -07:00
246bedd406 Add counter for task processing wall time
Summary: This allows checking the real cost of each PS request for each parameter, and hopefully will help improve the sharding logic.

Reviewed By: dzhulgakov

Differential Revision: D4799210

fbshipit-source-id: d18effc671f3f7a611e535e09bde360ef0102a33
2017-04-13 20:44:10 -07:00
f94f43fd6e Working sparse gradients for data parallel model
Summary: This diff enables sparse gradient synchronization between GPUs. The test case is now a bit too convoluted, but once D4871680 is landed, we can simplify it a bit.

Reviewed By: dzhulgakov

Differential Revision: D4877087

fbshipit-source-id: 37bbb07051cbaf3a6e3c54b0eead97f3e02337d5
2017-04-13 17:39:23 -07:00
69f42e3f70 make CopyGPUToCPU/CPUToGPU handle sparse gradients
Summary:
CopyGPUToCPU and CopyCPUToGPU need to handle gradients that arrive sparse. Added a unit test and fixed the gradient makers to create copies for both values and indices.

This becomes less important once GPU sparse parameter update ops land, but it is nevertheless good to fix.

Reviewed By: dzhulgakov

Differential Revision: D4882327

fbshipit-source-id: aafd2df46b3e1bcb30b52b1edf40fad8271f1f88
2017-04-13 17:16:26 -07:00
b61174047f Add threshold to switch between host/device reduce and bcast depending on buffer size
Summary: Device reduce is more efficient for large buffer sizes. For smaller buffers, host reduce may be more efficient in some cases and frees up the GPU for other work.

Reviewed By: andrewwdye

Differential Revision: D4885855

fbshipit-source-id: 7dc522e8c93e1a94427730aca6af03b7e93e660d
2017-04-13 15:05:47 -07:00
baf33161d4 GatherRecord layer
Summary: Perform gather on the whole record. This will be used for negative random sampling.

Reviewed By: kennyhorror

Differential Revision: D4882430

fbshipit-source-id: 19e20f7307064755dc4140afb5ba47a699260289
2017-04-13 15:02:44 -07:00
8d93fcf13f Don't allow overwriting keys in HashStore
Summary: TSIA

Reviewed By: andrewwdye

Differential Revision: D4885102

fbshipit-source-id: c46c180fa8e6dd354921d562830b3515ba91c964
2017-04-13 12:35:32 -07:00
8bd0522c20 Add tests and GPU impls for sparse optimizers
Summary:
These GPU paths are probably even buggier than the CPU paths for sparse gradients with duplicate indices. Both paths cause multiple momentum updates in a single iteration, but only the GPU path is non-deterministic. Depending on how we decide to address the issues on the CPU path, pooyadavoodi has a good idea for how to match dense behavior with the sparse GPU ops.
Closes https://github.com/caffe2/caffe2/pull/254

Reviewed By: bwasti

Differential Revision: D4871680

Pulled By: dzhulgakov

fbshipit-source-id: 220be57a0f699a22ea85ed4f7022d92d362d06b3
2017-04-13 11:07:40 -07:00
a559893c9f Instantiate nccl type templates for gloo (minus half)
Summary:
Instantiate nccl type templates for gloo (minus half).
half requires at a minimum ifdef'ing CUDA_HAS_HALF, and likely requires
more work given that operators aren't defined on it, so skipping it
for now.

Reviewed By: pietern

Differential Revision: D4876217

fbshipit-source-id: 833d2aec12789cbaf9e0a201b979a420fbe6732f
2017-04-13 10:52:38 -07:00
83f360887f new SumReduceLike op CPU/GPU implementation and doc
Summary:
new SumReduceLikeOp CPU/GPU implementation and doc. Unit tests and NMT team tests passed. Some benchmark results here:

  shape(A) = [100, 1000, 100]
  shape(B) = [1000]

  0.36684 ms/iter (0.00122679 ms/iter) SumReduceLike
  0.246593 ms/iter (0.00151116 ms/iter) ReduceBackSum
  0.202563 ms/iter (0.00511932 ms/iter) ReduceFrontSum
  // This means that we are faster than back+front sum now

  shape(A) = [32, 32, 100]
  shape(B) = [32, 100]

  0.0253826 ms/iter (0.00257504 ms/iter) ReduceFrontSum
  0.0233368 ms/iter (0.00118283 ms/iter) SumReduceLike

  shape(A) = [32, 32, 100]
  shape(B) = [32, 32]

  0.0276206 ms/iter (0.00691918 ms/iter) ReduceBackSum
  0.0254768 ms/iter (0.00325529 ms/iter) SumReduceLike

Reviewed By: Yangqing

Differential Revision: D4873222

fbshipit-source-id: 736b1537998f4289876bc53d38607b8052e89c70
2017-04-13 10:28:46 -07:00
50c2759afe Expose missing headers
Summary: Closes https://github.com/facebookincubator/gloo/pull/25

Differential Revision: D4883908

Pulled By: pietern

fbshipit-source-id: 662a8fdf83ad099295b11043194de25c747e8286
2017-04-13 10:08:06 -07:00
da93963860 add input/output blob name when exception thrown from tensor
Summary: Added a field caller_ to caffe2::EnforceNotMet and modified the operator Run() exception handler to add the input/output name of the blob being accessed to the error message. Note that this is not able to distinguish the case when a blob occurs in both input and output, but I believe this is still helpful.

Reviewed By: salexspb

Differential Revision: D4863982

fbshipit-source-id: f6a872fb07f8957dc2d3366d9f106fa81bffbd72
2017-04-13 09:03:33 -07:00
05002442eb Renaming DuplicateOp to LengthsTileOp
Summary: making the name a bit clearer

Reviewed By: xianjiec

Differential Revision: D4866940

fbshipit-source-id: 3e0f7067a9d3ba89cb038d85c1991e541f1e439c
2017-04-12 22:04:20 -07:00
cb66e9cf78 torch.diag bug fix (#1251) 2017-04-12 20:59:12 -07:00
735f5af87e Add new variant of halving/doubling algorithm that pipelines local reduce/broadcast with communication steps
Summary: Added a pipelined version of the cuda halving/doubling algorithm. Half the buffer is reduced prior to the first send, and the other half prior to reducing the result from the first receive. Broadcasts are started asynchronously as soon as each new message is received. The new code was added as a new algorithm, since pipelining makes performance worse for small buffer sizes.

Reviewed By: pietern

Differential Revision: D4847109

fbshipit-source-id: 5aa55de95f8c94069380af7396f2b5b6297dcbea
2017-04-12 18:01:22 -07:00
8c9f4d8c3b Add throughput information to resnet50_trainer
Summary:
TSIA

Makes it easier for throughput debugging.

Differential Revision: D4879634

fbshipit-source-id: 8d479d51b0ec51ad3d86ad5500fc3095400cf095
2017-04-12 17:46:14 -07:00
580ff3a594 Revert D4854240: [EAZY][C2 OSS] Add normalization helpers and proxy to CNNModelHelper
Summary: This reverts commit 3fa594d79960742b34e20d843e8b6ef8aeb601d3

Differential Revision: D4854240

fbshipit-source-id: d08cb30f188f876e1962f53a44f4e6d4ea68297f
2017-04-12 16:46:01 -07:00
32b30ff1fe Revert D4854440: [EASY][C2 OSS] Add Nonlinearity helpers and proxy to CNNModelHelper
Summary: This reverts commit a337e5279729f1c938f34b3994ab8827ee94aa93

Differential Revision: D4854440

fbshipit-source-id: 00ef9724654990356be9df9bb1f65b4fd0fd0ffc
2017-04-12 16:36:33 -07:00
a8ef3b4090 Revert D4855073: [EAZY][C2 OSS] Add array_helpers and proxy to CNN
Summary: This reverts commit 7272f62cff5d065eb028b8118a1ca190bd801fd5

Differential Revision: D4855073

fbshipit-source-id: a121e6bb98c37c7af0b59efad275e00bd5d21163
2017-04-12 16:36:33 -07:00
7867262d39 Revert D4855040: [EASY][C2 OSS] Add Algebra and train helpers and proxy them to CNNMH
Summary: This reverts commit d948ea913f674a6e47c4b72629a2d33253cb3130

Differential Revision: D4855040

fbshipit-source-id: c8efa9566a3ec6b9a9d3ad0e8cab3cc656627473
2017-04-12 16:36:32 -07:00
c852883086 add named_parameters that yield name and value of parameters (#1242) 2017-04-12 16:32:36 -07:00
ab77e4c3d7 Merge commit '62c584ba7972dbba404766aa06d1a558282b4169' 2017-04-12 15:06:58 -07:00
2444278b8b Merge commit '4336e9ea6641b8ac2814eaef2adef64e4106459c' 2017-04-12 15:06:10 -07:00
62c584ba79 Fix abs with char and short cuda types. (#747) 2017-04-12 15:04:59 -07:00
fbd53d87bf block wide reduction with multiple values to reduce at once (#745) 2017-04-12 15:04:43 -07:00
c907c7c7dc Update resnet50_trainer example
Summary:
A few fixes in this commit: the epoch size is now rounded
down to the closest integer multiple of the global batch size (batch
per GPU * GPUs per hosts * hosts per run). The num_shards and shard_id
parameters are now passed to CreateDB so multiple processes actually
train on different subsets of data. The LR step size is scaled by the
number of hosts in the run. The test accuracy is only determined after
each epoch instead of after every so many iterations.

Differential Revision: D4871505

fbshipit-source-id: d2703dc7cf1e0f76710d9d7c09cd362a42fe0598
2017-04-12 14:03:51 -07:00
71303b8af4 Autograd deadlock for recent glibc fix (#1243) 2017-04-12 22:24:31 +02:00
4336e9ea66 Revert "make it compile on Windows + use ilp64 MKL" (#1002) 2017-04-12 12:07:16 -07:00
f5ac83b060 LengthsGatherOp
Summary:
Length-aware gather operator. This will be used for random negative sampling. See the task for details.

This should be equivalent to:

LengthsToRange + Gather + Reshape + GatherRanges

That's pretty complicated.

Differential Revision: D4846023

fbshipit-source-id: 8d9b7ff3eddc75a7ab147cd1c2a12f377652df93
2017-04-12 12:01:35 -07:00
bbcdc91135 Remove prof_dag from step net
Summary:
prof_dag in step net is not supported

(Note: this ignores all push blocking failures!)

Differential Revision: D4876551

fbshipit-source-id: 4003e60908e51ef052f8656bf527b326676c298c
2017-04-12 11:01:30 -07:00
154d49cc6a Caffe2: add schema for SumElementsGradient
Summary: Caffe2: add schema for SumElementsGradient

Reviewed By: jamesr66a

Differential Revision: D4873313

fbshipit-source-id: eba03d22cd260c99d13b215540b3d62f65e900d3
2017-04-12 10:09:27 -07:00
4967db0756 sanity checks for data parallel model
Summary: To help dgponinath, and people in general: check that params don't have duplicate entries.

Differential Revision: D4872132

fbshipit-source-id: 1cca1237fda771eb270227f452ecae0f912d7a33
2017-04-12 09:32:12 -07:00
75c2168966 Generalize PoolingOp(CUDA) to compute 1D, 2D and 3D pooling.
Summary: Extend the MaxPooling & AveragePooling CUDA ops to compute 1D, 2D & 3D pooling.

Differential Revision: D4866699

fbshipit-source-id: 9bf2d970f2df2b87194a539fc60c07ac19fa1042
2017-04-12 09:16:45 -07:00
d48afd41f9 Add print string for MaxPool3d, change for MaxPool2d (#1115) 2017-04-12 15:58:28 +02:00
fd5643e426 Add math::Gemv<double, CUDAContext> by cublas::cublasDgemv
Summary: support double gemv in CUDAContext

Differential Revision: D4872986

fbshipit-source-id: c6397c5a3b2667ca446deca0f5edbcc7f29f7a1e
2017-04-12 01:17:47 -07:00
8de1ce57d2 Add Algebra and train helpers and proxy them to CNNMH
Summary: Add Algebra and train helpers and proxy them to CNNMH

Reviewed By: salexspb

Differential Revision: D4855040

fbshipit-source-id: d948ea913f674a6e47c4b72629a2d33253cb3130
2017-04-11 23:03:00 -07:00
b2e94a7bcb Add array_helpers and proxy to CNN
Reviewed By: salexspb

Differential Revision: D4855073

fbshipit-source-id: 7272f62cff5d065eb028b8118a1ca190bd801fd5
2017-04-11 23:02:59 -07:00
e7cdd90490 Add Nonlinearity helpers and proxy to CNNModelHelper
Summary: Add Nonlinearity helpers and proxy to CNNModelHelper

Reviewed By: salexspb

Differential Revision: D4854440

fbshipit-source-id: a337e5279729f1c938f34b3994ab8827ee94aa93
2017-04-11 23:02:59 -07:00
b8f2baec8e Add normalization helpers and proxy to CNNModelHelper
Summary: Add normalization helpers and proxy to CNNModelHelper

Reviewed By: salexspb

Differential Revision: D4854240

fbshipit-source-id: 3fa594d79960742b34e20d843e8b6ef8aeb601d3
2017-04-11 23:02:59 -07:00
d35b7569db Add Pooling Helpers, proxy to CNNModelHelper
Summary: Add Pooling Helpers, proxy to CNNModelHelper

Reviewed By: salexspb

Differential Revision: D4854014

fbshipit-source-id: 672fcd886153136b707866400b2705544eaf4ec9
2017-04-11 23:02:59 -07:00
e21e4bf3e8 add pyyaml to conda note here as well 2017-04-11 21:21:18 -07:00
570c6bb9b7 Fix backward pass computation when an input is used in a Fill-op input for shape
Summary:
Fix issue that amyzhang encountered. She was using ConstantFill to create a blob of the same size as another blob. This caused the gradient computation flow to be interrupted at the ConstantFill, since the gradient for the input blob was set to None (although it already had another gradient set). The correct solution is to avoid overwriting gradient assignments with None if they already have a gradient, UNLESS that blob is an output of the same op, as with the StopGradient op. (Note that Amy's problem was fixed by instead using a fixed-shape ConstantFill and Add with broadcast=1, which is a better solution anyway.)

Not sure if I explained this well, but see the new unit tests. Before this change, the testAddAndDynamicConstant failed but the testAddAndStaticConstant succeeded.
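
A minimal sketch of the assignment rule described above (illustrative Python, not Caffe2's actual gradient generator):

```
# grad_map: blob name -> gradient blob name (or None)
# op_outputs: outputs of the op currently being processed
#             (e.g. StopGradient's outputs, which must stay None).
def assign_gradient(grad_map, blob, grad, op_outputs):
    if grad is None and grad_map.get(blob) is not None and blob not in op_outputs:
        return  # don't overwrite an existing gradient assignment with None
    grad_map[blob] = grad
```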

Reviewed By: dzhulgakov

Differential Revision: D4861176

fbshipit-source-id: 3b53621bfaba2e36786a5e4664145038995f6616
2017-04-11 19:32:22 -07:00
f0426e6288 remove TODO comment
Summary: TODO is incorrect, code was fixed in D4024922.

Differential Revision: D4872233

fbshipit-source-id: d66af5e099c3b7beb38cdb8e6acd4b161c8e28f9
2017-04-11 19:05:05 -07:00
09bfc8043b Generalize PoolingOp(CPU) to compute 1D, 2D and 3D pooling.
Summary: Extend the op to compute 1D, 2D & 3D pooling.

Differential Revision: D4828691

fbshipit-source-id: 87540e82ed20d1361476cfbc43a708de9ca7a88e
2017-04-11 18:18:21 -07:00
8e36339911 Merge commit '0925c91e80cc1b3a86fcbc54570f5bb204c9cb77' 2017-04-11 18:00:44 -07:00
5391fe8953 addr zeroes output buffer when beta=0 2017-04-11 18:00:11 -07:00
0925c91e80 addr zeroes output buffer when beta=0 2017-04-11 17:59:42 -07:00
0f43ac6865 use GPUFallback for TopK
Summary: Use GPUFallback (to CPU) for TopK operator.

Differential Revision: D4870842

fbshipit-source-id: e3d6ca769b5cbb9ed7dc898a53e789da596b2685
2017-04-11 17:04:54 -07:00
253c854da5 update Dockerfile not to use requirements.txt 2017-04-11 15:42:05 -07:00
7c59754d24 update source build instructions 2017-04-11 15:24:31 -07:00
2bf7dc643f Merge commit 'aec658f8708a6f4448329da006d14ff2e13dc821' 2017-04-11 15:02:36 -07:00
ce30c76823 Merge commit '2b37ecfccf810a8e21c2c9ac9a943ce2f7c01015' 2017-04-11 15:02:16 -07:00
a8d60ad3ac fix THNN headers 2017-04-11 15:00:30 -07:00
aec658f870 fix THNN headers 2017-04-11 14:57:11 -07:00
2b37ecfccf fix THNN headers 2017-04-11 14:56:53 -07:00
01a35dcace Fix coalesced CUDA collectives for nonhomogeneous lists 2017-04-11 14:48:54 -07:00
afeeb81e79 Add support for keyword arguments in torch.cat 2017-04-11 14:48:54 -07:00
6002f94232 Fix is_tensor and is_storage for old-style classes 2017-04-11 14:48:54 -07:00
a5c7d98611 Import TripletMarginLoss 2017-04-11 14:48:54 -07:00
605b3c86ce Retain the type of numpy scalars in collate_fn 2017-04-11 14:48:54 -07:00
2087b1157a Improve serialization error messages 2017-04-11 14:48:54 -07:00
81e972031d Handle all errors if Module's sources can't be retrieved 2017-04-11 14:48:54 -07:00
81a55f441c Adds interfaces to check the existence of a DB
Summary:
To evaluate on checkpoints, we often need to load from multiple checkpoints.
However, it is inconvenient if we always need to check the existence of
a checkpoint manually. Adds interfaces to check the existence of a DB
so that we can find available checkpoints automatically.

Reviewed By: azzolini

Differential Revision: D4823876

fbshipit-source-id: e5a65b736ac2addd0447c4add81dbd0986f422e7
2017-04-11 14:07:49 -07:00
e9ff57176b Fused pointwise kernels for GRU/LSTM 2017-04-11 13:42:06 -07:00
a739960515 Merge commit 'cfa504691c2ce5e10010ffb6cd43001c59109aea' 2017-04-11 13:41:54 -07:00
f43320dbf2 Merge commit '0dc52abe9a673547caf79ac64c73e8e16fb37b33' 2017-04-11 13:41:42 -07:00
e362a64975 release notes for v0.6.1 (#260)
* updated ubuntu instructions

* updated ubuntu notes and troubleshooting
2017-04-11 13:40:03 -07:00
cfa504691c Fused pointwise kernels for GRU/LSTM 2017-04-11 13:36:38 -07:00
0dc52abe9a Fused pointwise kernels for GRU/LSTM 2017-04-11 13:36:02 -07:00
1e5140aa76 option to recompute blobs backward pass with massive memory savings
Summary:
This diff adds an option to recurrent_net to define some cell blobs to be recomputed on the backward step, so that they don't need to be stored in the step workspace. This is done by modifying the backward step to automatically include all operators that are needed to produce the output that is to be recomputed, and by storing those blobs in a shared workspace. To enable the shared workspace, I had to modify the stepworkspaces blob to also store a forward shared workspace. Making it a class field won't work since the lifecycle of the blob does not match the lifecycle of the operator.

For basic LSTM, the performance hit is quite modest (about 15% with one setting, but your mileage may vary). For attention models, I am sure this is beneficial, as computing the attention blobs is not expensive.

For basic LSTM, the memory saving is wonderful: each forward workspace only has 4 bytes (for timestep).

I also modified the neural_mt LSTM Cells, but there is no test available, so I am not 100% sure I did it correctly. Please have a look.

Added options to LSTM, MILSTM and LSTMAttention to enable memory mode.

Reviewed By: urikz

Differential Revision: D4853890

fbshipit-source-id: d8d0e0e75a5330d174fbfa39b96d8e4e8c446baa
2017-04-11 13:03:48 -07:00
0b50f794e9 Use thnn version of Tanh/Sigmoid instead of autograd. (#1234) 2017-04-11 12:49:57 -07:00
15c6f637d6 create bucket-based calibration - layer
Summary:
The basic idea of bucket-based calibration:
1. given a model and a calibration data set
2. apply the model to the calibration data set and sort the prediction scores
3. bucketize the prediction scores
4. for the samples in each bucket, compute the proportion of positive samples
5. build a set of piecewise linear functions that map from the bucket range to the proportion
6. append a piecewise linear transform operator to the prediction net to calibrate the raw predictions
7. to support calibration in realtime training, we create a new type of Net -- a bucket calibration net. This needs a new Context for add_calibration_ops(), and to export and load the new Net.
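
A plain-numpy sketch of steps 2-5 above (illustrative only; the real implementation is a Caffe2 layer plus a piecewise linear transform operator):

```
import numpy as np

def bucket_calibration(scores, labels, n_buckets=10):
    assert len(scores) >= n_buckets
    order = np.argsort(scores)
    scores, labels = scores[order], labels[order]
    # Bucketize the sorted scores and compute the positive rate per bucket;
    # each (score range, positive rate) triple defines one linear piece.
    pieces = []
    for b in np.array_split(np.arange(len(scores)), n_buckets):
        pieces.append((scores[b[0]], scores[b[-1]], labels[b].mean()))
    return pieces
```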

This includes a series of diffs.

This diff implements a layer that adds different operators for train/calibration/eval for bucket-based calibration.

Reviewed By: dragonxlwang

Differential Revision: D4817119

fbshipit-source-id: 44f8fcad2a94f40f7439cc1ad47e7bae5e17397d
2017-04-11 12:30:26 -07:00
06ae1ff534 Add fp16 dispatch to main cuDNN operators
Summary:
Adds support for fp16 to main cuDNN ops (conv, relu, pool, BatchNorm).

Done via runtime dispatch, not using DispatchHelper at this point, to allow for more complex dispatch logic in the future if necessary. Using separate template parameters for all input/output types is for the same reason: it's easier to add the functionality now and never use it than to need to add it later.
Closes https://github.com/caffe2/caffe2/pull/241

Differential Revision: D4831264

Pulled By: asaadaldien

fbshipit-source-id: ad2ffdb13c031d8eb20552ffbf83c05c278252f7
2017-04-11 12:30:24 -07:00
2abbb5133c Fixing function signatures: long -> ptrdiff_t (#1232) 2017-04-11 11:37:21 -07:00
67468212e3 updated ubuntu instructions (#259) 2017-04-11 11:36:22 -07:00
fcf8387779 Fix ibv_devices wrapper if device list is empty
Summary: TSIA

Reviewed By: andrewwdye

Differential Revision: D4866469

fbshipit-source-id: 6bbde8ec9d71ea89ccdab379d48d122b90237460
2017-04-11 11:04:54 -07:00
ade105fb7c update README to install pyyaml from conda (#1231) 2017-04-11 10:23:45 -07:00
70e9c08f27 feature processing ops
Summary:
add necessary ops for feature processing
* logit op
* replace nan
* batch one hot op

Reviewed By: kittipatv

Differential Revision: D4840869

fbshipit-source-id: 197123ea5608d54f0b5ac7899973a077a6a86775
2017-04-11 07:07:51 -07:00
4dc1dbab05 MILSTM Cells with and without attention
Summary: Bug fix for MILSTM implementation: parameters were not trainable.

Reviewed By: urikz

Differential Revision: D4864581

fbshipit-source-id: a3fdb7a85c8d87c5117328ca8cae4fb6352728d0
2017-04-11 02:01:23 -07:00
22584b546a Revert D4711302: SumReduceLikeOp CPU/GPU implementation
Summary: This reverts commit 0865abde871b3046b367599731593dae03f0775a

Differential Revision: D4711302

fbshipit-source-id: 6c22e683544f6627142fc9970a781ec98f682cad
2017-04-10 23:01:26 -07:00
84ee795b25 remove net_predictor_extract.py
Summary: Having the utils directory broke the open source build :(. Removing the contents, as this utility is not really needed.

Differential Revision: D4866228

fbshipit-source-id: 1eae4580ebac5b60e52e2e8553e0ffd919152228
2017-04-10 21:33:27 -07:00
092c1440a2 SumSqrElements
Summary:
Added SumSqrElements, since with it we can avoid the large temporary blob that is needed when doing Sqr + SumElements.

Also moved it to reduction_ops, because utility_ops has grown too big.
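
Why fusing helps, in numpy terms (illustrative; not the operator code):

```
import numpy as np

x = np.random.randn(1000, 1000)

tmp = x * x                      # Sqr + SumElements: an x-sized temporary blob
s1 = tmp.sum()

s2 = np.einsum('ij,ij->', x, x)  # fused: accumulate squares directly
assert np.isclose(s1, s2)
```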

Reviewed By: jamesr66a

Differential Revision: D4844172

fbshipit-source-id: 032eec45e24d6724f0d5fb83f4ec1c771d1146e5
2017-04-10 16:16:52 -07:00
4e693d12ab Merge commit '79c4cb96b16dac603247ffd88c473e84565915a9' 2017-04-10 14:35:54 -07:00
79c4cb96b1 fix memory leak in btrisolve and getri 2017-04-10 14:35:07 -07:00
97bd6aae37 Throw error if Redis replies with error
Summary:
The code already asserted, but only on the reply type, so it didn't
include the actual error message. This makes debugging problems much
easier when people have problems running the benchmark suite.

Differential Revision: D4860022

fbshipit-source-id: 659bc461a724603375bff18eac90eca658492b05
2017-04-10 10:49:59 -07:00
f618ea9f31 Update README.md
Summary:
Mention GPUDirect in README
Closes https://github.com/facebookincubator/gloo/pull/24

Differential Revision: D4860167

Pulled By: pietern

fbshipit-source-id: 80804c778cdc6a9bcd8febe7e05142145cc6c61b
2017-04-10 10:49:59 -07:00
f6fef3718e fix typo in autograd.rst (#1219) 2017-04-10 01:16:59 -04:00
3fcdd6a42b Reuse sockaddr information from device
Summary: This is cheaper than doing getaddrinfo for every pair.

Reviewed By: andrewwdye

Differential Revision: D4850102

fbshipit-source-id: e77f468f099f63860b52fdd0dcc57a8a7a91a448
2017-04-09 16:37:41 -07:00
707c1ca4cc Function to retrieve PCI bus ID from device
Summary:
Part of this change is to perform a getaddrinfo in the TCP device
class so we can figure out the interface and subsequently PCI bus ID
of the NIC used for its traffic. This information can be used in a
later diff to avoid doing getaddrinfo calls in the TCP pairs and have
them reuse the information that is resolved by the device.

The PCI bus ID can be used to compute distance between NICs and GPUs
and make informed decisions on where to allocate scratch buffers.

Reviewed By: andrewwdye

Differential Revision: D4850035

fbshipit-source-id: 575e401a9273300bc720c814fef8971846ec748c
2017-04-09 16:37:41 -07:00
bc0ed9298d remove incorrect version in readme 2017-04-09 14:44:44 -04:00
040cf42643 Merge pull request #455 from twitter-forks/indexlinear
Adding Indexlinear
2017-04-09 13:52:56 -04:00
6d9ad1d66a Adding IndexLinear (#1181)
* Add IndexLinear

* Fixes to IndexLinear

- Fix IndexLinear test
- make it better for multithreaded case
- fix a glitch in the C code
- improve the reset() method
- fix the weight allocation.
- remove "fakeBatch" possibility as it's not used
- clamp normalized values at evaluation time instead of just dividing by max.
- add assert on the keys/values dimensions in IndexLinear.
- invert order of weightDecay in the case of output dim > 1.

* Changes required to support IndexLinear in CUDA

* Adding support for flattened inputs for IndexLinear

* Doc for IndexLinear + fix for when the input format changes from one batch to another.

* Cleaning up IndexLinear documentation

* Changes required to build with latest torch

* Adding benchmark script for IndexLinear

* Bugfixes and cleanup of IndexLinear.lua

- Fixed bug that occurs when performing multiple accGradParams +
  updateParams

- All the data required for the updates is put in a single table

- Added :pararameters method
2017-04-09 13:51:45 -04:00
d1af311224 PiecewiseLinearTransformOp supports passing params from input blobs.
Summary:
The PiecewiseLinearTransformOp passes the transform parameters (bounds, slopes, intercepts) via operator args. This diff adds support for passing these parameters through input blobs.

The purpose is to allow us to create a model calibration net that can be exported when saving the model.

Reviewed By: dragonxlwang

Differential Revision: D4777086

fbshipit-source-id: 0d157154860f61ec6ecfab95aea80beed54aa5c6
2017-04-08 11:02:35 -07:00
64ee4056d7 updated docker image inside the docs (#1216) 2017-04-08 10:29:03 -04:00
d8b9e787c2 DuplicateOp
Summary: This is like LengthsToSegmentIds + Gather w/o the intermediate segment IDs blob. I only realized that after I wrote the whole thing. That combination is not obvious, so just check this in?
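
For reference, a numpy sketch of the LengthsToSegmentIds + Gather combination this op fuses (my reading of the semantics, not the op's code):

```
import numpy as np

lengths = np.array([2, 0, 3])
data = np.array([10, 20, 30])

segment_ids = np.repeat(np.arange(len(lengths)), lengths)  # [0, 0, 2, 2, 2]
out = data[segment_ids]                                    # [10, 10, 30, 30, 30]
```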

Reviewed By: xianjiec

Differential Revision: D4847591

fbshipit-source-id: a1c480f16b317763866af13c83b3aaaeb6a60751
2017-04-08 00:01:59 -07:00
b0adcf02f8 remove workspace sequence id
Summary: As said in the title. This should save a lot of memory if using both train and test workflows.

Reviewed By: jhcross

Differential Revision: D4855436

fbshipit-source-id: 9eeca548eee118e07bd587c46f40e7beb138318e
2017-04-08 00:01:59 -07:00
a2065f3c1e report capacity bytes as part of workspace blob stats
Summary: Instead of reporting the total number of elements of a tensor, report the number of bytes. Specifically, report the capacity of the tensor, not its current number of bytes.

Reviewed By: jamesr66a, salexspb

Differential Revision: D4851633

fbshipit-source-id: 464d552f41f1b5f25753b0e7001d299b6dac1966
2017-04-07 19:16:37 -07:00
64599d8351 create helpers package and add dropout
Summary: Helpers package and Dropout helper file

Reviewed By: salexspb

Differential Revision: D4837140

fbshipit-source-id: cd3030974421ce6830747935183e098aa04b2803
2017-04-07 17:33:49 -07:00
55d69b5ade Merge commit '88bcfc15316e3c878237a8f95aeb6e72402c90ff' 2017-04-07 17:20:52 -07:00
0d7d6e1f0d Merge commit '662163bef68a9d64f3cb13a903638c870c0b4aa6' 2017-04-07 17:20:15 -07:00
b16a352a3b Fix remainder and cremainder for integer types 2017-04-07 17:17:44 -07:00
88bcfc1531 Fix remainder and cremainder for integer types 2017-04-07 17:16:59 -07:00
662163bef6 Fix remainder and cremainder for integer types 2017-04-07 17:16:31 -07:00
4026593240 check for beta=0 and avoid multiply in sparse mm (#1211)
* check for beta=0 and avoid multiply in sparse mm
2017-04-07 20:14:32 -04:00
a931064a52 Merge commit '441d75ce569f89bad3e2f1f2a2075e68ae3bc76b' 2017-04-07 16:57:05 -07:00
441d75ce56 Adapts basic operations to new THXVector interface 2017-04-07 16:56:12 -07:00
813452608c Add Reduction layer in caffe_translator
Summary:
Caffe's Reduction corresponds to Caffe2 ReduceBack*. Added a translator for
stephenyan1231's model.

Reviewed By: stephenyan1231

Differential Revision: D4848289

fbshipit-source-id: 7cb61c115549ffa6be8d0c19a5eaed99c3c086b6
2017-04-07 16:17:07 -07:00
e153643b6c tutorial updates (#257)
* added dataset downloader from s3 func; leveldb creator func; refactored to use both of these

* working version for squeezenet only

* using fb.me link for mnist dataset

* ubuntu installation instructions for v0.6.0

* removing non-functional tutorials

* updated model download info

* model download updates

* new tutorial

* bump version to v0.6.1

* tutorial helper functions
2017-04-07 16:16:31 -07:00
dc5a34200f SumReduceLikeOp CPU/GPU implementation
Summary:
1. CPU/GPU implementation of SumReduceLikeOp.

[SRLOp](matrix A, matrix B) -> C

where C has the same shape as B; its values are the reduced sums of the corresponding A elements.

2. Make SumReduceLikeOp (part of) the gradient of Add/Mul/Sub and provide unittests

===Update for Translation Team===
3. Passed Tests:
$ buck test caffe2/caffe2/python/operator_test:recurrent_network_test
$ buck test fblearner/flow/tests/langtech/translation/neural_mt:seq2seq_model_caffe2
$ buck test fblearner/flow/tests/langtech/translation/neural_mt:seq2seq_ensemble_beam_model_caffe2

Reviewed By: Yangqing

Differential Revision: D4711302

fbshipit-source-id: 0865abde871b3046b367599731593dae03f0775a
2017-04-07 15:19:24 -07:00
8482cf9823 TensorVectorSizeOp
Summary: Put the size of the input tensor vector into the output blob

Reviewed By: xianjiec

Differential Revision: D4849556

fbshipit-source-id: 0929319e1705b027874d41a90a9159b335d93545
2017-04-07 14:46:19 -07:00
c101856214 Disable openmp when building for android
Summary: Closes https://github.com/caffe2/caffe2/pull/256

Reviewed By: salexspb

Differential Revision: D4853865

Pulled By: bwasti

fbshipit-source-id: 57768d538281bec2b18d8c6af7ae58009bbc257e
2017-04-07 14:35:01 -07:00
3de56785fa fix conv1d test and add for padding 2017-04-07 13:56:02 -07:00
5ee8536a02 Merge commit 'a89317a9d407241c97fe4486b3c88de8578445d7' 2017-04-07 13:49:18 -07:00
f00a5d2f54 Merge commit '66a20e5c328836c1eb720cf4e2eb916366aae487' 2017-04-07 13:47:25 -07:00
7ba1c437e3 Create PATENTS 2017-04-07 13:46:29 -07:00
a89317a9d4 fix types in unfold.c 2017-04-07 13:32:04 -07:00
e48db02e10 remove unused python-level BatchNorm.py 2017-04-07 16:27:16 -04:00
7f2553bc6f dont use cudnn batchnorm for cudnn < 5.1.10 2017-04-07 16:27:16 -04:00
acaf279235 Unbreak old model check in caffe_translator
Summary: The check for old model style seems wrong. Fails with a model I tried to run.

Differential Revision: D4847970

fbshipit-source-id: f28c5bb635c5e8b4dcfcc5c52a434d91a89217e8
2017-04-07 12:32:25 -07:00
04bd41a4f2 Downloader fix
Summary:
This fixes some bugs in the downloader. TODO: fix the URL
Closes https://github.com/caffe2/caffe2/pull/255

Reviewed By: Yangqing

Differential Revision: D4851555

Pulled By: bwasti

fbshipit-source-id: 56d01617ccaddcd40b0fb8e4be137cb4c7a52e91
2017-04-07 10:16:58 -07:00
66a20e5c32 Support TORCH_NVCC_FLAGS environment variable
This has been supported in cutorch since August 2016, and is used in the
pytorch integration (to reduce the binary size).
2017-04-07 18:23:22 +02:00
cb3bd0ede8 Added a DP + recursion algorithm for finding optimal blob assignments based on blob sizes.
Summary:
Added a DP + recursion algorithm for finding blob assignments based on blob sizes. This algorithm gives optimal assignments. See comments for details.

The algorithm is not used by default; set algo=memonger.AssignmentAlgorithm.DYNAMIC_PROGRAMMING and provide blob_sizes in optimize_interference() to use it. The blob sizes can be retrieved by running the net once and then calling blob_sizes = memonger.collect_blob_sizes(net). All blob sizes are assumed to be 1 if blob_sizes is not provided; in this case, using algo=memonger.AssignmentAlgorithm.GREEDY may be better.

Testing on the segmentation model, memory usage is reduced by 19% (14.96MB to 12.08MB) compared to using the greedy algorithm (without considering the conv share buffer). The algorithm runs in 15s for the model, which has 55 sharable blobs.
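
A usage sketch based on the names quoted above; the function names come from the summary, but the exact optimize_interference argument list is an assumption:

```
from caffe2.python import memonger, workspace

def optimize_with_dp(net, static_blobs):
    # Run once so real blob sizes can be measured, as the summary suggests.
    workspace.RunNetOnce(net)
    blob_sizes = memonger.collect_blob_sizes(net)
    # static_blobs and the keyword arguments below are assumptions.
    return memonger.optimize_interference(
        net, static_blobs,
        blob_sizes=blob_sizes,
        algo=memonger.AssignmentAlgorithm.DYNAMIC_PROGRAMMING,
    )
```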

Reviewed By: ajtulloch

Differential Revision: D4818476

fbshipit-source-id: 606936f4cf2715408d60b9a5cf3bcaf1985a0fec
2017-04-07 02:18:08 -07:00
ffd298376a option to print tensor shapes at exit
Summary:
Added the Caffe2 cmd line option --caffe2_print_blob_sizes_at_exit=1, which, when enabled, prints all tensor sizes in the workspace destructor. Handy especially when using sub-workspaces, as with RNNs. Note that the sizes are numbers of elements, not bytes. The output is designed to be easily copy-pasteable into Excel.

TODO: add sorting
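
One way to enable the flag from Python (a sketch; passing Caffe2 command-line flags through workspace.GlobalInit is the usual route, though the exact invocation here is from memory):

```
from caffe2.python import workspace

workspace.GlobalInit(['caffe2', '--caffe2_print_blob_sizes_at_exit=1'])
# ... run nets; tensor sizes are printed when the workspace is destructed.
```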

Reviewed By: jamesr66a

Differential Revision: D4844628

fbshipit-source-id: 11608a1710ae5c89bbd741edb506d25496606185
2017-04-06 21:36:04 -07:00
c7d284a03b ability to disable inputs for extract predictor net
Summary:
This is not a super-elegant solution, but a working one, to fix the Newsfeed team's problem of extracting a predictor net from a net that has a "side chain" they want to cut from the middle.

This adds an argument to ExtractPredictorModel that allows one to define "disabled inputs". These are inputs that we want to switch off, so that all operators that depend on them will be removed from the model.

Differential Revision: D4839953

fbshipit-source-id: 5d16df6f0ec4aac6670e6917efc77abde5d75c95
2017-04-06 17:05:32 -07:00
37d95687c4 Merge commit 'ae1c365dbdbf667ae24c57eec9f2e6b9debf16bd' 2017-04-06 16:37:31 -07:00
f0c7124420 Allow support for negative dimension argument for all functions 2017-04-06 16:37:00 -07:00
ae1c365dbd Add TH_INDEX_BASE to nDimension and stride functions 2017-04-06 16:30:11 -07:00
8c769258f8 fix cnn.Softmax when called with only inputs
Summary: A lot of dper code was calling model_helper.Softmax() without outputs, causing a Python error. Sorry!

Reviewed By: xianjiec

Differential Revision: D4845359

fbshipit-source-id: 7b6d547acb968371bf7cae1eb68fb5a8609877ec
2017-04-06 15:33:54 -07:00
6fd9b53d93 Include common/linux.{h,cc} in CMake build
Summary:
Forgot to include these in a previous commit.
Closes https://github.com/facebookincubator/gloo/pull/23

Differential Revision: D4847072

Pulled By: pietern

fbshipit-source-id: 08aa9e8fa47377eb8c7747bd577eec7e615789f1
2017-04-06 15:20:59 -07:00
e2323ad688 Add CAFFE_ENFORCE to protobuf parsing
Summary: Add CAFFE_ENFORCE to make sure the protobuf parsing is successful.

Reviewed By: salexspb

Differential Revision: D4843662

fbshipit-source-id: 20cab7180e6b0e5afb5e29ff3333591659e41f7a
2017-04-06 14:34:30 -07:00
e692c38fcf Compute distance metric between PCI devices
Summary:
With this we can compute the best GPU device to reduce on. It is not
always the one CUDA indicates as GPU 0.

Reviewed By: andrewwdye

Differential Revision: D4845581

fbshipit-source-id: 13e0500f54fd507899646f781a97c09abcd3b056
2017-04-06 13:50:07 -07:00
23183b9642 memory-saving only_loss argument for SoftmaxWithLoss
Summary: When only_loss=True is enabled, the softmax output buffer is shared with the gradient buffer (which is of same size). Added tests for this. Only for GPU version for now.

Reviewed By: salexspb

Differential Revision: D4843991

fbshipit-source-id: 834d2a1b357d784e4d64efe484f893442201ad6a
2017-04-06 13:04:31 -07:00
59f464434d Used blob sizes for finding assignments in a greedy way.
Summary: Used blob sizes for finding assignments in a greedy way.

Reviewed By: ajtulloch

Differential Revision: D4818159

fbshipit-source-id: 89180a6117ba5be058e1d2f9488b06d618e91917
2017-04-06 12:36:38 -07:00
a54000dc6a Added an ordering function to reduce live spans of computed blobs.
Summary:
Added an ordering function (topological_sort_traversal_longest_path()) to reduce the live spans of computed blobs. The idea is to sort the ops based on the length of the execution path, so that ops on longer paths are scheduled first.

Tested on the segmentation model with an on-the-fly decoder: memory usage went from 21.7MB to 14MB (the original size is 33MB with compressed parameters and without considering the conv buffer), compared to using topological_sort_traversal() as the ordering function.

It is a general ordering function so I put it in memonger.py directly.
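
A hedged usage sketch based on the function name above (whether optimize_interference takes an ordering_function argument like this is an assumption):

```
from caffe2.python import memonger

def optimize_with_longest_path_ordering(net, static_blobs):
    # ordering_function and static_blobs are assumed parameter names.
    return memonger.optimize_interference(
        net, static_blobs,
        ordering_function=memonger.topological_sort_traversal_longest_path,
    )
```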

Reviewed By: ajtulloch

Differential Revision: D4790135

fbshipit-source-id: e661b45c1640de44ce1a9fdd009a4fba38f8e042
2017-04-06 12:20:39 -07:00
b922b19bfd add weights bias to modelhelperbase
Summary: add weights and bias to modelhelperbase

Reviewed By: salexspb

Differential Revision: D4837125

fbshipit-source-id: 6a357c0e3d07d35aa6cdeb8ef803976646b9dbe6
2017-04-06 11:16:55 -07:00
5dfa73702f Display runtime information in benchmark output
Summary:
This makes it easier to capture, compare, contrast results with
different parameters.

Reviewed By: andrewwdye

Differential Revision: D4843715

fbshipit-source-id: ba6916dcd5f8bcc615d6edce1a54657241357c31
2017-04-06 11:06:23 -07:00
95140094cb Use CudaStream as first class object
Summary:
Instead of having every CudaDevicePointer "own" a stream, this change
moves to using CudaStream as first class object. It was pretty clunky
to use the copy{To,From}* functions on the CUDA pointer classes to
copy stuff around. For example it was not clear whether the stream
belonging to the source or destination was used to execute the copy.
There is no longer such ambiguity after this change.

To make this work the CudaBroadcastOneToAll algorithm was changed to
include the workspace template argument, but only has the
CudaHostWorkspace implementation. The CudaDeviceWorkspace
implementation is left to be done for another change (that's not the
purpose of this change).

Reviewed By: andrewwdye

Differential Revision: D4841615

fbshipit-source-id: d0c1b9ba948ff6167832515afa7bdd2b32b48064
2017-04-06 11:06:23 -07:00
76abd9a8ac Caffe2: consolidate AveragedLoss with SumElementsOp
Summary: Caffe2: consolidate AveragedLoss with SumElementsOp

Differential Revision: D4781561

fbshipit-source-id: 6734adb9dd81d4cad1819a5f8fe736de2477cb72
2017-04-06 10:35:01 -07:00
c120322890 Predictor exporter open-sourcing
Summary: This moves the predictor exporter's code to open source.

Differential Revision: D4815409

fbshipit-source-id: ce1508a2b6b973c91b0420928d2b4c3953f26e6c
2017-04-06 10:01:42 -07:00
ef95926103 Move setTimeout to Device and set default tcp timeout to 30 sec
Summary: Make the timeout a device attribute. Pairs now configure their timeout from the device's settings when connecting, instead of the timeout needing to be set explicitly on each pair. Set the default tcp timeout to 30 sec.

Reviewed By: pietern

Differential Revision: D4838918

fbshipit-source-id: e6e6ee36c662eb5e7ba5354c904e50f9dcac258f
2017-04-06 08:50:21 -07:00
e7f5220dfa device_ids can be None again in data_parallel (#1187) 2017-04-06 10:30:53 -04:00
a7ae04a657 fix precedence problem when building with debug python (#1201) 2017-04-06 10:30:16 -04:00
7f03182bfa sizeAverage -> size_average in docs 2017-04-06 01:31:02 -04:00
9f2a5d804d Add a flag to fix when dataset size is not divisible by batch size. (#1133) 2017-04-06 00:18:43 -04:00
a7217e6626 Remove unused optimizers
Summary: As desc.

Reviewed By: xianjiec

Differential Revision: D4840482

fbshipit-source-id: bf820154475508ce581d16a45bcd93d026b60f30
2017-04-05 21:18:29 -07:00
aa506fa4d7 fix docs typo 2017-04-05 23:42:02 -04:00
955869a09a fix cuda_allreduce_halving_doubling to correctly copy between and reduce on GPU buffers
Summary: cuda_allreduce_halving_doubling was not properly handling the case where buffers are allocated in GPU memory, trying to reduce and copy from them as if they were in system memory.

Reviewed By: pietern

Differential Revision: D4840259

fbshipit-source-id: 2615360cd2f1d9c7a37fb0bcdf33ff35528b2c75
2017-04-05 19:56:20 -07:00
d82cad3019 implement nn.Module.__dir__ (#1142) 2017-04-05 22:18:34 -04:00
9504246c32 add triplet margin loss (#1165) 2017-04-05 22:17:58 -04:00
81cf3dbf79 Merge commit '6bd4ecd15390517c68d598d236ffb0929ade277c' 2017-04-05 19:07:01 -07:00
12f1b4f76c Merge commit '84bdbe5ab4b602b021ff494487c8ad57457052d3' 2017-04-05 19:06:14 -07:00
84bdbe5ab4 btrisolve: Add sz checks, correct B's ordering, support nrhs>1. 2017-04-05 19:05:20 -07:00
85954032d9 fix doc formatting 2017-04-05 22:02:29 -04:00
fadbbd2692 ReversePackedSegsOp optimized GPU code
Summary:
Removes the need for all the Copy calls; in one of our apps this reduced time from ~40ms to < 200us.
Closes https://github.com/caffe2/caffe2/pull/250

Differential Revision: D4828825

Pulled By: pietern

fbshipit-source-id: 656bd0edc4ffbaa3f89ccbe045e28a7aae49ceab
2017-04-05 17:46:51 -07:00
1a04b92226 add note regarding SGD momentum 2017-04-05 20:45:41 -04:00
c66c8f6e84 Add Softmax to cnn.py, cuDNN engine.
Summary: Softmax was not in the model helper, so I added it there so that we can set the CUDNN engine, as it is the preferred version.

Reviewed By: asaadaldien

Differential Revision: D4835624

fbshipit-source-id: 7f0c84b7a73653119901795782709a6a617345c5
2017-04-05 14:20:23 -07:00
8da2d75ec8 [Caffe2/Recurrent] recurrent.py API to cuDNN LSTM
Summary:
Quite a large diff to make cuDNN LSTM and our LSTM produce the same results, and to provide a Python API for the cuDNN LSTM.

* Added operators RecurrentParamGet and RecurrentParamSet to access weights and biases for the different gates, input/recurrent.
* Removed RecurrentInit as not needed
* recurrent.cudnn_LSTM() returns a special net and mapping that can be used to retrieve the parameters from the LSTM
* recurrent.cudnn_LSTM() can be passed blobs that have the parameters for the individual gate weights and biases
* recurrent.InitFromLSTMParams() can be used to initialize our own LSTM from cuDNN params. This way we can test whether cuDNN and our own implementation produce the same result.

recurrent_test.py tests for the equivalency

Reviewed By: salexspb

Differential Revision: D4654988

fbshipit-source-id: 6c1547d873cadcf33e03b0e0110248f0a7ab8cb0
2017-04-05 14:20:23 -07:00
cf201ebac8 support axis for cudnn softmax
Summary: Added support for the axis argument in the cuDNN version of softmax + added cuDNN tests to the softmax_ops_test.

Reviewed By: urikz

Differential Revision: D4835409

fbshipit-source-id: 9150b969237e38daebff961fee3c36759f834ac4
2017-04-05 14:06:03 -07:00
320b598ff1 Add NanCheckOp, an operator that checks for NaNs and inf's on both the forward and backward pass.
Summary: NanCheck is an in-place operator for GPU that checks the input for any NaN or inf values. The operator fails and prints diagnostic information (input tensor dims and values) if it detects these erroneous values. This should help us to narrow down our numerical instability issues in the NMT models, and it might help others as well.

Differential Revision: D4818141

fbshipit-source-id: e5aa9762089c58ce160270446007c7a91a7a85e5
2017-04-05 13:07:59 -07:00
8a822d48f5 Update README.md
Summary:
Clarify that Redis Cluster is not supported. Also see #21.
Closes https://github.com/facebookincubator/gloo/pull/22

Differential Revision: D4837375

Pulled By: pietern

fbshipit-source-id: 6e3575b3b8dae6ca62beb765da15d8506da4abdb
2017-04-05 13:06:48 -07:00
5511ad258b cuda version of recursive halving/doubling allreduce
Summary: Basic port of the CPU halving/doubling algorithm. No pipelining is done between reduce/broadcast and communication.

Reviewed By: pietern

Differential Revision: D4823693

fbshipit-source-id: b18045d64edf90361bf7713f4ccb2e074757780f
2017-04-05 12:39:16 -07:00
75a635630d Update to ignore zero targets
If the target is zero, the loss and the gradient of the input are set to zero.
This is useful for variable-length natural language generation models.
2017-04-05 11:51:54 -07:00
26d301fbe4 Configurable CuDNN workspace limit in resnet50_trainer
Summary: TSIA

Reviewed By: Yangqing, bwasti

Differential Revision: D4835477

fbshipit-source-id: a0083188fe91a56c5f910c7dda46412e38632d7e
2017-04-05 10:50:00 -07:00
ecd3bda44e Fix Softmax for CUDA
Summary:
Following jamesr66a's brilliant observation, this diff fixes the non-CUDNN versions of Softmax. The op did not take into account that blocks can run in parallel, and thus they could overwrite each other's values, particularly the "row max" that is important for numerical stability.

So in this diff:
1) SoftmaxOp now shares all its code with SoftmaxWithLoss, which had a better implementation.

+ Strengthened the test case and renamed the file.
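
The "row max" trick the summary refers to, in numpy form (illustrative, not the kernel code):

```
import numpy as np

def stable_softmax(x):
    z = x - x.max(axis=1, keepdims=True)  # subtract row max so exp() can't overflow
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)
```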

Reviewed By: jamesr66a

Differential Revision: D4832929

fbshipit-source-id: 4a1bfa2106ceb65ec75f5b868323ee1e7a3457fb
2017-04-05 10:07:54 -07:00
8e6524938b Undo D4832492 for Gloo
Summary: No folly dependency in Gloo.

Reviewed By: andrewwdye

Differential Revision: D4835050

fbshipit-source-id: 97d0c14fb770fdde68206ca5a20a974bef156392
2017-04-05 09:51:05 -07:00
02f0c1c9d7 make memonger work with RecurrentNetwork(Gradient)
Summary:
This diff enables support of recurrent networks for memonger:
1. Memonger descends into the step-nets and renames the blobs accordingly
2. Memonger tells the gradient op about the renamed blobs by adding a parameter "paramname.renamed=<new name>"
3. RecurrentNetworkGradientOp applies remapping to links and gradient blobs.

I first thought of refactoring the whole gradient blob management of the recurrent network, but that looks to be very hard without a major revision of the code.

Note, I did not enable memonger for neural_mt, since I think the team should do more testing before enabling this.

Reviewed By: salexspb

Differential Revision: D4812823

fbshipit-source-id: 1ffdf3cfb4fcd00eec5bb0ece3bf416aa6d3e26b
2017-04-05 09:48:25 -07:00
65439e849b Fix mixed context loading validation
Summary:
Description.
We kinda have our hands tied here: we can't reference context_gpu since it needs to run under the _gpu TARGET to pick up the correct headers, and we can't change the deserialize-blob interface to return a size since not all blobs are tensors.
If this works then let's ship it.

Reviewed By: urikz

Differential Revision: D4826034

fbshipit-source-id: 631ba56386ccb91d9b19d780a3e012d0ceea2422
2017-04-05 08:20:03 -07:00
4e4cfd8b2b Fix main()s to call folly::init/initFacebook/registrationComplete (part 14)
Summary:
Required for D4821763
Based on targets from https://fb.facebook.com/groups/fbcode/permalink/1304073246296178/ (I also excluded those targets which do not depend on folly:singleton).

Reviewed By: meyering

Differential Revision: D4832492

fbshipit-source-id: fcb4ce42e9e5359d4752769f77d7271e550201fe
2017-04-04 20:50:47 -07:00
66d00b3a63 Use CUDNN softmax implementation
Summary: The caffe2 implementation of bare Softmax() has a race condition that wipes out the numerical stability trick. Use the CUDNN implementation instead

Reviewed By: urikz

Differential Revision: D4831298

fbshipit-source-id: d11b1de700e3954629e7ed43225a2416c27b3252
2017-04-04 20:02:21 -07:00
5f263c6175 RecurrentNetwork and variable length links
Summary:
Two new features for RecurrentNetwork:
1. Ability to specify longer (for a few steps) initial state
2. Ability to link more than one step of external blob to internal one.

Some motivation for these changes is provided in the unit test

Reviewed By: salexspb

Differential Revision: D4816230

fbshipit-source-id: 5ae6fed53b3b08a6ce4547ff1d0cb773dab42af0
2017-04-04 19:46:53 -07:00
6bd4ecd153 Use thrust::inclusive_scan for 1D cumsum/cumprod (#742)
For large 1D tensors thrust::inclusive_scan is much faster than our
current implementation.
2017-04-04 21:05:10 -04:00
5c802c5ba9 Refactor AllgatherRing to use remote buffer offset
Summary: Refactor AllgatherRing algorithm to remove all memcpy in the communication rounds by using outPtrs as send/receive buffer + remote buffer offset.

Reviewed By: pietern

Differential Revision: D4793186

fbshipit-source-id: 645d0758d246fd0b493e3fe312a8441d86f6d169
2017-04-04 17:08:26 -07:00
04f5b5ea83 Merge commit '5b40e4245d573ae0a6c2da70a0b712528aab2bce' 2017-04-04 15:39:35 -07:00
5b40e4245d Fix typo and make btrisolve work for doubles on the CPU. 2017-04-04 18:29:30 -04:00
39fa092a13 Constant string is generated from Protobuf instead of Thrift
Summary: To make the predictor open source, move the constants that are generated from Thrift to Protobuf.

Reviewed By: salexspb

Differential Revision: D4656884

fbshipit-source-id: d4dbb3416e8396185e0981fcd9a090fbb054a18a
2017-04-04 15:03:39 -07:00
ef42d4c2aa Fix sparse to dense and improve DispatchHelper
Summary:
Actually add values on duplicated indices. I didn't use UnorderedSegmentSum because it'd need more modifications to figure out the first dimension, and I don't want to make that function more complex than it already is :)

We theoretically can have a version that does CopyItems and fails on duplicate indices as a fallback. But I haven't implemented it yet as it wouldn't be that useful for now.

Also fixes the hypothesis test: doing rand() inside the body is not cool, as it makes hypothesis run forever.

Differential Revision: D4814574

fbshipit-source-id: 1851ec5f5df8fc4bf4844585076b8af23a06b0b2
2017-04-04 15:03:39 -07:00
ae5865082c Move common algorithm stuff into algorithm.h
Summary:
Combines the top level common.h with algorithm.h. With algorithm.h in
the common package, CUDA algorithms only need a dependency on that
package. CudaBroadcastOneToAll still depended on broadcast.h so this
change also removes that dependency and has it subclass the Algorithm
class.

Reviewed By: andrewwdye

Differential Revision: D4826885

fbshipit-source-id: 930037e39f7a2c941868e53f0bbc54e3f2e0b184
2017-04-04 13:05:50 -07:00
f86beccc5b Use workspace pattern with CudaAllreduceRingChunked
Summary:
GPUDirect support for CudaAllreduceRingChunked by adding a workspace
template parameter and adding workspace specific init functions.

To support this change the CUDA LocalOp classes had to be changed a
bit to take an extra destination/source pointer. This allows reduction
of 1-N pointers into a target pointer, where the target may live on
device or live on host. If it lives on the host, the NCCL operation
that executes the reduction is followed by a D-to-H memory copy. If
there is only a single input pointer, no reduction needs to happen and
the class just executes the D-to-H memory copy. The net result is that
we can interchangeably use device or host pointers as the target for
reduction or the source for broadcast, and these LocalOps do what you
would expect them to do.

Reviewed By: andrewwdye

Differential Revision: D4825236

fbshipit-source-id: 048ec6cbc5a0500bafbe1b3f6abe1e2e5f3a2675
2017-04-04 13:05:50 -07:00
d122b4e4ec Update btrisolve docs to the newest interface. 2017-04-04 15:21:16 -04:00
ccfc4567dc Merge pull request #78 from ilya-biryukov/master
Fix compilation error when compiling with 'clang -x cuda'.
2017-04-04 09:47:52 -07:00
81008aa111 Handle errors in sync IO path.
Summary: Fixes for handling errors and timeouts in blocking and polling sync paths. Add test coverage for errors and timeouts.

Reviewed By: pietern

Differential Revision: D4823498

fbshipit-source-id: 93721947a6404ca9cea6a4869f4156f8d270a981
2017-04-04 09:37:33 -07:00
0cdf10478d Start benchmark element sweep at 100
Summary:
Anything number of elements below this always fits in a single packet
and will yield ~identical results.

Differential Revision: D4825190

fbshipit-source-id: 71ac77456049e991da5059d5a029c5e9d2a67ed7
2017-04-03 23:50:38 -07:00
0e5b2fd016 Support cropping with negative pad sizes in PadImage
Summary: The PadImage op supports cropping along the H/W dimensions by using negative pads; but currently passing negative values for pad attributes throws an error in ConvPoolOpBase, which PadImage inherits from. Modify ConvPoolOpBase to accept negative pad values for non-conv, non-pool ops. Also add a python operator test for cropping
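
What cropping with negative pads means, as a numpy sketch (illustrative semantics, not the op's code):

```
import numpy as np

x = np.zeros((1, 3, 8, 8))            # NCHW
pad_t = pad_b = pad_l = pad_r = -1    # negative pad = crop one pixel per side
y = x[:, :, -pad_t:x.shape[2] + pad_b, -pad_l:x.shape[3] + pad_r]
assert y.shape == (1, 3, 6, 6)
```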

Reviewed By: ajtulloch

Differential Revision: D4817118

fbshipit-source-id: 5ea5203e8072cc34fe14938e534b157d0ad55f6b
2017-04-03 23:47:54 -07:00
4de82cfa0f Use CudaAllreduceRing<CudaDeviceWorkspace> for GPUDirect
Summary:
The existing CudaAllreduceRing with a CudaDeviceWorkspace
template parameter now has the same effect.

Reviewed By: andrewwdye

Differential Revision: D4823393

fbshipit-source-id: 88fe497a983b26a281a3a74fe3bdc02c0c87c523
2017-04-03 20:05:25 -07:00
5c32c82a6d Add option to subtract log odd from sampled trained prediction.
Summary: Useful for sampled softmax training

Differential Revision: D4782673

fbshipit-source-id: 88195de60070a0bc16f5e06b9aad4dffd0484546
2017-04-03 17:50:58 -07:00
1ac8251373 Use gloo::make_unique to fix build for C++11
Summary: Closes https://github.com/facebookincubator/gloo/pull/20

Differential Revision: D4820325

Pulled By: pietern

fbshipit-source-id: 00a870f71e8e98ce6d06da261dcaed83b81ec81c
2017-04-03 17:07:04 -07:00
3b4c950862 Add option to use id_score_list_features column
Summary: Somehow, feed non-ranking training data usually has this type of column. Add an option to support it.

Reviewed By: xianjiec, kennyhorror

Differential Revision: D4773960

fbshipit-source-id: 5a7ef4618a070e04f3cd8ddfcbf2b7441c00d92d
2017-04-03 17:03:09 -07:00
511ca3ea1b Add tests for tcp transport failures
Summary:
Implement a file store for multi-process transport failure testing. Add test cases to spawn multi-process tcp communication, and verify that all processes throw the expected IoException.

A future diff will add coverage for connectivity failures, sync modes, and ibverbs.

Reviewed By: pietern

Differential Revision: D4807794

fbshipit-source-id: 35212719d46e6d875eacb341fae25681f39053bc
2017-04-03 16:08:39 -07:00
8ce1382e99 make it compile on Windows + use ilp64 MKL (#981) 2017-04-03 18:02:15 -04:00
22cdef3ddc recursive halving/doubling allreduce
Summary:
Allreduce using the recursive halving and doubling algorithm, described in http://www.mcs.anl.gov/~thakur/papers/ijhpca-coll.pdf (see the top diagram on page 12). The algorithm consists of 2 log P stages: the first log P perform a reduce-scatter and the second log P an allgather. Message size varies across steps: the early stages of the reduce-scatter and the late stages of the allgather send the largest messages. The communication is structured such that the largest messages are sent between nearby ranks, which could be useful if processing elements are ranked in a locality-aware fashion.

So far this supports only a power-of-two number of processing elements.

I have attempted to minimize the amount of synchronization/hand-shaking. Messages are received at different offsets of the output buffer for each communication step. Send offsets in the reduce-scatter steps become receive offsets in the allgather, and vice versa. The reuse of buffers across the reduce-scatter and allgather steps requires synchronization. Right now the algorithm is inefficient in terms of memory use, requiring 3x memory. This can be reduced, but would require additional synchronization.
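
A pure-Python simulation of the two phases described above, assuming a power-of-two number of ranks and a vector length divisible by that count (here "ranks" are just list slots; the real algorithm exchanges these chunks over the transport):

```
import numpy as np

def allreduce_halving_doubling(data):
    n, length = len(data), len(data[0])   # n must be a power of two
    bufs = [np.array(d, dtype=float) for d in data]
    lo, sz = [0] * n, [length] * n        # chunk each rank is responsible for
    dist = 1
    while dist < n:                       # reduce-scatter: distance doubling
        for r in range(n):                # each rank keeps half of its chunk
            half = sz[r] // 2
            if r & dist:                  # keep the high half
                lo[r] += half
            sz[r] = half                  # sizes stay equal across ranks
        snap = [b.copy() for b in bufs]   # values before this exchange
        for r in range(n):
            peer, a, b = r ^ dist, lo[r], lo[r] + sz[r]
            bufs[r][a:b] = snap[r][a:b] + snap[peer][a:b]
        dist *= 2
    dist = n // 2
    while dist >= 1:                      # allgather: distance halving
        snap = [b.copy() for b in bufs]
        for r in range(n):
            peer = r ^ dist
            a, b = lo[peer], lo[peer] + sz[peer]
            bufs[r][a:b] = snap[peer][a:b]   # pull the peer's reduced chunk
        for r in range(n):                   # both now own the merged region
            lo[r] = min(lo[r], lo[r ^ dist])
            sz[r] *= 2
        dist //= 2
    return bufs

ranks = [np.arange(8) + 100 * r for r in range(4)]
result = allreduce_halving_doubling(ranks)
assert all(np.array_equal(out, sum(ranks)) for out in result)
```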

Reviewed By: pietern

Differential Revision: D4795878

fbshipit-source-id: fcc6597ef6a99cd102fce2b8e4562d93088d39dc
2017-04-03 14:05:44 -07:00
e13e9c1302 cuDNN version of TransposeOp
Summary:
Uses the cudnnTransformTensor function. It works by shuffling the strides according to the transpose axes. Significant speedup over the current GPU version.
+ moves the transpose test under utility_ops, because hypothesis_test is too big

Reviewed By: jamesr66a

Differential Revision: D4810993

fbshipit-source-id: 82577c4ced1389e70bd5992820ae4d8297a3817f
2017-04-03 13:33:10 -07:00
bf28d80460 fp16 support for NCCL ops
Summary:
fp16 dispatch for NCCL
Closes https://github.com/caffe2/caffe2/pull/245

Differential Revision: D4820168

Pulled By: pietern

fbshipit-source-id: 03250a3dfc4439281ef50bb45e7af9c76f6069f4
2017-04-03 13:03:04 -07:00
fe9a243b83 Add default value for GetRepeatedField
Summary:
This is just by analogy with GetSingleArgument, which already
has default_value support.

Reviewed By: Yangqing

Differential Revision: D4819789

fbshipit-source-id: cf271d9f345f14f3e373186365726c738c1c26f3
2017-04-03 12:04:22 -07:00
148b11847b Remove useless base class in allreduce.h
Summary:
Didn't provide enough value now that ReductionFunction and
CudaReductionFunction are no longer related.

Reviewed By: andrewwdye

Differential Revision: D4819295

fbshipit-source-id: e6479769af7f78d486bee7d9c31f049430cdc775
2017-04-03 11:09:50 -07:00
b3a2f30715 Extra workspace template parameter for CUDA algorithm
Summary:
To bring the GPUDirect and non-GPUDirect implementations of CUDA aware
algorithms closer together this change introduces CUDA workspaces.
There's an implementation for a host side workspace and a device side
workspace. The former is used for transports that don't support
GPUDirect and the latter for ones that do. CUDA algorithms will take
an extra template parameter for this workspace and this will determine
whether they can be used for GPUDirect or not.

The workspaces only define their respective pointer types right now
but may contain local operation construction functions at a later
point in time.

Reviewed By: andrewwdye

Differential Revision: D4802826

fbshipit-source-id: cb1d71a224ce0165afd07fb9092ad54d3e07c8cf
2017-04-03 11:09:50 -07:00
a95751e918 Fix test_random_seed_behavior for multi-GPU
Summary:
```
E0327 17:33:12.775998 15629 context_gpu.h:126] Encountered CUDA error: an illegal memory access was encountered
F0327 17:33:12.776208 15629 operator.h:176] Computation on device returned error in operator
output: "Y" name: "" type: "XavierFill" arg { name: "shape" ints: 2 } device_option { device_type: 1 cuda_gpu_id: 0 }
```
Closes https://github.com/caffe2/caffe2/pull/225

Differential Revision: D4819785

Pulled By: Yangqing

fbshipit-source-id: 896ca4d6534643bc261667377cc74d4fd7b3aca3
2017-04-03 10:50:46 -07:00
f29d3839a8 ubuntu installation instructions for v0.6.0 (#244) 2017-04-03 10:11:21 -07:00
e56f21e46e bump version to 0.6.0 for prerelease (#243) 2017-04-03 10:11:01 -07:00
6490d58a75 Add cuda path to nccl build
Summary:
This was found necessary on some CentOS. aaronmarkham
Closes https://github.com/caffe2/caffe2/pull/240

Differential Revision: D4819591

Pulled By: Yangqing

fbshipit-source-id: 40161cd484a2c8d43f26077919ad2762440dde13
2017-04-03 10:05:26 -07:00
91c4ba7980 Add torch.arange and deprecate torch.range 2017-04-03 10:38:58 -04:00
03f1cab801 Unify argument names in norm and renorm 2017-04-03 10:38:58 -04:00
fa2c566353 Add Variable.type_as 2017-04-03 10:38:58 -04:00
2d1122739c Raise AttributeError in Module.__getattr__ 2017-04-03 10:38:58 -04:00
7861f585fe Reshape grad in dot 2017-04-03 10:38:58 -04:00
9fc56793dd fix trunk for push and small cleanup
Summary:
multiple places broken, blocking the push :(

- fix the weighted training for ads and feeds
- fix the publishing if no exporter model is selected
- fix the feeds retrieval evaluation
- added the default config for retrieval workflows. plan to use for flow test (in next diff)
- clean up not used code
- smaller hash size for faster canary test

Reviewed By: chocjy

Differential Revision: D4817829

fbshipit-source-id: e3d407314268b6487c22b1ee91f158532dda8807
2017-04-02 23:35:49 -07:00
3abf2ef225 Merge pull request #991 from BTNC/win
add /arch:AVX /arch:AVX2 explicitly for msvc so it compiles on windows
2017-04-02 13:32:57 -04:00
70c4b82eba add /arch:AVX /arch:AVX2 explicitly for msvc 2017-04-02 20:47:29 +08:00
b401cb48fe Make optimization methods configurable and allow flexible optimization settings
Summary:
This diff does the following:

1. Add optimization options to model options in the UI for all workflows.
2. Allow different parameters to use different optimizers (or same optimizer with different settings, eg, learning rate).
3. Remove the default values for the `sparseDedupAggregator` field in the thrift file as the default value for that should just be `None` instead of 'sum'.
4. `fb/dper/layer_models/mlp_sparse.py` is deprecated.
5. Add calibration to two tower workflows.

Reviewed By: kittipatv

Differential Revision: D4767004

fbshipit-source-id: de92ea63fb0ff33f8581b1693479b723a68cd2d1
2017-04-01 23:02:21 -07:00
274b5c9003 Allow unhashable inputs to parallel_apply 2017-04-01 20:11:20 +02:00
b0a0c437dd Some fixes for load/saving and beam search
Summary:
- Fixed loading params into ensemble model
- Small fix for beam decoder

Differential Revision: D4807595

fbshipit-source-id: 0187fda7eb469401f1acd8e6108de54ab67ae922
2017-04-01 02:17:21 -07:00
ce31caf865 batch matmul: guard against old cuda versions.
Summary:
cublasSgemmStridedBatched is only supported by CUDA 8+. Luckily, we can
always fall back.

https://devblogs.nvidia.com/parallelforall/cublas-strided-batched-matrix-multiply/

aaronmarkham found this in the CentOS build on the OSS side.

Differential Revision: D4808822

fbshipit-source-id: 1657c139b57158e633074e06787c48302e0df142
2017-03-31 17:32:22 -07:00
2d7731a5d1 Fix typo "mistmatch"
Summary: Closes https://github.com/caffe2/caffe2/pull/239

Differential Revision: D4814359

Pulled By: Yangqing

fbshipit-source-id: 59e959fb97a1d4960626c11242dc9b828b5db25f
2017-03-31 17:06:21 -07:00
254ee9b099 Fix protobuf build to properly include directories
Summary:
aaronmarkham - I think this should fix the oss build.
slayton58 FYI
kmatzen FYI
Closes https://github.com/caffe2/caffe2/pull/238

Differential Revision: D4812550

Pulled By: Yangqing

fbshipit-source-id: 5703e403ef22c02e87f885bad8379fd5a8e06cdb
2017-03-31 13:20:10 -07:00
a2593ea0c2 Add GatherOp for GPU, and update its tests.
Summary:
This is an initial (read: unoptimized) implementation of GatherOp on GPU.
Closes https://github.com/caffe2/caffe2/pull/209

Differential Revision: D4809676

Pulled By: Yangqing

fbshipit-source-id: bc36fa02e9964370ca845e9cc13344e5f3dbf176
2017-03-31 13:20:09 -07:00
dfa2d26830 * make random_ range correct when both lower and upper are specified 2017-03-31 15:37:24 -04:00
559ae078b8 Fix Option constructor in invalid argument error printing code (#1160) 2017-03-31 15:35:35 -04:00
d8c65cc52a A more deterministic way to find old C1 model
Summary: Minor fix for the C1 model translator.

Reviewed By: Yangqing

Differential Revision: D4807165

fbshipit-source-id: 0149e2655d2901b23a37e92f61d9dd678cf6ee69
2017-03-31 11:51:56 -07:00
030ff4928a Merge commit 'a216e377b3844ac9c7882bd391a00f4e0ae718e7' 2017-03-31 11:45:37 -07:00
0829bffdec Merge commit '403cad46dc91a2bc2f6889754055decd6f3d53c7' 2017-03-31 11:45:24 -07:00
ffc7911bec Merge commit 'd8ae7893e056ebf4e7a5e96bab2c3b69f196ddfd' 2017-03-31 11:45:06 -07:00
ff1fde6151 Merge commit 'a3bfb9f376a57fb63e89ddf70f57353f19ed9d69' 2017-03-31 11:44:48 -07:00
a216e377b3 Merge pull request #456 from twitter-forks/addmm-fixes
Using temporary variables when performing transpose + addmm
2017-03-31 14:44:07 -04:00
cd2929c707 ConvTransposeMobileOp respects the shared_buffer arg.
Summary:
This brings ConvTransposeMobileOp in line with other implementations,
allows us to account for these buffers in the workspace, and is generally a good
thing to do.

Differential Revision: D4767431

fbshipit-source-id: b14a96a089136e305ab42680772272f4e5f16f53
2017-03-31 10:32:49 -07:00
8f9cd757db Skips the initialization phase of the individual checkpoint objects.
Summary:
The initialization phase of each checkpoint object simply loads the names of
the blobs in the checkpoints. When we load from the checkpoints, the names of
the blobs are given. We can skip this init step.

Reviewed By: azzolini

Differential Revision: D4808114

fbshipit-source-id: 4c740049c1014f3e93b4b87f43e3937afdefa25a
2017-03-31 10:10:56 -07:00
b13b7010b9 check for nvidia driver's sufficiency before checking for number of CUDA devices (#1156) 2017-03-31 12:19:59 -04:00
a3bfb9f376 THVector_(add),(mul) -> (adds),(mul) for VSX.
This was previously completed for other architectures.
2017-03-31 08:50:23 -07:00
5c79046d39 Use persistent tensor to store exp_inf (part of optimizer's state) (#1152) 2017-03-31 10:30:31 -04:00
5bee34eb84 Add git submodule init command
Summary:
aaronmarkham
Closes https://github.com/caffe2/caffe2/pull/237

Differential Revision: D4808930

Pulled By: Yangqing

fbshipit-source-id: c598fac789e97280d12961b0be257607ebf82244
2017-03-31 00:32:15 -07:00
0771ce312a optimize weighted softmaxwithloss gradient
Summary:
The weighted LabelCrossEntropyGradientKernel had a clowny loop over D. Since the operation is completely linear, we can just do it all in one parallel loop. Massive speedup: in my benchmark, from 4s to 20ms.

+ added weights to the lstm_benchmark

Reviewed By: jamesr66a

Differential Revision: D4800889

fbshipit-source-id: f9850bcc56ce34d5d7a613419cd172256633a894
2017-03-30 23:02:19 -07:00
30fd222b80 implement autograd function cross (#1138) 2017-03-31 01:45:51 -04:00
834142bb64 Change the predictor to use Protobuf
Reviewed By: salexspb

Differential Revision: D4644798

fbshipit-source-id: 0cf96dfc9061f87978a57d2fedcfe4a0bb012405
2017-03-30 22:34:58 -07:00
cd4160c894 distributed training for dper2
Summary:
Add distributed training to dper2 and keep the dper1 working.

* Created a ModelDelegator to wrap ModelHelper and LayerModelHelper to mitigate the difference.
* To get the average length for sparse features, I extracted some information in feature_processor. There should be a better way to do it once we have the new compute_meta.
* Metrics right now only run on the first trainer.
* The model is saved correctly for evaluation, but I'm still not sure how to handle the weights for adagrad.

Reviewed By: kennyhorror

Differential Revision: D4767745

fbshipit-source-id: 0559d264827a7fd9327071e8367d1e84a936bea9
2017-03-30 19:04:50 -07:00
8421bf7c60 Faster softmaxWithLoss rowMaxKernel
Summary:
We did not parallelize over D, which can be very large, especially in RNN models. This speeds things up significantly: in my quick test with lstm_benchmark and nvprof, the time of RowMaxKernel dropped from 1.2s total to 0.28s total.

+ added SoftmaxWithLoss to the lstm_benchmark

Reviewed By: jamesr66a

Differential Revision: D4800629

fbshipit-source-id: 3400ea1064b1eb2793bc403df2c1b68801d545e5
2017-03-30 15:49:46 -07:00
1a852095f7 Merge pull request #235 from Yangqing/protobuf
move protobuf back to 3.1.0 due to android/ios cmake error in 3.2.0
2017-03-30 15:29:23 -07:00
3b7b23df66 Move CUDA collectives to cuda_collectives.h
Summary:
The CUDA algorithms all had their own version of local reduction and
broadcast. This commit consolidates them and allows all CUDA
algorithms to work with CudaDevicePointer instances.

Reviewed By: andrewwdye

Differential Revision: D4797968

fbshipit-source-id: cccef39fce01905a2cd757ccbcffd29803411409
2017-03-30 15:06:03 -07:00
ffd1883229 Make extension loader properly handle visibility.
Summary:
(Also, exposed the macros that we use during build time via the macros.h header file)
Closes https://github.com/caffe2/caffe2/pull/233

Differential Revision: D4803311

Pulled By: Yangqing

fbshipit-source-id: 9f8ce57692f81f7a8994344846d3c90aa2c7070a
2017-03-30 14:38:38 -07:00
d933287114 Add a barrier after verification iteration in benchmarks to prevent a race with regular iterations
Summary: Verification was sometimes failing for allreduce halving-doubling. Pieter noticed that it is due to verification step racing with the regular iterations.

Reviewed By: pietern

Differential Revision: D4804558

fbshipit-source-id: f645cb2e332e449a993a634c5bdb42c2dcb8613b
2017-03-30 14:14:32 -07:00
b2c6ac8691 temporarily disable binary build that depends on both leveldb and opencv
Summary: Closes https://github.com/caffe2/caffe2/pull/234

Differential Revision: D4803393

Pulled By: Yangqing

fbshipit-source-id: 56a38346759d4c6547f03c3c24663d114f7db01e
2017-03-30 10:16:55 -07:00
4189882cfe move protobuf back to 3.1.0 due to android/ios cmake error in 3.2.0 2017-03-30 10:09:04 -07:00
ed44e87f98 use striped batch add for the recurrent network gradient
Summary: Instead of calling batch-size-many math::Adds, added a new function that does a batch of additions. For CPU there is no difference, but for CUDA we do everything in one kernel. I don't think this has a huge performance impact, but it at least makes the CUDA profiling look better, with fewer kernel launches.

Reviewed By: jamesr66a

Differential Revision: D4798411

fbshipit-source-id: 44ac65b2da5a615971219809b9298b4e122085cd
2017-03-30 08:57:16 -07:00
761eef1f19 Minor typo fix in backward function in torch/autograd/variable.py (#1143) 2017-03-30 11:23:28 -04:00
5dffba3f92 Sparse momentum update for seq2seq embeddings
Summary: Added SparseMomentumSGDUpdate to the NMT training pipeline. Also surfaced and fixed an out-of-bounds error in the operator stemming from the implicit assumption that the gradient slice input would be 2D. Now it is compatible with any number of dimensions, with indices indexing into the first dimension of param. Added internal checks to ensure that indices are valid.
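
A simplified numpy sketch of the shape/bounds contract described above, using a classical-momentum update; the operator's exact update rule and signature may differ:

```
import numpy as np

def sparse_momentum_sgd(param, moment, grad_slices, indices, lr, mu):
    # The checks mirror the contract above: indices index into param's first
    # dimension, and gradient slices match the remaining dimensions.
    assert indices.max() < param.shape[0]
    assert grad_slices.shape[1:] == param.shape[1:]
    # Assumes unique indices (numpy fancy assignment drops duplicates).
    moment[indices] = mu * moment[indices] + grad_slices
    param[indices] -= lr * moment[indices]
```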

Differential Revision: D4799697

fbshipit-source-id: 91ea23a6e743cc5337b46fae2821e773067d911e
2017-03-30 08:03:52 -07:00
d8ae7893e0 Get rid of warp-synchronous code (#739)
Time to get rid of warp-synchronous code. It will break!
2017-03-30 01:20:43 -04:00
90b872c670 Add GPUDirect capable version of CudaAllreduceRing
Summary:
This is a copy of CudaAllreduceRing that doesn't stage the locally
reduced buffer in host memory but uses the GPU side buffers directly.

Eventually I would like this to be absorbed back into
CudaAllreduceRing, but for now it's a good place to compare the two
implementations and abstract the parts that make sense, until they are
identical again.

Reviewed By: andrewwdye

Differential Revision: D4791629

fbshipit-source-id: 5ad065cb94adb968aeee2379327be313638f2161
2017-03-29 18:50:11 -07:00
a95ce9e98f Using temporary variables when performing transpose + addmm 2017-03-29 16:56:39 -07:00
0e6413f8ea Fix flaky test
Summary:
Somehow the stress-runs flag does not work as I expected.
Now the test finally passes.

Reviewed By: azzolini

Differential Revision: D4797559

fbshipit-source-id: 1e46844e9ae55c331c2e265a59dc550983274213
2017-03-29 16:48:20 -07:00
403cad46dc Using temporary variables when performing transpose + addmm 2017-03-29 16:14:13 -07:00
e1d64ea4d5 support multilabel in generic preprocessor
Summary:
Adding support for multilabel in the multiclass workflow. `input_feature_schema` and `trainer_extra_schema` are now functions that take in the preprocessor option and output the schema. This allows dynamic schema definition based on the option.

Changing default value will be in the next diff.

Reviewed By: xianjiec

Differential Revision: D4750064

fbshipit-source-id: 896143f432e963bc1723c0153749efeb39a83bec
2017-03-29 15:20:54 -07:00
51ae65d76f RNN: reuse memory for gradients of internal blobs of the cell net
Summary:
The main idea is that on the backward pass we don't need to store all the backward outputs in memory. This diff addresses only the ones used internally in each private workspace, by creating storage that is shared among them all within the backward pass.

Another thing we can do - get rid of state_grad blobs, but this would be a different effort.

See comments for more detailed description.

Reviewed By: urikz

Differential Revision: D4784900

fbshipit-source-id: 2dd8fe1b1215217ce92c09d918582d76c3051630
2017-03-29 15:20:50 -07:00
619a3ad2f4 Adding use_grad_hack option to Sub gradient
Summary: Similar to how Add gradient handles broadcasting.

Reviewed By: kennyhorror

Differential Revision: D4785565

fbshipit-source-id: ff302c9f1eb16c282c5172a7b9753fdbe68eaf1f
2017-03-29 15:02:36 -07:00
3eb3507367 uniform_sampling layer
Summary: This layer will be used to sample negative labels for sampled softmax.

Differential Revision: D4773444

fbshipit-source-id: 605a979c09d07531293dd9472da9d2fa7439c619
2017-03-29 14:36:12 -07:00
d76a814c93 Fixes for ops without a CUDA backend
Summary:
All of these tests fail with some variant of `Cannot create operator of type 'X' on the device 'CUDA'` (see commit messages).
Closes https://github.com/caffe2/caffe2/pull/227

Differential Revision: D4797060

Pulled By: Yangqing

fbshipit-source-id: 5feaa8e949098bfc1254d4c7449a2744e552f925
2017-03-29 14:36:09 -07:00
b8ccf42c74 Constify algorithm constructors
Summary: TSIA

Reviewed By: gchanan

Differential Revision: D4795492

fbshipit-source-id: aaad7afd373e40fa4669129cf2c98594c4091153
2017-03-29 14:21:03 -07:00
7772f1f182 Make Blob moveable
Summary:
Blob fits the semantics of a noexcept moveable object well, since its semantics are equivalent to a unique_ptr.
This allows, for example, having a std::vector<Blob>.

Reviewed By: pietern

Differential Revision: D4760079

fbshipit-source-id: d652d89be91a90c70651936ff694e0449a2c406b
2017-03-29 14:07:30 -07:00
8aa1cefed8 Fix deadlock in autograd (#1140) 2017-03-29 16:19:40 -04:00
310aacf23c mini-improvements
Summary: 1) allow step networks other than the simple network type on recurrent net steps;

Reviewed By: urikz

Differential Revision: D4789889

fbshipit-source-id: f30f0e7268a353134db0fe21fc5c456f21fce969
2017-03-29 13:08:47 -07:00
d604961b26 check for ExtractPredictorNet for is_test arguments
Summary: To prevent others from making the same mistake as I did, check that no op has an is_test=0 argument when ExtractPredictorNet is called.
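
A hedged Python sketch of the check (the op representation here is a stand-in dict, not the real NetDef protobuf):

```
def check_no_training_mode_ops(ops):
    # Reject any op that still carries a training-mode is_test=0 argument.
    for op in ops:
        if op.get("args", {}).get("is_test") == 0:
            raise ValueError("Op '%s' has is_test=0; extracted predictor "
                             "nets must be inference-only." % op["type"])

ops = [{"type": "Conv", "args": {}},
       {"type": "Dropout", "args": {"is_test": 0}}]
check_no_training_mode_ops(ops)  # raises on the Dropout op
```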

Reviewed By: viswanathgs

Differential Revision: D4796425

fbshipit-source-id: 38c14df6bcc767ec2e6a6e35ee79596a5dab531c
2017-03-29 12:48:54 -07:00
afe3df32f5 test existence of confu and ninja before installing nnpack.
Summary:
TSIA
Closes https://github.com/caffe2/caffe2/pull/229

Differential Revision: D4795563

Pulled By: aaronmarkham

fbshipit-source-id: 4df871760a1124bb7a2ef226d01b4ced12d21ab1
2017-03-29 10:18:07 -07:00
04210ad531 bump protobuf to 3.2.0 2017-03-29 10:01:22 -07:00
4b147e2079 Settable timeout for tcp read/write
Summary: Add a setTimeout() API to the Pair interface. Implement in the tcp transport for connect, read, and write, and across blocking, polling, and async configurations. Ibverbs implementation to come later.

Reviewed By: pietern

Differential Revision: D4787932

fbshipit-source-id: 6072dc0c0add1700f84a72b83e4388b29b044ec1
2017-03-29 09:07:04 -07:00
92f2220589 Add whitelist capability for smaller mobile binaries
Summary:
This helps adjust mobile build sizes when necessary.
Closes https://github.com/caffe2/caffe2/pull/228

Differential Revision: D4795135

Pulled By: Yangqing

fbshipit-source-id: 70a0dc35b31d5c8038081aedeb464e47e4284217
2017-03-29 08:47:08 -07:00
8efb762fcd gpu sequence op step 1: clean headers
Summary:
@public

This has no functionality changes yet, only cleaning up the sequence_op file
so that the header is context-independent; I will implement the GPU parts
separately.

Reviewed By: pietern

Differential Revision: D4777140

fbshipit-source-id: 9b4aea6c36f06a64a53e235a125cd3477d54a045
2017-03-29 08:47:00 -07:00
0d908d813b Implements Cumsum function for autograd (#1122) 2017-03-29 17:45:57 +02:00
1c391f6f93 bump version 2017-03-29 10:08:34 -04:00
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
be146fd721 Add btriunpack and update the btrifact test. 2017-03-29 13:42:13 +02:00
7cc92b1260 Add eval net for layer_model_helper
Summary:
This diff is adding eval nets to layer model helper. It should be useful for
the cases when train/eval nets need some extra input (usually some supervision)
for train/eval. For example various sampled layers, etc.

Differential Revision: D4769453

fbshipit-source-id: 7a8ec7024051eab73b8869ec21e20b5f10fd9acb
2017-03-29 04:03:40 -07:00
2979f4b989 add more functions to docs 2017-03-29 01:29:17 -04:00
22b3600f19 add samplers to documentation 2017-03-29 00:33:07 -04:00
95657ea1e8 Protobuf is binary string. Use bytes instead.
Summary: Prepare for the Protobuf change.

Reviewed By: dzhulgakov

Differential Revision: D4784884

fbshipit-source-id: 86219eecefaf7637e70339437c9274c526ebd6fe
2017-03-28 19:03:23 -07:00
215813d7ac Change dockerfile to support for cudnn v6 (#1135) 2017-03-28 20:05:04 -04:00
fd2835887b only resize stepWorkspaces when sequence length increases
Summary:
We should resize the workspace vector only when the sequence length increases. Otherwise we end up destroying and recreating workspaces constantly when the sequence length varies.

Modified the lstm_benchmark test to randomize sequence length.

This provides a big perf improvement to the machine translation pipeline. Look at the recurrent network op runtimes below.

WITH:
I0328 12:17:54.073976 492094 prof_dag_net.cc:156]    136.271 ms/iter (   120.987 ms/iter) RecurrentNetwork
I0328 12:17:54.073982 492094 prof_dag_net.cc:156]    190.074 ms/iter (   156.828 ms/iter) RecurrentNetworkGradient

WITHOUT:
I0328 12:25:17.658206 518884 prof_dag_net.cc:156]    375.369 ms/iter (   249.268 ms/iter) RecurrentNetwork
I0328 12:25:17.658211 518884 prof_dag_net.cc:156]    278.892 ms/iter (    227.29 ms/iter) RecurrentNetworkGradient

With the LSTM benchmark, we get about a 2x speedup.
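
A toy Python sketch of the grow-only policy (illustrative; the real code manages C++ Workspace objects):

```
class StepWorkspacePool(object):
    # Workspaces are created only when the sequence length exceeds
    # anything seen so far; shorter sequences reuse existing ones.
    def __init__(self):
        self.workspaces = []

    def ensure(self, seq_len):
        while len(self.workspaces) < seq_len:
            self.workspaces.append(object())  # stand-in for a Workspace
        return self.workspaces[:seq_len]      # reuse, never shrink

pool = StepWorkspacePool()
pool.ensure(10)  # allocates 10
pool.ensure(4)   # reuses, destroys nothing
pool.ensure(12)  # allocates only the 2 extra
```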

Reviewed By: jamesr66a

Differential Revision: D4789354

fbshipit-source-id: ad72f61974e35b0474abcacdc466ae9c6b4eb0ff
2017-03-28 14:08:00 -07:00
a03d956b56 Fixes the flaky test. Although we create nets in three different nodes,
Reviewed By: azzolini

Differential Revision: D4788418

fbshipit-source-id: bdf90c5674b5dbb8b3bda21cf85ea33fedb36fa6
2017-03-28 13:48:07 -07:00
f2b8150a1a Fix PadImage same padding argument.
Summary: PadImage has no kernel parameters, which resulted in the pads_ parameters not being set (left at 0). I added a test case too.

Differential Revision: D4785230

fbshipit-source-id: fd475e7c41208e07fa7a363def9a45c6f82cddfe
2017-03-28 13:21:36 -07:00
939daa3d99 gradient checker for nets
Summary: this is useful to test rnn cells

Reviewed By: dzhulgakov

Differential Revision: D4720641

fbshipit-source-id: baa7df43357ed8af72ede64be3e0a642a40472df
2017-03-28 13:03:14 -07:00
1ed746df45 BatchMatMulOp: use cuBLAS batched strided gemm for CUDA
Summary:
Instead of doing gemms in a for-loop (which is not parallelized), it is much better to do the batched matmuls using CUDA 8's new batched strided version of gemm.

With the MT team's test, we get a 5-10% improvement in overall walltime, so it is a significant improvement:

----

Without batched gemm:

I0328 10:46:48.118605 58068 prof_dag_net.cc:136]    424.757 ms/iter (   283.878 ms/iter) RecurrentNetwork
I0328 10:46:48.118609 58068 prof_dag_net.cc:136]    352.603 ms/iter (    265.85 ms/iter) RecurrentNetworkGradient

With batched gemm:
I0328 10:53:48.169996 85617 prof_dag_net.cc:136]    407.438 ms/iter (   269.564 ms/iter) RecurrentNetwork
I0328 10:53:48.169999 85617 prof_dag_net.cc:136]    322.393 ms/iter (   287.625 ms/iter) RecurrentNetworkGradient
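
A numpy sketch of the difference (illustrative; the real change calls cuBLAS's strided batched gemm from C++):

```
import numpy as np

A = np.random.rand(32, 64, 128)  # batch of 32 left-hand matrices
B = np.random.rand(32, 128, 96)  # batch of 32 right-hand matrices

# For-loop path: one gemm per batch element, not parallelized across
# the batch (the pre-change behavior).
loop_out = np.stack([a.dot(b) for a, b in zip(A, B)])

# Batched path: one call covering the whole batch, analogous to a
# single strided-batched gemm launch.
batched_out = np.matmul(A, B)

assert np.allclose(loop_out, batched_out)
```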

Reviewed By: jamesr66a

Differential Revision: D4788272

fbshipit-source-id: 210e8b94c1e036b6ef0f039ce000d455258651f4
2017-03-28 11:54:09 -07:00
80e88a88ed Fix ibverbs completion queue capacity
Summary:
The header already contained an analysis of required completion queue
depth but the queue pair was still initialized with a maximum queue
depth of kMaxBuffers. This change fixes that and updates the analysis
to talk separately about receive and send completion queues.

Reviewed By: andrewwdye

Differential Revision: D4785786

fbshipit-source-id: 4dc302d523a3b7162dc261d14cfcc755681febf8
2017-03-28 10:06:50 -07:00
76997e80f2 RNN: remove copy for gradients of recurrent inputs
Summary: See comments in the code

Reviewed By: urikz

Differential Revision: D4771195

fbshipit-source-id: 952b6fe86d49ed1cdb87aba1fda7ac92c67dbeb5
2017-03-28 10:02:25 -07:00
242bff8480 RNN: avoid copy for gradients of inputs to the rnn cell and save more memory!
Summary:
This is pretty tricky to explain, but we can just use
backward_links. This way the whole cell would use a blob from the
states_grad tensor instead of having its own blob. This also should
save on memory a bit

Differential Revision: D4770798

fbshipit-source-id: 673f85b2c2fdf42c47feeaa24d1e2bf086f012f9
2017-03-28 10:02:25 -07:00
327d3cb2b5 Caffe2: add init method and metric logging to data loader
Summary: Caffe2: add init method and metric logging to data loader

Differential Revision: D4685665

fbshipit-source-id: c4e0a09ab6a90c26c329f731f261cba8af1d6bbd
2017-03-28 08:48:27 -07:00
78f0b35949 Caffe2: CUDA implementation for LeakyReluOp
Summary: Caffe2: CUDA implementation for LeakyReluOp

Reviewed By: asaadaldien

Differential Revision: D4782336

fbshipit-source-id: 402eace695307b62740c918660d9e521217e928a
2017-03-28 08:48:25 -07:00
b41449b680 SparseMomentumSGDUpdateOp
Summary: Creates SparseMomentumSGDUpdate, a sparse version of MomentumSGDUpdate, to make that optimization method (via in-place updating operator) compatible with GradientSlices.

Differential Revision: D4784973

fbshipit-source-id: e6330f471a4d5f53589a6ac245e38f256ca7f354
2017-03-28 07:47:46 -07:00
dc7695a47a Update links for tutorials in README (#1123) 2017-03-28 14:21:40 +02:00
032a65edff modify pip uninstall command in CONTRIBUTING.md 2017-03-28 14:20:49 +02:00
9c58341809 codemod: use <> includes for gtest headers
Summary: These are system headers and so should be included via `<>`.

Reviewed By: yfeldblum

Differential Revision: D4783480

fbshipit-source-id: 979670b594859b45560cead34f615442dfcc9f8b
2017-03-28 00:50:54 -07:00
da36212259 SamplingTrain layer
Summary:
`SamplingTrain` is a wrapper layer around another layer that subclasses `SamplingTrainableMixin`. When instantiated in the training context, `SamplingTrain` produces the sparse output of the wrapped layer. The output can be paired with `indices` to create a Map schema. When instantiated in the prediction context, the full output of the wrapped layer is produced.

This is like the SampledFC function in model helper, https://fburl.com/gi9g1awh, with the ability to be instantiated in both training and prediction contexts.

I'd like to get consensus on whether we should introduce the `SamplingTrain` layer and the accompanying mixin. This can probably be accomplished in some other way, but I think this is not too bad.

Reviewed By: xianjiec

Differential Revision: D4689887

fbshipit-source-id: 7be8a52d82f3a09a053378146262df1047ab26a8
2017-03-27 23:31:55 -07:00
8251653585 caffe2: relax PartitionOp constraints
Summary: We actually copy items inside, so no need to limit this to POD types.

Reviewed By: dzhulgakov

Differential Revision: D4768652

fbshipit-source-id: 98f71b78a7c1dd4a2a2e1bff096d6bf63a0c8f50
2017-03-27 21:02:50 -07:00
ebeb36f6ee Refactoring, t-sne, additional features
Summary:
t-SNE projection of instance activations
Minor refactorings

Reviewed By: Mortimerp9

Differential Revision: D4752784

fbshipit-source-id: f5cdb74616ab8e00f9ec362c0b94bcf7988e680e
2017-03-27 20:33:20 -07:00
55546359b6 Retry on EINTR for writev in tcp/pair.cc
Summary: TSIA

Differential Revision: D4783319

fbshipit-source-id: 610d1a65a54048e7c56610632ccfe271eac85b6c
2017-03-27 17:35:45 -07:00
0c47d345df Multi-gpu training for OSS seq2seq
Summary:
Use data_parallel_model for seq2seq multi-gpu training. The main reason for complexity here is that GatherOp hasn't yet been implemented on GPU.

This diff also adds a better clipping procedure: clip by global norm rather than by absolute value.
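
A small numpy sketch of clipping by global norm (illustrative; not the actual pipeline code):

```
import numpy as np

def clip_by_global_norm(grads, max_norm):
    # One shared scale factor keeps the joint L2 norm under max_norm and
    # preserves gradient direction, unlike per-element absolute clipping.
    global_norm = np.sqrt(sum(np.sum(g * g) for g in grads))
    scale = min(1.0, max_norm / (global_norm + 1e-6))
    return [g * scale for g in grads]

grads = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]  # global norm = 13
clipped = clip_by_global_norm(grads, 1.0)
print(np.sqrt(sum(np.sum(g * g) for g in clipped)))    # ~= 1.0
```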

Differential Revision: D4778691

fbshipit-source-id: bff184dae02ecc227413fef51f48a4726e5d3825
2017-03-27 17:32:39 -07:00
db9cae4d34 Allow passing a Deleter to ShareExternalData
Summary: This allows tensors to borrow external buffers and return them once the tensor data is reallocated or freed. This is similar to folly::IOBuf's takeOwnership and ZMQ's message constructor taking a deleter as an argument.
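
A Python sketch of the ownership pattern (class and method names are illustrative, not the actual Tensor API):

```
class BorrowedBuffer(object):
    # Holds externally owned memory plus a deleter; the deleter runs
    # when the data is replaced or released, returning ownership.
    def __init__(self):
        self._data, self._deleter = None, None

    def share_external(self, data, deleter):
        self.release()
        self._data, self._deleter = data, deleter

    def release(self):
        if self._deleter is not None:
            self._deleter(self._data)
        self._data, self._deleter = None, None

buf = BorrowedBuffer()
buf.share_external(bytearray(16), lambda d: print("returned", len(d)))
buf.release()  # prints: returned 16
```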

Reviewed By: dzhulgakov

Differential Revision: D4760188

fbshipit-source-id: 6989678ad66af2e58487173174d5327bd5ae0515
2017-03-27 14:49:43 -07:00
fe3d5a63f2 Support multiple predefined reduction functions
Summary:
Predefining the reduction functions makes it easy to provide a set of
fast implementations. Eigen is used to implement them if it is found.

Reviewed By: andrewwdye

Differential Revision: D4780868

fbshipit-source-id: e825cf2e5cfe8ec27d587c5aff4002534b1c670d
2017-03-27 14:35:02 -07:00
e4b4e515cd add mode to cwrap 2017-03-27 13:29:14 -07:00
4b1f5f4bd6 Merge commit 'afd576ec0e389db3e47efe44652c488b1706f168' 2017-03-27 13:26:50 -07:00
37718e207d Add remote offset argument to buffer send
Summary: This makes it possible to write to any offset in a remote buffer.

Reviewed By: andrewwdye

Differential Revision: D4779776

fbshipit-source-id: f5a44cc705df5141bd720ff4e3fec8697f707a70
2017-03-27 13:07:17 -07:00
afd576ec0e Add mode kernel 2017-03-27 15:58:47 -04:00
95aa2af377 btrisolve: Make a Tensor method and update argument order
Also update docs for btrifact and btrisolve to the newest interface.
2017-03-27 15:46:49 -04:00
6774d39c96 Merge commit '5d274cd4991022d63b014cc8917e00c15441d3f4' 2017-03-27 11:54:08 -07:00
567faedc59 Merge commit '8051dec608368fed3569c7513292785083adc53c' 2017-03-27 11:53:41 -07:00
3ddcff659d Move AddPlan, AddNet, AddBlobs to predictor_py_utils.py
Summary: Cleanup

Reviewed By: salexspb

Differential Revision: D4775061

fbshipit-source-id: b58405729227a6e3fd867d9d5ba959feaa99e5a6
2017-03-27 11:03:22 -07:00
ee28b6ce22 Caffe2: instrument Everstore loader
Summary: Caffe2: instrument Everstore loader and log to Scuba

Differential Revision: D4669060

fbshipit-source-id: 603256e4ba62a32d9aeadc409f83ef9b1f6a7358
2017-03-27 10:02:11 -07:00
7fa4acab9b Loads only the model blobs from the checkpoints.
Summary:
To evaluate from checkpoints, we need to load a model from the checkpoints.
However, the checkpoints store way more blobs than the blobs needed by the
model. This function enables the model builder to load only the blobs
associated with the model to the workspace. After that, the model builder
can evaluate the model from the populated workspace.

Reviewed By: azzolini

Differential Revision: D4751414

fbshipit-source-id: a7a420228d681fc2dcfd8573cf69a97b1abc2ef3
2017-03-27 10:02:11 -07:00
73b18e7ccf Enables checkpointing for dper2.
Reviewed By: azzolini

Differential Revision: D4716571

fbshipit-source-id: c4d71ed676d9465290c2e3fcb26efbbecc72cf72
2017-03-27 10:02:11 -07:00
7c2c7e8e31 Move NCCL code to subdirectory and backfill ops
Summary:
All operations supported by NCCL are now available through the Gloo
wrappers. Algorithm wrappers for them are forthcoming so that they
can be used interchangeably with other implementations.

Since not all of them require same-sized source and destination
pointers, I moved assertions on number of elements to the op
constructors.

Reviewed By: andrewwdye

Differential Revision: D4771292

fbshipit-source-id: 2f34629507b5e1cb9ae8d6d2f02de0a7f641a341
2017-03-27 09:50:40 -07:00
a7029cf34c Add installation configs for header files
Summary:
TSIA
Closes https://github.com/caffe2/caffe2/pull/223

Differential Revision: D4778061

Pulled By: Yangqing

fbshipit-source-id: 36a5b60c6b5d40cf8ca06c0bad490e48ef3f57c8
2017-03-27 08:47:25 -07:00
3eab8a71e2 Added docstring to add_module (#1116) 2017-03-27 11:09:24 -04:00
2fd4d088ff add Adaptive pooling methods to docs 2017-03-26 22:43:46 -04:00
661fa5915d minor bugfix for cmake
Summary: TSIA

Reviewed By: asaadaldien

Differential Revision: D4778069

fbshipit-source-id: 965bd7e00738ed508d5a9b0cae109b73ba1e9b62
2017-03-26 18:46:31 -07:00
2c7f45aa3f update nnpack to the most recent version 2017-03-26 17:29:46 -07:00
5d274cd499 Update btrisolve argument order. 2017-03-26 13:07:24 -04:00
8051dec608 Update btrisolve argument order. 2017-03-26 13:06:34 -04:00
f2c1071c33 Adaptive max and average pooling (1D & 2D) (#1084) 2017-03-26 17:09:28 +02:00
bb71117ecc Cwrap arg assign (#1102) 2017-03-26 13:53:28 +02:00
d25433a099 Fix docker build commands (#1103) 2017-03-25 16:18:33 -04:00
7dd45490f8 don't use inplace backward, remove unnecessary zero for grad_input (#1079) 2017-03-25 20:04:48 +01:00
6163676ebe Skip optimizer when param doesn't have gradient and optimizer is not set
Summary: Currently, we cannot have layer constants because layer params are required to have a gradient and an optimizer. Global constants don't cut it here because each can only be added once; therefore, a layer that adds any global constant can only be used once.

Differential Revision: D4773212

fbshipit-source-id: 5b60d31f3c1602afb04b61f6d30b8e3e06ed2de3
2017-03-24 22:18:34 -07:00
eea0ea7712 Struct nested field name lookup supports List
Summary:
D4690225 added support for nested field name lookup in nested
`schema.Struct`s.  It would throw a KeyError if trying to access a nested
`List`s field.  Writing the lookup recursively avoids the need to enumerate
all complex field types in the lookup.

Differential Revision: D4719755

fbshipit-source-id: 37c87a32d730f0f45f72fb20894da3e32f820999
2017-03-24 18:17:19 -07:00
bf632544e6 Pass NULL rinfo_ to btrifact by default (#1089) 2017-03-24 19:49:40 -04:00
282402d4f3 Revert "Add back zero fill for ger" (#1093)
This reverts commit 5a761dbe65d2221e9c200b3f8ea0590b5d9b923f.
2017-03-24 19:49:31 -04:00
1461709ea0 Improving the performance of IndexLinear:updateOutput
- Removes separate kernel for updateOutputTrain
2017-03-24 16:34:31 -07:00
cce03074f5 Merge commit '3acbbb30f2bdc6ccf4ffb6f7d568e7916d4e384d' 2017-03-24 16:19:44 -07:00
f2f63773d8 Merge commit '52911f9e47f679045a238eb9dfdc5db55bf98cc9' 2017-03-24 16:19:19 -07:00
84aa41824c Merge commit 'b4fe5ad641181f30bdcc4749c949206a3ebb04b4' 2017-03-24 16:19:05 -07:00
25c8a117af Merge commit 'e8196f990db4ba368010f0d950bebf1fb13c2888' 2017-03-24 16:18:52 -07:00
6aee34b666 Registering GPU version of PackSegments using GPUFallbackOp
Summary: Creating PackSegments and UnpackSegments GPU operators using GPUFallbackOp for now. The op does mainly copying of blobs and this is a reasonable solution until we have a CUDA op.

Reviewed By: pietern

Differential Revision: D4761589

fbshipit-source-id: dd483b9e34ecb6b53925405e5b4c24859c549606
2017-03-24 16:01:53 -07:00
ae122707b5 Don't do extra resize in linear bias 2017-03-24 23:41:15 +01:00
8ce34d6c87 Add Calibration
Summary: Add calibration to sparse_nn

Differential Revision: D4735564

fbshipit-source-id: 6baa637cbffcbbd50134a256d622ef8c962fca3b
2017-03-24 14:32:23 -07:00
b711c7d039 More perf stats for BlobsQueue
Summary: Allows drilling down on data throughput overall and per field.

Reviewed By: dzhulgakov

Differential Revision: D4622168

fbshipit-source-id: 1462bb2fac05824fda0c02f4f5f0b8713893e650
2017-03-24 14:03:28 -07:00
2f73a01b70 Average and time spent counters
Summary:
- Allows capturing averageable stats such as bytes and time per request
- Allows capturing time elapsed.

Reviewed By: pietern

Differential Revision: D4622101

fbshipit-source-id: f08e422ecdfda83b13a4ed8ab9c6d5c2a5d5718d
2017-03-24 13:34:27 -07:00
29c1102806 Extract net and blobs assignment to separate functions
Summary:
Use AddNet and AddBlobs to add net and blobs to meta_net_def.
This is a codemod and does not change the functionality.
It is in preparation for the protobuf change.
Depends on: D4770648

Reviewed By: salexspb

Differential Revision: D4771110

fbshipit-source-id: 00cecb2105f2c332bd50c3c51b9a10e1004fa90f
2017-03-24 13:17:24 -07:00
b4fe5ad641 Use zero instead of mul when beta == 0 in addr 2017-03-24 13:09:00 -07:00
5a761dbe65 Add back zero fill for ger
Ger does not have beta argument, so has to be zero-filled.
2017-03-24 21:03:02 +01:00
e591ddb70b Add nnpack specific dependencies under third_party 2017-03-24 12:37:56 -07:00
0ade0578b1 Reset workspace after each test in copy_ops_test
Summary:
This was a nasty one to track down. This was the error message:
```
E0323 14:47:46.138900  2870 context_gpu.h:126] Encountered CUDA error: an illegal memory access was encountered
F0323 14:47:46.139143  2870 operator.h:176] Computation on device returned error in operator
input: "x_gpu_2" output: "loss" name: "" type: "AveragedLoss" device_option { device_type: 1 cuda_gpu_id: 1 }
```
Closes https://github.com/caffe2/caffe2/pull/220

Differential Revision: D4771086

Pulled By: Yangqing

fbshipit-source-id: f2d0f39f1647c84d97d9745f8a0305a389bfbc41
2017-03-24 12:20:34 -07:00
ad8b92b9e8 Extract plans assignment to AddPlan function
Summary:
Codemod to use a separate function, for the protobuf change later on.
It does not change the functionality.

Reviewed By: salexspb

Differential Revision: D4770648

fbshipit-source-id: d8090f45d31ffa5ca1dca47297fb7c196f34d8a6
2017-03-24 12:02:49 -07:00
dd893391d5 Add argument to children to yield the name of the modules (#941) 2017-03-24 20:02:05 +01:00
649f04d077 Added Pascal nvcc flags, bumped version 2017-03-24 11:58:14 -07:00
463a28afcb Windows build for easier python usage
Summary:
Changed the windows python extension name to ".pyd" and did a manual copy from the {Debug,Release} folder to the main folder for easier automatic build.
Closes https://github.com/caffe2/caffe2/pull/222

Differential Revision: D4771065

Pulled By: Yangqing

fbshipit-source-id: 4a89d409fa66f0979cf4ecf502189b2f9cc11504
2017-03-24 11:33:27 -07:00
f45ef5fdb8 AllGather algorithm [CPU]
Summary: Allgather ring CPU implementation. It does |buffers| x |contextSize| passes.
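
A toy Python simulation of the ring schedule (illustrative; the real implementation moves bytes between processes):

```
def ring_allgather(chunks):
    # Over n-1 steps, each rank forwards the chunk it received on the
    # previous step to its right neighbor. With B buffers per rank the
    # schedule repeats B times: |buffers| x |contextSize| passes.
    n = len(chunks)
    result = [[None] * n for _ in range(n)]
    for r in range(n):
        result[r][r] = chunks[r]
    for step in range(n - 1):
        for r in range(n):
            src = (r - step) % n
            result[(r + 1) % n][src] = result[r][src]
    return result

print(ring_allgather(["a", "b", "c", "d"]))  # every rank has all chunks
```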

Reviewed By: pietern

Differential Revision: D4723809

fbshipit-source-id: ffd8366ac7e1746555474e173143d33cee497822
2017-03-24 11:06:57 -07:00
e8196f990d Make rinfo_ argument optional in btrifact 2017-03-24 09:01:36 -07:00
269b77a1b2 Make rinfo_ optional in btrifact 2017-03-24 09:00:39 -07:00
51a92c6659 Add gloo submodule 2017-03-24 15:34:50 +00:00
476d85dd3f DataLoader: Fix batch data type for numpy array (#1074) 2017-03-24 11:34:24 -04:00
d5880b128e CMake support for Gloo dependency
Summary:
This also requires a change to cmake/External/nccl.cmake to use the
static NCCL binary instead of the shared object. When the Caffe2/Gloo
build uses the bundled NCCL version it should be packaged up in the
resulting libraries and not cause another runtime dependency on a
library that has to be installed separately.
Closes https://github.com/caffe2/caffe2/pull/218

Differential Revision: D4769926

Pulled By: pietern

fbshipit-source-id: 5c85559992c200d874f4218724823815ffb5adb5
2017-03-24 08:32:24 -07:00
63f6c0d692 add Pairwise distance (#835) 2017-03-24 11:29:40 -04:00
b546fa3fcd add assertTrue to padding tests 2017-03-24 15:27:51 +01:00
1d656b6769 Ensure displayed progress in ProgressMonitor is between 0 and 100%.
Fixes #1086
2017-03-24 15:21:52 +01:00
97a6400f03 Don't do copy for param_grad in backward_step_net
Summary: We accumulate the values of this blob (param_grad) in another special internal blob anyway.

Differential Revision: D4768643

fbshipit-source-id: a9d08b7eafd25f278a8db722f9cdb1d0064b852a
2017-03-24 02:22:33 -07:00
99bfd36a04 CRF layer in caffe2
Summary:
This is an implementation of a CRF layer in caffe2, following this paper: https://arxiv.org/abs/1603.01360
Currently this implementation works only for batch_size = 1.

Reference implementations:

- Tensorflow:
 63a21e0540/tensorflow/contrib/crf/python/ops/crf.py

- Theano:
https://github.com/glample/tagger/blob/master/model.py#L286

Differential Revision: D4644004

fbshipit-source-id: bf0801fd8562d11dca3fefe371c3d85e1dd69ccc
2017-03-23 22:02:02 -07:00
3acbbb30f2 Fix inconsistent in-place and out-of-place for HardTanh
in-place and out-of-place updateGradOutput results are different where input=min_val or input=max_val
2017-03-23 17:27:29 -07:00
52911f9e47 Fix inconsistent in-place and out-of-place implementations
Currently in-place and out-of-place updateGradOutput will produce different results for input=max_val or input=min_val - in-place won't backprop gradient where input=max_val or input=min_val, out-of-place will backprop gradient in this case.
2017-03-23 17:22:55 -07:00
a65e0f488c Remove zero fill where not needed (#1077) 2017-03-23 19:44:00 -04:00
396ebb0546 exec_net --> predict_net
Summary: Change the naming convention back for maintainability.

Reviewed By: Yangqing

Differential Revision: D4741875

fbshipit-source-id: 044051e772383e81812ae7064a921e97d63615dc
2017-03-23 16:31:49 -07:00
8dc5d2a22e export current_blas_handle 2017-03-23 23:32:45 +01:00
2cb123df83 Fixed list init issue under MSVC compliation
Summary: Closes https://github.com/caffe2/caffe2/pull/216

Differential Revision: D4763418

Pulled By: Yangqing

fbshipit-source-id: 85148720388e407c9a0f9660ef4822048837de14
2017-03-23 15:17:49 -07:00
422c65ca35 Removing unnecessary Copy after fixing gradients for external parameters
Summary: Apart from copying gradient blobs for inputs with initial_cell_input, we needed to perform a similar operation for external parameters used by the step net

Reviewed By: salexspb

Differential Revision: D4752259

fbshipit-source-id: 13ee48cf583ed86221a4cc1cc9f57f5c3a7d2450
2017-03-23 15:04:22 -07:00
ed97f3f854 Adding support for flattened inputs for IndexLinear
- Adding relevant tests
2017-03-23 14:18:41 -07:00
a231fe8fc5 IndexLinear support for cunn 2017-03-23 14:18:01 -07:00
8168e8ac25 allows to specify output names for functional layers
Summary:
Currently the output schema and blobs are named "field_i", which is
bad for debugging. This diff allows us to specify output names.

Reviewed By: kennyhorror

Differential Revision: D4744949

fbshipit-source-id: 8ac4d3c75cacbb4c9b5f55793ac969fe1cf20467
2017-03-23 13:18:58 -07:00
0bd69b20d7 ReluGradientOp implementation with Eigen
Summary: Closes https://github.com/caffe2/caffe2/pull/217

Differential Revision: D4764348

Pulled By: Yangqing

fbshipit-source-id: b7053a085650160e221293d528f553cc402ff10b
2017-03-23 13:18:57 -07:00
bb353ccc17 Add batch triangular factorization and solves, add IntegerTensor to cwrap (#903) 2017-03-23 15:06:00 -04:00
ced0054a9e Fix formula for stddevs grad in Normal function (#1076) 2017-03-23 14:32:34 -04:00
68ee5ede29 make inplace tests compare input grads 2017-03-23 18:54:00 +01:00
2966e3295d Make static/shared configurable and install optional
Summary:
This makes it possible to embed Gloo in a project without CMake
installing Gloo headers and/or libraries, or having a runtime
dependency (and statically link to it).

Also:
* Install benchmark tools
* Statically link to NCCL if the bundled version is used
Closes https://github.com/facebookincubator/gloo/pull/19

Differential Revision: D4762432

Pulled By: pietern

fbshipit-source-id: cf38903e6c51f2480fba4ff18cbdc0c9080df0c4
2017-03-23 09:06:37 -07:00
4df98e2927 Merge commit '3865606299b1fbcd0a94cef4a66c1bc007246da8' 2017-03-23 08:39:43 -07:00
6ccac5ce28 Merge commit 'd3334db6274d7a3cd07f20d583056e453dc8134d' 2017-03-23 08:39:30 -07:00
3865606299 adding batch triangular factorization and solves, add IntegerTensor to cwrap 2017-03-23 11:37:00 -04:00
d3334db627 adding batch triangular factorization and solves, add IntegerTensor to cwrap 2017-03-23 11:35:35 -04:00
50f5a4dd18 fix BCE loss formula visualization (#1072) 2017-03-23 11:27:21 -04:00
b60936b9ae fix NLLLoss2d documentation 2017-03-23 10:06:40 -04:00
2d750b9da5 fix typo 2017-03-23 09:40:06 -04:00
ca376d4584 implement autograd function trace 2017-03-23 10:37:52 +01:00
ef183a1d23 Merge commit '5cd313ed23a3b11ddd739bcfedaee6e310e4e438' 2017-03-22 19:25:46 -07:00
f4d8944973 fix OSX fread bug (#1068) 2017-03-22 22:06:14 -04:00
42036871e9 Fix windows build
Summary: Closes https://github.com/caffe2/caffe2/pull/214

Differential Revision: D4755224

Pulled By: asaadaldien

fbshipit-source-id: 8a3c6d13319aecc0bf700bad2b3e9ed2a53571e9
2017-03-22 19:01:36 -07:00
d76e460b80 Allow to query the blob size in bytes for perf stats
Summary: This allows to gather stats on how much raw and compressed data is being transferred across queues and network.

Reviewed By: dzhulgakov

Differential Revision: D4622049

fbshipit-source-id: 27c0c0df9e5a705f91256b20a29c7f8f988085da
2017-03-22 18:09:55 -07:00
6b7aef63ac Added support for multidimensional tensors in PReLU; Channel number now in second dimension 2017-03-22 20:36:52 -04:00
b3ab4b1094 Check torch.backends.cudnn.enabled, padding, and output_padding (#996)
* Check torch.backends.cudnn.enabled
* Don't allow negative padding and output_padding values
2017-03-22 19:42:11 -04:00
1e8cb82a2d Break only after the update in L-BFGS 2017-03-22 18:58:42 -04:00
dd399a8d68 Return total param norm from clip_grad_norm 2017-03-22 18:58:42 -04:00
faac0f5c25 Fix torch.cat bugs
Always use PySequence API and disallow catting along inexistent
dimensions.
2017-03-22 18:58:42 -04:00
c36f47bd1e Make random_ exclusive and make generator kwarg only in all random
functions
2017-03-22 18:58:42 -04:00
3d1888cd95 Fix size mismatch in CosineEmbeddingLoss backward 2017-03-22 18:58:42 -04:00
3b7cb50d1c Add ConvNd to model helper
Summary:
Add a ConvNd interface for N-d convolution and keep Conv for 2D convolution.
I added _BaseConv to share code between ConvNd and Conv.

Reviewed By: Yangqing

Differential Revision: D4660822

fbshipit-source-id: 8339421351ce9a36ce5a165f7fa455cfcc61733d
2017-03-22 15:47:48 -07:00
0276c992b7 translator fix
Summary:
This completes the fix that viswanathgs started in an earlier diff, which did not
cover the full Caffe convention. It now has proper guards for all the behavior
that Caffe implies, either supporting it or throwing an explicit exception.

Reviewed By: viswanathgs

Differential Revision: D4751751

fbshipit-source-id: 474e921c33840cff333a631b7b19f881b39ebccd
2017-03-22 15:09:13 -07:00
e4907bd1ba Improving exception logging in Caffe2
Summary: Changed logging so stack trace always comes last.

Reviewed By: dzhulgakov

Differential Revision: D4749720

fbshipit-source-id: 5c8bb1b6087cb5db2e91606a5b0cb40c783bf909
2017-03-22 15:09:13 -07:00
97a82a3018 fix formatting in upsampling docs (#1067) 2017-03-22 18:06:31 -04:00
5cd313ed23 Fix TH_TENSOR_APPLYX_D in the case where the dimension of interest is the inner dimension 2017-03-22 13:15:01 -07:00
b414494035 Merge commit '714b2b8bf657afe41cc8503998b6d919339b8075' 2017-03-22 12:49:29 -07:00
c10efc646e Merge commit 'e17d84d38edf6094175deead555abbc96321b69f' 2017-03-22 12:49:11 -07:00
348531ad8d Merge commit '0056b0883426e38ffbd646c040b6c281d12673f2' 2017-03-22 12:48:57 -07:00
9d83121ef5 Don't add options to CUDA_NVCC_FLAGS if already set
Summary:
This may be the case when the Gloo CMake files are sourced from a
parent project that has already imported CMake CUDA support. If these
checks are not performed then CUDA_NVCC_FLAGS might contain
conflicting options.

Verified this works while working on Gloo for Caffe2.
Closes https://github.com/facebookincubator/gloo/pull/18

Differential Revision: D4756179

Pulled By: pietern

fbshipit-source-id: 32fc39ec2322cce5899a2398ebbf8395d3917502
2017-03-22 12:35:04 -07:00
9ab65d7be0 Add CUDA profiling ops
Summary:
These new ops allow you to initialize, start, and stop the CUDA
profiler. This makes it possible to profile CUDA code without running
the application through nvprof.

Reviewed By: jamesr66a

Differential Revision: D4747863

fbshipit-source-id: b439e8f28d1d62db19524fee0458523414cb79e3
2017-03-22 09:37:35 -07:00
6d7cb31e53 MPI: Duplicate MPI_Comm and allreduce maxLength as MPI_ UNSIGNED_LONG.
Summary:
Some small MPI-related changes:
1) Instead of making an object copy of the MPI_Comm, call MPI_Comm_dup;
because the (passed-in) communicator is used later via the call to
connectFullMesh, this guarantees that the communicator will not have been
freed by the user before connectFullMesh is called.

2) Allreduce for maxLength is done on an unsigned long type; use the
corresponding MPI type.
Closes https://github.com/facebookincubator/gloo/pull/17

Differential Revision: D4754195

Pulled By: pietern

fbshipit-source-id: 863fd33c726f88120f8f5ee61964c3525babbf97
2017-03-22 09:26:00 -07:00
30a9cf7a46 Mark transport pair after IO error and propagate to calling threads
Summary:
This change solidifies IO error handling between threads and successive transport API calls. When an IO exception occurs, signal all buffers of the error, propagating the exception from the device thread or single user thread onto all user threads. Store the exception in the pair and check on future API calls or device events. Swallow all IO exceptions in the device loop.

Right now IO exceptions during portions of the listen/connect phase will result in an indefinite wait in the peer. I will address this with a configurable timeout (t16205269).

Reviewed By: pietern

Differential Revision: D4749248

fbshipit-source-id: c75ee3b20875d561bf84631e5384e28015dabad3
2017-03-22 09:06:24 -07:00
714b2b8bf6 Merge pull request #453 from apaszke/lookup_renorm
Cast accumulator in LookupTable renorm to accreal
2017-03-22 11:53:41 -04:00
fe4bd5066b Added support for multidimensional tensors in PReLU; Channel number now in second dimension 2017-03-22 11:45:02 -04:00
e17d84d38e Added support for multidimensional tensors in PReLU; Channel number now in second dimension 2017-03-22 11:44:28 -04:00
b9aef6bc03 Fixing default values for LR and Epsilon (#895)
It seems that the default values for LR and Epsilon (previously, 1E-2 and 1E-38 respectively) were different from the ones recommended by the authors (2E-3 and 1E-8, respectively). Other packages such as Keras (https://github.com/fchollet/keras/blob/master/keras/optimizers.py#L474) and Lasagne (https://github.com/Lasagne/Lasagne/blob/master/lasagne/updates.py#L612) use the suggested values as well.
2017-03-22 11:34:39 -04:00
0056b08834 Narrow V when returning only some right singular vectors 2017-03-22 08:33:03 -07:00
bd0df61bb5 Cast accumulator in LookupTable renorm to accreal 2017-03-22 08:29:39 -07:00
d9678c2e34 Correct typo in batchnorm documentation 2017-03-22 13:55:45 +01:00
ea66516d5e Output attention weights from apply_xxx_attention methods
Summary: OSS diff. We need it later for beam decoding.

Differential Revision: D4747785

fbshipit-source-id: ce2d53ee2434216ace3c4ddbd40a9b68e9db7ec5
2017-03-21 19:01:58 -07:00
d7b2aebf2c Support for Sum in cell net as first operator
Summary: This didn't work for a reason specified in comments. Also some cleanup in the unit tests, now inference uses a custom workspace to run cell net on

Reviewed By: urikz

Differential Revision: D4742670

fbshipit-source-id: 04165c029fddec5ae31b20b207faf06d2fa20816
2017-03-21 18:32:18 -07:00
e3fc195fc6 fix mklmemory bug
Summary: This popped up during the debugging with intel folks.

Reviewed By: salexspb

Differential Revision: D4745176

fbshipit-source-id: 88ce91e565b45253d60588ab35ed4b8e5b8d4947
2017-03-21 18:04:24 -07:00
2417725ae5 Use version-specific CUDA packages 16.04
Summary:
So you can just run `BUILD_CUDA=ON .travis/install.sh` on a 16.04 machine and have it install the right packages.
Closes https://github.com/caffe2/caffe2/pull/212

Differential Revision: D4748670

Pulled By: Yangqing

fbshipit-source-id: 2015613e4d5ca6bcd1c9320c6c4cba071463c120
2017-03-21 13:47:07 -07:00
b3c0aa3b7d fix a typo in ffi doc (#1055) 2017-03-21 15:37:48 -05:00
8fc9c79287 Add nccl submodule 2017-03-21 17:53:58 +00:00
4fce1a389f Include CUDA support in CMake build
Summary:
* Pull in NCCL submodule
* Include (heavily modified) CUDA/NCCL build files from [Caffe2](https://github.com/caffe2/caffe2)
* Build CUDA enabled benchmark/test
* Enable CUDA build in Travis configuration
Closes https://github.com/facebookincubator/gloo/pull/16

Differential Revision: D4746784

Pulled By: pietern

fbshipit-source-id: b5c6cbcd8ac8b30c071851cdc7ae88c69c0ab4d6
2017-03-21 10:51:57 -07:00
8a35fea9eb Improve error message for not found operator
Summary: Seems like a lot of confusion in the group lately has been about missing CUDA operators. Let's make it clearer in the error message.

Reviewed By: azzolini

Differential Revision: D4737037

fbshipit-source-id: 56c7819df909bf954510296703bff5f221fa8ae7
2017-03-21 10:35:00 -07:00
aa4d07d3c4 bugfix for Windows, esp. VS 2017
Summary:
aaronmarkham this solves your Windows build issue. Basically:

(1) VS 2017 does not have CUDA support yet, and we will be waiting on NVidia to do so.

(2) VS 2015 and 2017 need different cmake generator strings.

This PR shows how to determine those and also updates appveyor to add a contbuild guard for the following 3 settings:
- VS2015 without cuda
- VS2017 without cuda
- VS2015 with cuda
Closes https://github.com/caffe2/caffe2/pull/210

Differential Revision: D4745007

Pulled By: Yangqing

fbshipit-source-id: 50952552843abd0eb6f4145d9f132daeee3a6794
2017-03-21 05:17:59 -07:00
93ff338ca7 Beam decoder for NMT in Caffe2
Summary: yolo5

Differential Revision: D4685076

fbshipit-source-id: b5534e441bb453f90e5210294f2dfff6b5c3b5b1
2017-03-20 22:03:59 -07:00
d13f98de4e implemented DistillLRLoss
Summary: Created `BatchDistillLRLoss` layer and added support for it in DPer2.

Differential Revision: D4718333

fbshipit-source-id: b873954ea704daafed94ac65fef47a20d56858e2
2017-03-20 16:01:29 -07:00
e41d35909a Conv-ND NCHW CUP/CUDA implementation
Summary: Migrate caffe1 ConvNd implementation to caffe2.

Reviewed By: Yangqing

Differential Revision: D4659868

fbshipit-source-id: 14b178af3faa2c0b12e5a9f7aa76c1d8945419ea
2017-03-20 14:01:07 -07:00
8ce56c30d4 Convert runtime errors to gloo exceptions
Summary:
Bubble up gloo configuration and network errors as exceptions. The caller may be able to recover. Other unexpected failures continue to be handled as fatal with GLOO_ENFORCE

Modify ibverb API validation to check for != 0 instead of -1 to conform with API definition.

Still need to convert some errors in the rendezvous code and add documentation.

Will pass device loop errors onto the calling thread in a future diff

Reviewed By: pietern

Differential Revision: D4730362

fbshipit-source-id: c801adb353013e7f541ab01ac16a0cc71c1c36b2
2017-03-20 13:50:29 -07:00
771d169c7c Extend conv params to handle nd inputs
Summary: Extend ConvOp parameters to handle N-D convolution input parameters.

Differential Revision: D4659838

fbshipit-source-id: 920f40dd80acfd03e04fcc04221209302232906d
2017-03-20 13:18:39 -07:00
4667f936e3 Add explicit dependency on pthreads
Summary:
Got linker errors on Ubuntu 16.04 (not on 14.04).
Adding the pthreads dependency explicitly fixes it.
Closes https://github.com/facebookincubator/gloo/pull/15

Differential Revision: D4739081

Pulled By: pietern

fbshipit-source-id: 6bae7d361d934e93560d28a76c3dca4a4236f113
2017-03-20 11:52:41 -07:00
4eaa30b634 Build tweaks
Summary:
* Mention submodules in README
* Remove fetch.sh from third-party directory
* Rename benchmark/test build targets
Closes https://github.com/facebookincubator/gloo/pull/14

Differential Revision: D4739077

Pulled By: pietern

fbshipit-source-id: 859c1cac0c0163870eae8f18e4e2f177a6bc8890
2017-03-20 11:35:19 -07:00
33f41c06c0 Remove more instances of batch_size
Summary: D4734505 part 2. Remove more instances of the batch_size parameter

Reviewed By: urikz

Differential Revision: D4736906

fbshipit-source-id: fc9d374e9308017d61c427890364c5ab9cec2edf
2017-03-19 22:31:30 -07:00
17da5856ed Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter
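
A numpy sketch of the idea (illustrative shapes; not the actual attention code):

```
import numpy as np

def attention_scores(decoder_state, encoder_outputs):
    # decoder_state: (batch, dim); encoder_outputs: (seq_len, batch, dim).
    # Batch size is read off the tensor shape instead of being passed in.
    seq_len, batch, dim = encoder_outputs.shape
    assert decoder_state.shape == (batch, dim)
    # Pair each batch element with its own decoder state.
    return np.einsum('sbd,bd->sb', encoder_outputs, decoder_state)

scores = attention_scores(np.ones((2, 8)), np.ones((5, 2, 8)))
print(scores.shape)  # (5, 2), inferred entirely from the inputs
```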

Reviewed By: urikz

Differential Revision: D4734505

fbshipit-source-id: d9c23d85be84f61124106e752ef2b4f6945e2a07
2017-03-19 18:16:28 -07:00
3924a35509 RNN: recycle workspace (attempt 2, easy mode)
Summary: This is a simpler version of what Aapo did before, as that one had some weird crashes in some of the training pipelines.

Reviewed By: urikz

Differential Revision: D4734934

fbshipit-source-id: f9ecff2a0d68a8cbc0858658f38be34d616fa100
2017-03-18 22:31:39 -07:00
3e222d501a Backed out changeset 460028d912d6
Reviewed By: urikz

Differential Revision: D4734926

fbshipit-source-id: c3ba01b70c7f515e1580a8f9a5e6d3ecff1d9f47
2017-03-18 22:31:39 -07:00
25bbd632e3 Backed out changeset 35c70e825855
Reviewed By: urikz

Differential Revision: D4734923

fbshipit-source-id: 0d460b8460aef510ce4f18fdaaeaedebe1324608
2017-03-18 22:31:39 -07:00
d1424c3265 Revert D4702086: Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: This reverts commit c4c1d8425cd36c1e86695918eaba2667c27e9601

Differential Revision: D4702086

fbshipit-source-id: 4620610b182bb84b9297b5de32782761ae89d20b
2017-03-17 17:36:47 -07:00
f97d7949d0 Remove legacy LSTM, cleanup tests
Summary: we don't use this one any more except in a few tests

Reviewed By: urikz

Differential Revision: D4731401

fbshipit-source-id: c5c28b7594e3251f501fc28455dfc9bd2093a836
2017-03-17 16:33:53 -07:00
77fbc12f23 Fix some deadlocks when torch_shm_manager is not found (#1030)
- Add additional timeouts to test_multiprocessing to reduce chances of
   hanging indefinitely on failure
 - Add missing header guards
 - Fix typo
 - Check that torch_shm_manager exists in torch/__init__.py
2017-03-17 18:28:39 -04:00
7e46eb1613 Fixes for Prod and Expand functions (#1026)
Thanks to @ChangYong-Oh for the original implementation.
2017-03-17 18:24:44 -04:00
1aa5231fb3 make nnpack build on mac/linux, and also contbuild support
Summary:
* add custom ninja install

* minimal build for nnpack

* force -fPIC for nnpack
Closes https://github.com/caffe2/caffe2/pull/207

Differential Revision: D4729265

Pulled By: Yangqing

fbshipit-source-id: 2ed345a4fda6b4811af03cd1898e2402dda58701
2017-03-17 15:19:07 -07:00
a2fc88cf97 Remove fbcollective from tree
Summary: This has been subsumed by gloo.

Reviewed By: andrewwdye

Differential Revision: D4729216

fbshipit-source-id: aa4f0637ee70dd03e85a6a0e7ffda68e5e9505be
2017-03-17 10:19:06 -07:00
a15776c868 Fix for Windows build
Summary:
suppressed warning, added noexcept to destructors, and fixed an inclusion bug introduced today in the top_k diff.
Closes https://github.com/caffe2/caffe2/pull/206

Differential Revision: D4729263

Pulled By: Yangqing

fbshipit-source-id: 20166382f1e3547713f7d554a151a5387f0a41c1
2017-03-17 10:19:06 -07:00
4829bdb1ea BatchSoftmaxLoss layer
Summary: Similar to BatchLRLoss layer

Reviewed By: xianjiec

Differential Revision: D4689609

fbshipit-source-id: 89fa4b9d4145ce77cb2aaa7a5c0c1a24f901d88f
2017-03-17 10:19:06 -07:00
cea16ff7cd BatchSigmoidCrossEntropyLoss
Summary: To support feed interset team

Reviewed By: kdub0

Differential Revision: D4719213

fbshipit-source-id: 8deb3544377fb06593399b101de66f3f845f93b5
2017-03-17 09:35:51 -07:00
c3973f08a5 Check that inputs/outputs don't change between runs
Summary:
This can happen when the tensors are changed/resized. The cached
algorithm instance won't be valid in that case. I think for now it's
best to fail hard and require the net to be reinitialized if this
happens. If instead we always reinitialized when this condition is
detected, then frequent resets could lead to poor performance and go
undetected.

I spoke about the generality of this problem with YQ. The pattern used
here of updating a representation of the op's parameters is far from
ideal. Instead, it would be much better to have the core framework use
some kind of versioning on tensors/blobs (can be as simple as a single
integer) to make it much easier to detect a change in inputs/outputs.
If there are more places that would benefit from such a facility, we
should consider adding it. Since right now Gloo is the only place where
we need it, it doesn't make sense to immediately add it to core.
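
A Python sketch of the single-integer versioning idea floated above (hypothetical; not in the core framework):

```
class VersionedTensor(object):
    # Every mutation bumps a counter so cached consumers can cheaply
    # detect resizes or replacement.
    def __init__(self, shape):
        self.shape, self.version = shape, 0

    def resize(self, shape):
        self.shape, self.version = shape, self.version + 1

class CachedAlgorithm(object):
    def __init__(self, tensor):
        self.tensor, self.seen = tensor, tensor.version

    def run(self):
        if self.tensor.version != self.seen:
            raise RuntimeError("inputs changed since initialization")

t = VersionedTensor((4, 4))
algo = CachedAlgorithm(t)
algo.run()        # fine
t.resize((8, 8))
algo.run()        # raises: fail hard, require reinitialization
```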

Reviewed By: Yangqing

Differential Revision: D4728121

fbshipit-source-id: 69a8a620aecc961a3f7a27e8c53e22945d9a258e
2017-03-17 09:04:04 -07:00
79c3a3af54 add gpu support for caffe2-seq2seq
Summary: Adding synchronous optimization on GPUs to the translation training pipeline, via data_parallel_model.Parallelize_GPU, which needs to be updated so there is some way of performing sparse parameter updates (e.g., on embedding tables), whether on GPU or CPU.

Reviewed By: urikz

Differential Revision: D4631914

fbshipit-source-id: 9cdd655f7dbda3f9b2733d459228b3e097892441
2017-03-17 05:19:14 -07:00
821656d2d8 add CONTRIBUTING document 2017-03-17 07:59:37 -04:00
86e40ed875 Fix a typo in docs about pinned memory buffers (#1023)
* remove misleading guide for BCELoss

* fix docs about pinned memory buffers
2017-03-17 05:08:03 -04:00
1513b1de6b Add ResizeNearest operator
Summary: This adds a nearest neighbor interpolation resizing operator to caffe2. CPU only, NCHW only, no gradients. Also adds torch2caffe support. This is probably not optimal in terms of performance, but it works.

Reviewed By: ajtulloch

Differential Revision: D4724244

fbshipit-source-id: b8295061141fb513da84acf91fdfd67264119059
2017-03-16 18:49:01 -07:00
ad4ae4528f migrate mtml to dper2
Summary:
1. migrate the basic mtml model to dper 2
2. test dper 2 mtml model
3. test all optimizers

Reviewed By: kittipatv

Differential Revision: D4680215

fbshipit-source-id: 7aac5c59bdac22fcad8ed869b98e9e62dca1d337
2017-03-16 17:48:05 -07:00
cc2e915461 Implement TopK op in caffe2
Reviewed By: salexspb, urikz

Differential Revision: D4718439

fbshipit-source-id: e6866eb7bb586f2716662cd4b65961bdd9914525
2017-03-16 17:32:20 -07:00
2c8bf2525b added BatchL2Loss layer
Summary: layer that takes a label, prediction pair and outputs the L2 loss

Reviewed By: kittipatv

Differential Revision: D4702111

fbshipit-source-id: 09f2ede44d1b548e61096de741f1b2aa0b66bbcb
2017-03-16 17:32:20 -07:00
9382ecb9cd Set up Caffe2 versioning number
Summary:
Setting up a caffe2 versioning number per popular request.

The plan is to periodically update the version, with the current plan being
every other week. As a result I am setting the initial number to minor version
5 (since this is the 11th week of the year).

Reviewed By: salexspb

Differential Revision: D4725945

fbshipit-source-id: 9ff4c7e4a6341e22a5f1d4e25740705988cae84b
2017-03-16 17:32:20 -07:00
227fd0bbc7 fix bypassing_mtml crash
Summary:
Currently, if all samples in a batch are missing labels, the task-customized layers have no data.
In that case, the EnsureDense op does not compute the gradient correctly. To avoid that, we switch
back to letting Gather generate dense gradients.

Why does the EnsureDense op not compute the gradient correctly?
When EnsureDense computes gradients, it does not know the actual data batch size, so its output gradients may have the wrong batch size.

Reviewed By: xianjiec

Differential Revision: D4712463

fbshipit-source-id: 736f63273e7fbc4348f37fa3a5a696f855b7c3ad
2017-03-16 16:08:43 -07:00
1d0699e147 Define exception hierarchy
Summary: Define an exception hierarchy for gloo runtime errors. Keep GLOO_ENFORCE macros for assertions.
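
A Python sketch of the shape of such a hierarchy (class names are illustrative; the real one is C++):

```
class GlooError(RuntimeError):
    """Base class for recoverable runtime errors."""

class ConnectError(GlooError):
    """Failure while establishing a connection."""

class IoError(GlooError):
    """Failure on an established connection."""

try:
    raise IoError("peer reset connection")
except GlooError as e:  # callers can catch the whole family and retry
    print("recoverable:", e)
```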

Reviewed By: pietern

Differential Revision: D4724124

fbshipit-source-id: 22f0581b06524579e86fe335770bdb620d20e258
2017-03-16 15:08:01 -07:00
ea52c7567a Expose minSize for threadpool
Summary: Useful for restoring after a conditional block where we want to disable threading.

Reviewed By: jamorton

Differential Revision: D4638648

fbshipit-source-id: 29695284f7c427caa6b80a9bca0cbd1406543a44
2017-03-16 14:47:25 -07:00
d85ed5d5d6 fix external_loggers
Summary:
It was broken in trunk; I fixed it locally but then had a
wrong merge in D4672026. This is just a revert of those changes.

Reviewed By: ajtulloch

Differential Revision: D4723138

fbshipit-source-id: 14757d9c8ae5135bd7c084003a64e25efc74b54f
2017-03-16 13:47:58 -07:00
b9379cfab7 Use cuDNN and NCCL symbols from _C library (#1017)
This ensures that we use the same library at the C++ level and with
Python ctypes. It moves the searching for the correct library from
run-time to compile-time.
2017-03-16 16:10:17 -04:00
10d95bd0f0 Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter

Reviewed By: urikz

Differential Revision: D4702086

fbshipit-source-id: c4c1d8425cd36c1e86695918eaba2667c27e9601
2017-03-16 11:47:52 -07:00
412148a62a update NNPACK module 2017-03-16 11:14:24 -07:00
f0b75c4aa4 Merge pull request #729 from shenxiul/cuda_linspace
linspace and logspace for CUDA Tensors
2017-03-16 14:03:00 -04:00
7654b3f49e Add function to compute cross_entropy for 2D image (#802) 2017-03-16 17:34:04 +01:00
37ebbc2809 the length of any item in padded_sequence should be greater than 0 (#1013) 2017-03-16 17:32:43 +01:00
b2ab7365be fix for special case when dense dim is 1
Summary: otherwise it will fail here: https://fburl.com/puy5x2dq

Reviewed By: kittipatv

Differential Revision: D4719212

fbshipit-source-id: e0d8211f64dca00ee48df3235d2bc030ea30f208
2017-03-16 05:19:10 -07:00
8241cd7b6e Fix compilation error when compiling with 'clang -x cuda'.
Functions vFetch and vStore are not found by ADL with clang,
so they need to be declared before usage in ReduceCopy.
2017-03-16 12:01:11 +01:00
a7781fdebc Use default Redis port in RedisStore constructor
Summary: TSIA

Reviewed By: andrewwdye

Differential Revision: D4718573

fbshipit-source-id: c0b9aa78cf1f4db910526841c0172537b9243f7e
2017-03-15 22:19:51 -07:00
29ddbc3e37 implement linspace, logspace and range in CUDA 2017-03-15 20:50:30 -07:00
7773a2d643 Bugfix: type not being set when inferring types+shapes
Summary:
/cc akyrola

I basically just copied all the `ShapeCall` stuff as `TypeCall`. Is there a better way?
Closes https://github.com/caffe2/caffe2/pull/187

Differential Revision: D4699312

Pulled By: Yangqing

fbshipit-source-id: 92f736ffe4127b00b5821acb1eb359771975fdd7
2017-03-15 18:48:40 -07:00
16a133ed9a Fixes for testing on FB infra (#1009)
- make each test in test_autograd have a unique name ignoring case
 - assemble all tests when test_legacy_nn is imported
 - import Python.h in PtrWrapper.h
2017-03-15 18:37:11 -04:00
1aa665f6a8 Documentation
Summary:
* Add separate file for rendezvous docs
* Mention using MPI for rendezvous
* Fix algorithm docs formatting
Closes https://github.com/facebookincubator/gloo/pull/13

Differential Revision: D4715442

Pulled By: pietern

fbshipit-source-id: 0469ab8d16fd489a38c399ec2b25860d1225ce72
2017-03-15 14:58:51 -07:00
c4d1318662 Fix map_location in torch.load (#1006) 2017-03-15 16:54:19 -04:00
379ae6d865 Refactor out dispatchStateless (#1007)
Some of the error messages were incorrect due to erroneous
'tensor == THPDefaultTensorClass' checks
2017-03-15 16:24:55 -04:00
9aa277eeb1 Fix cmake gcc error per @benbarsdell
Summary: Closes https://github.com/caffe2/caffe2/pull/203

Differential Revision: D4714861

Pulled By: Yangqing

fbshipit-source-id: 191414c8a39b2437292128b266a8c4a3502dcedf
2017-03-15 11:49:14 -07:00
24376ff9d3 Merge pull request #723 from killeent/scan-primitive
add implementation of inclusive scan via upsweep-downsweep
2017-03-15 14:37:21 -04:00
56f324d191 Added predictor bindings to python interface
Summary: from caffe2.python import workspace; p = workspace.Predictor(init_net, predict_net); outputs = p.run(inputs)
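
The same usage, formatted as a sketch (file names and input shape are illustrative; the Predictor API itself is as stated above):

```
import numpy as np
from caffe2.python import workspace

# init_net / predict_net are serialized NetDef protobufs, e.g. exported
# alongside the trained model (paths here are hypothetical).
with open("init_net.pb", "rb") as f:
    init_net = f.read()
with open("predict_net.pb", "rb") as f:
    predict_net = f.read()

p = workspace.Predictor(init_net, predict_net)
outputs = p.run([np.random.rand(1, 3, 224, 224).astype(np.float32)])
```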

Reviewed By: Yangqing

Differential Revision: D4576793

fbshipit-source-id: b829bbcaf2e7c34dad85024177433207bd96a234
2017-03-15 11:17:54 -07:00
61dd35f1d6 FCWithoutBias layer
Summary: For some embedding tasks, we don't want to include a bias term in the embedding computation.

Reviewed By: xianjiec

Differential Revision: D4689620

fbshipit-source-id: 4168584681d30c0eaa1d17ceaf68edda11924644
2017-03-15 11:03:37 -07:00
6ac793dcbe Reuse ncclComm_t across algorithm instances
Summary: Initializing ncclComm_t is expensive. Allocate a set of ncclComm_t for each unique device set and cache them for reuse. With this change the CudaAllreduceChunked test runtime improved from ~170 sec to ~10 sec on my machine. There is no improvement in the benchmark numbers because the algorithm instance is only allocated once.
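
A Python sketch of the caching scheme (illustrative; the real cache holds ncclComm_t handles in C++):

```
_comm_cache = {}

def get_comms(device_ids):
    # Key the cache on the exact device set so each unique set pays the
    # expensive initialization only once.
    key = tuple(sorted(device_ids))
    if key not in _comm_cache:
        _comm_cache[key] = object()  # stand-in for real comm handles
    return _comm_cache[key]

a = get_comms([0, 1, 2, 3])
b = get_comms([3, 2, 1, 0])
assert a is b  # same device set -> reused, no re-init cost
```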

Reviewed By: pietern

Differential Revision: D4708943

fbshipit-source-id: 85b85070586d6683a762b8282df593ca831e7bc7
2017-03-15 09:51:43 -07:00
e00d9c1fd8 Execute benchmark through mpirun
Summary:
This change includes CMake changes to compile the MPI assets when the USE_MPI flag is enabled. If so, the benchmark tool can now be launched through mpirun.

Includes the changes done in #11.
Closes https://github.com/facebookincubator/gloo/pull/12

Reviewed By: Yangqing

Differential Revision: D4712060

Pulled By: pietern

fbshipit-source-id: 0d0e93882f5822583f59304d4256dbdf5dea7483
2017-03-15 08:21:12 -07:00
92101aa87a Update resnet50 example
Summary:
Make it use Gloo and optionally use Redis for rendezvous (where a
shared filesystem is not available).

Differential Revision: D4709943

fbshipit-source-id: 59cc7a14316c7b634417ea5161a75fab3c19f2fa
2017-03-15 08:18:50 -07:00
be6322e4b5 Update nn.init docstrings to correctly reference the module (#1001) 2017-03-15 11:17:59 -04:00
62063b2f62 Fix docs for pointwise ops (#845) (#985)
* add torch.nn.init docs to the source folder
2017-03-15 11:08:05 -04:00
518d36d34b Add PReLU translator
Summary: Closes https://github.com/caffe2/caffe2/pull/171

Differential Revision: D4711877

Pulled By: Yangqing

fbshipit-source-id: 555f733e6eabf351480b7d4398aa05755cc26599
2017-03-15 02:47:03 -07:00
bb58074332 support get/add a field by nested name
Summary:
We are having more and more nested Struct schemas. There is an increasing need to get/add a field by nested name, e.g., for the following nested Struct schema:

st = Struct(
  ('a', Scalar()),
  ('b', Struct(
     ('c', Scalar()),
  )),
)

We may want to get the field "b:c" and/or insert a new field "b:x". The immediate need is for dper2 metrics.

This diff is to achieve this.
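
A recursive lookup sketch in plain Python (dicts stand in for schema.Struct; the real field types differ):

```
def get_nested(struct, name, sep=':'):
    # Resolve the first path component, then recurse into the sub-field,
    # so no complex field type needs special-casing.
    head, _, rest = name.partition(sep)
    field = struct[head]
    return get_nested(field, rest, sep) if rest else field

st = {'a': 1, 'b': {'c': 2}}
print(get_nested(st, 'b:c'))  # 2
```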

Reviewed By: kittipatv

Differential Revision: D4690225

fbshipit-source-id: 71d4a74b36bd1228a2fefd901db2f200602152b7
2017-03-15 02:00:57 -07:00
26628d10ff Fix workspace clashes
Summary: For example, test and train nets could have shared workspaces, leading to a race condition. This adds an assertion and adds a running counter to the workspace-blob name.

Reviewed By: jhcross

Differential Revision: D4712152

fbshipit-source-id: 808d7069095bac24ebfe0c9d31ebd134f4cf0956
2017-03-14 23:33:28 -07:00
ba9cac4d98 fix mkl contbuild
Summary:
This should fix mkl contbuild per the most recent bugfix from Intel.
Closes https://github.com/caffe2/caffe2/pull/189

Differential Revision: D4711448

Pulled By: Yangqing

fbshipit-source-id: 70d1b35fa4fe6cc9b4d36ec0fcfbd6d33f313182
2017-03-14 23:33:28 -07:00
3176cd6292 update nnpack submodule 2017-03-14 23:17:43 -07:00
9e6fd02c28 Use Gloo ops in data_parallel_model
Summary:
No longer need GPU to CPU copies. The allreduce operator no longer
uses 'local allreduce - global allreduce - local broadcast' sequence
when Gloo is used, but passes all input blobs directly.

Depends on D4708860.

Differential Revision: D4709897

fbshipit-source-id: 4d745d5d8bac9c2fcca081dd5d812c902808c3b6
2017-03-14 22:34:51 -07:00
4d7451399b XRay mobile quantized model
Summary:
This is going to allow experimenting with various training-from-scratch / fine-tuning techniques. The code itself for the new model is not intended to be used as is. Instead one could train a full-precision model first, then add quantization for the last layer, then for the next one, and so on.

In my experiments I tried getting a pretrained model and then quantizing all inception layers with 4 bits. This restored original accuracy after several dozen iterations

Also in this diff I added a common prefix to the model checkpoint + added this prefix to git / hg ignore.
And also some extra logs which are usefull to quickly see how things changed right after enabling quantization

Differential Revision: D4672026

fbshipit-source-id: b022c8ccf11dd8a2af1a7b2e92673483bc741a11
2017-03-14 22:18:40 -07:00
9e593a901c fix memory corruption
Summary: D4704547 caused stuff to crash with various memory corruption errors. The problem appears to be in calling sharedWorkspaces->resize(), although I don't completely understand why. Something to do with moving the shared_ptrs around? Anyway, first clearing and then resizing (only needed when seqLen is bigger than what we have allocated) fixes the issue.

Reviewed By: jhcross, Yangqing

Differential Revision: D4711675

fbshipit-source-id: 35c70e8258555fcb6d403df35e0d391aebe96485
2017-03-14 21:32:55 -07:00
13b1580613 add F.pad to docs 2017-03-15 00:09:14 -04:00
fe788f5003 Use correct event to synchronize destination buffer in NCCLElement
Summary: NCCLOp::runNCCL was mistakenly recording an event on the source pointer after the NCCL op. This resulted in NCCLOp::wait() returning without synchronizing with the output buffer, so the synchronous tests using NCCL failed.

Reviewed By: pietern

Differential Revision: D4708860

fbshipit-source-id: 0c36511e260b587d410e5c9604552ceedd06d988
2017-03-14 19:20:59 -07:00
f449af378d Explicitly pass CXX to NCCL Makefile
Summary:
Necessary if CXX isn't set when cmake is called. The CXX variable will then be
empty, which prevents make from using its own default.
Closes https://github.com/caffe2/caffe2/pull/202

Differential Revision: D4711113

Pulled By: Yangqing

fbshipit-source-id: 895c07044b263ba9b5440453978248506d7ac225
2017-03-14 18:33:36 -07:00
014d1fe5c4 Allow test discovery in caffe2/python/
Summary:
These are all essentially no-op changes which allow for nose-style (or pytest-style) test discovery.

With this patch, you can use any of these methods to discover and run tests under `caffe2/python`:
```
python -m unittest discover -p '*test*.py' caffe2/python/
python -m nose caffe2/python/
python -m pytest caffe2/python/
```

Future work:

* Get all of the tests to pass
  * Some seem to be testing operations which don't have GPU implementations
  * I get a segfault unless I set `CUDA_VISIBLE_DEVICES=0`
  * Some tests are flaky
* Allow test discovery throughout the whole project (e.g. the `experiments/` dir)
Closes https://github.com/caffe2/caffe2/pull/199

Reviewed By: pietern

Differential Revision: D4704504

Pulled By: Yangqing

fbshipit-source-id: 8f5687ec9c8aa873dfaff30dbf44272bc38a206b
2017-03-14 18:16:41 -07:00
2ce5121db1 Reuse workspaces in RecurrentNetOp -> much faster
Summary:
RecurrentNetOp created workspaces on every run, which was very wasteful, as it also had to recreate the step nets (forward and backward!).

Reviewed By: salexspb

Differential Revision: D4704547

fbshipit-source-id: 460028d912d6a735448c445cb83c0c4d03286351
2017-03-14 16:34:40 -07:00
91f468b15c fixes to make data parallel model work for RecurrentNet + test case
Summary:
First, this diff includes a full test of data-parallel LSTM, which confirms it works correctly. To make it work, some changes had to be made:
 - cell net/step net external inputs must be namespace scoped
 - prevent double-namescoping of cellnet inputs
 - make data parallel model understand recurrentnets so the device-mapping works

Reviewed By: salexspb

Differential Revision: D4708840

fbshipit-source-id: 4b0ddc43642d449076a2b6f67ad1c47f84138ff4
2017-03-14 15:48:07 -07:00
95ecf22c0c Add throughput.py for throughput measurements
Summary: TSIA

Differential Revision: D4629898

fbshipit-source-id: 6c65ca67e498c4b84173939042326214359b052a
2017-03-14 15:48:06 -07:00
25b1221579 Allow scalar output in functional layer
Summary: Some operators, e.g., SoftmaxWithLoss, return scalar-typed tensors. This allows us to use those ops without having to write a layer manually.

Reviewed By: xianjiec, kennyhorror

Differential Revision: D4703982

fbshipit-source-id: f33969971c57fc037c9b44adb37af1caba4084b6
2017-03-14 15:32:47 -07:00
e50a1f19b3 Use streams in scatter to overlap copy with compute 2017-03-14 22:46:07 +01:00
e86db387ba Fix conv1d backward segfault (#999) 2017-03-14 16:15:53 -04:00
783e40e806 Fix lengths-remapping again + better errors
Summary: When cloning a recurrent net op, we remap the lengths-blobs. But if they don't exist (like with CRF), we should not do that.

Differential Revision: D4702123

fbshipit-source-id: 37a22d11e709011b8b98b2cc3d9f08eb9fda06c4
2017-03-14 11:04:45 -07:00
b7530cc54a Optional central cropping in ImageInputOp
Summary:
Central cropping during test phase, similar to Caffe's behavior
Closes https://github.com/caffe2/caffe2/pull/195

Differential Revision: D4704506

Pulled By: Yangqing

fbshipit-source-id: cf7d457dc2acfe8ff5a225ebfd5f8cd0f9d92a07
2017-03-14 10:33:08 -07:00
a74c2bcda8 Fix build_ios.sh bug (#194) due to name collision.
Summary: Closes https://github.com/caffe2/caffe2/pull/200

Differential Revision: D4704479

Pulled By: Yangqing

fbshipit-source-id: 7a618fa2cd57fd2ead5cede5d5aed033284ea67e
2017-03-14 10:33:08 -07:00
dec78a37b4 Increase threshold for using chunked allreduce
Summary:
Yields better throughput, since the full ring allreduce is cheaper for
smaller blobs (fewer communication steps); see the selection sketch below.

Reviewed By: andrewwdye

Differential Revision: D4704850

fbshipit-source-id: 338addd919f454c94412ea145e1280492f765c72
2017-03-14 09:32:42 -07:00
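The selection sketch referenced above; the threshold value and names are illustrative, not Gloo's actual constants:

```python
# Below the threshold, the plain ring allreduce wins (fewer communication
# steps); above it, the chunked variant amortizes better.
CHUNKED_THRESHOLD_ELEMENTS = 4096  # illustrative value only

def pick_allreduce(num_elements):
    if num_elements < CHUNKED_THRESHOLD_ELEMENTS:
        return 'ring'
    return 'ring_chunked'
```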
f26f225972 Support multiple inputs to broadcast/allreduce ops
Summary:
TSIA

For the broadcast op the first input tensor on the process with the
specified rank is broadcast to all other processes and outputs.

For the allreduce op all inputs are considered for the reduction.

Reviewed By: andrewwdye

Differential Revision: D4704540

fbshipit-source-id: e6879ca0a9adffe0bc61bf74a333c4052bc8bd92
2017-03-14 09:32:42 -07:00
c2e270d8bc Add Gloo ops
Summary:
Add broadcast/allreduce ops backed by Gloo (see
https://github.com/facebookincubator/gloo).

Reviewed By: andrewwdye

Differential Revision: D4704536

fbshipit-source-id: 918852055b28a90b1f6fc8615793398db2c25d15
2017-03-14 09:32:41 -07:00
436193bf37 fix minor typo in math_{cpu.cc,gpu.cu}
Summary: Closes https://github.com/caffe2/caffe2/pull/196

Reviewed By: pietern

Differential Revision: D4703131

Pulled By: Yangqing

fbshipit-source-id: cb6e61a41a858e9cb164697a585ef257a8d0530e
2017-03-13 22:47:15 -07:00
1fac027d0e Quantized Training API
Summary: These Python helpers are going to provide sufficient bookkeeping when adding quantization for conv layers.

Reviewed By: Yangqing

Differential Revision: D4671478

fbshipit-source-id: 292e2f633dd30969c0afbe7a8075b340ce9a6d12
2017-03-13 22:17:58 -07:00
a1d63da6af Adding UNK to vocab | Changing default params
Summary: UNK needs to be indexed in the vocabulary for validation to work. The default args now result in decreasing training loss.

Reviewed By: urikz

Differential Revision: D4703393

fbshipit-source-id: e4d6ad100daf8392f8ba1e502f9ecf39bb8ce24a
2017-03-13 22:17:48 -07:00
85fad20a5a Gracefully handle empty input to Dropout
Summary:
Context:
https://fb.facebook.com/groups/1405155842844877/permalink/1677762748917517/.
DropoutOp and DropoutGradientOp already handle input of size 0 gracefully. The
CHECK isn't needed. I think this should fix the crash in xray detection models
where the number of region proposals is zero (see the sketch below).

Differential Revision: D4697254

fbshipit-source-id: afd06975f2ad4b2e59f15d12b0aa332f6eb3f1af
2017-03-13 21:47:56 -07:00
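The sketch referenced above: the graceful path simply produces size-0 outputs for a size-0 input instead of tripping a size CHECK. A numpy illustration of the behavior, not the C++ operator itself:

```python
import numpy as np

def dropout_forward(x, ratio=0.5):
    if x.size == 0:
        # Empty input: empty output and mask, no CHECK needed.
        return x.copy(), np.zeros_like(x, dtype=bool)
    mask = np.random.rand(*x.shape) >= ratio
    return x * mask / (1.0 - ratio), mask
```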
1bf61b8adc Add googletest submodule 2017-03-14 03:39:54 +00:00
84ba7c1acb Skip test if libfb not present
Summary:
Allows `nose` or `pytest` to collect all tests.
```sh
$ cd build
$ nosetests --collect-only
..............................................................................................................................................................................................................................
----------------------------------------------------------------------
Ran 222 tests in 0.430s

OK
```
Closes https://github.com/caffe2/caffe2/pull/198

Differential Revision: D4700783

Pulled By: Yangqing

fbshipit-source-id: 97504f6b14537669aa150f6a71283e851829ac5e
2017-03-13 17:31:43 -07:00
704ee3ca68 Use cudart symbols from the main program.
Our extension library links against cudart and pulls in the symbols. Use
LoadLibrary(None) to use the same symbols as the _C extension.

This fixes the PyTorch wheel when you don't have system CUDA installed.
2017-03-13 19:45:34 -04:00
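A sketch of the mechanism in the commit above, assuming a Linux host: `ctypes.CDLL(None)` performs `dlopen(NULL)`, which returns a handle to the main program, so cudart symbols already pulled in by the `_C` extension resolve without loading a separate system libcudart:

```python
import ctypes

# Handle to the already-loaded process image; any cudart symbols the
# _C extension pulled in are visible through it.
libmain = ctypes.CDLL(None)
```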
fc7939c25b add model_helper.ExtractPredictorNet()
Summary:
It has been a pain to save predictor-compatible models from Caffe2. This diff adds a function ExtractPredictorNet that takes a training model and outputs a predictor model by removing all operators that are not relevant for prediction, such as the backward pass and dequeue ops for input loading (in the predictor, the input data is an external input). A usage sketch follows below.

We can also consider including this directly in the predictor exporter for FB usage.

Reviewed By: rpenggithub

Differential Revision: D4693264

fbshipit-source-id: e81abbbec0bd4d717159cf36488d0baaf0130090
2017-03-13 16:32:04 -07:00
a745981c94 ReduceBack{Sum|Mean}Op CPU & GPU implementation
Summary:
Implement ReduceBackSum & ReduceBackMean with gradients for CPU & GPU contexts.
The reduction happens along the last dimensions; for example, if the input is an
M x N matrix, ReduceBackSum produces a vector of dim M x 1 containing the
row-wise sums (see the numpy reference below).

Differential Revision: D4689768

fbshipit-source-id: 5b0482d4341867ecf23526dc6c4d544420e7d8f7
2017-03-13 16:19:58 -07:00
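The numpy reference mentioned above: reducing over the trailing axis reproduces the documented semantics:

```python
import numpy as np

x = np.arange(12.0).reshape(3, 4)           # an M x N input, M=3, N=4
row_sums = x.sum(axis=-1, keepdims=True)    # ReduceBackSum  -> shape (3, 1)
row_means = x.mean(axis=-1, keepdims=True)  # ReduceBackMean -> shape (3, 1)
```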
9004652c7b updated the documentation to remove the unnecessary copy grads when using multiprocessing 2017-03-13 19:04:17 -04:00
ee2bc06926 Add Shape Inference for Reshape Operator
Summary: Add shape inference for Reshape. Because the reshaped tensor's shape cannot be inferred when it depends on runtime tensor data, set `out[0].set_unknown_shape(true)` if no `shape` argument is used.

Differential Revision: D4671125

fbshipit-source-id: 685a9198f9b08e3336014c792f20051b381d8619
2017-03-13 14:31:27 -07:00
001ac5d751 Fix to use appropriate corpus and vocab in eval
Summary: We should be using the vocabulary built on the training data, and corpus_eval as data for the evaluation phase.

Reviewed By: urikz

Differential Revision: D4700382

fbshipit-source-id: ca1dd043a28f9bb585faad050c82fb12c1cdf6cc
2017-03-13 14:31:27 -07:00
aca6ce984c change lookup table sort 2017-03-13 13:55:16 -07:00
ed8773f7bd add legacy_serialized.pt to gitignore 2017-03-13 16:37:35 -04:00
0f7b7b27b1 Fix build for CMake 2.8.12
Summary:
This is the minimum required CMake version (also the version that is available on Ubuntu Trusty (14.04)).
Closes https://github.com/facebookincubator/gloo/pull/9

Reviewed By: Yangqing

Differential Revision: D4698659

Pulled By: pietern

fbshipit-source-id: bf01541fe485c03e7c665f175c2887feaf9516a3
2017-03-13 13:06:15 -07:00
a5a5d00b87 Fixed a bug: 'ModelTrainerLog instance has no attribute 'external_loggers''
Summary: Fixed a bug (AttributeError: ModelTrainerLog instance has no attribute 'external_loggers', at File "caffe2/python/experiment_util.py", line 101) that occurred when no external_loggers were passed to ModelTrainerLog().

Differential Revision: D4697197

fbshipit-source-id: 1c770c366d87ea474bcf40ab289b67c76648d48b
2017-03-13 12:32:36 -07:00
48f48b6ff2 fix more flaky VolumetricMaxPooling tests 2017-03-13 14:38:27 -04:00
615b27eadf fix corner case in SetItem of Variable 2017-03-13 14:38:27 -04:00
86ede33035 CMake improvements for Gloo
Summary: Install headers and add .. to include directories

Reviewed By: pietern

Differential Revision: D4695500

fbshipit-source-id: f48a49f03e575408829793cb63bfdb16d8e3a309
2017-03-13 11:06:05 -07:00
f422e6307d Add poly learning rate policy
Summary:
Add LR policy from Caffe
Closes https://github.com/caffe2/caffe2/pull/192

Differential Revision: D4687377

Pulled By: Yangqing

fbshipit-source-id: ba0a48a937ab4784e1c31249a3ed858b248d988f
2017-03-13 10:02:42 -07:00
bd09055207 Synchronize all NCCL ops with shared per-device streams
Summary:
Allocate a set of per-device streams used to serialize NCCL op scheduling. These ensure concurrent NCCL ops are not interleaved across devices (e.g., through priority scheduling), which could result in deadlock.

Synchronize source and destination streams with NCCL streams.

Reviewed By: pietern

Differential Revision: D4685360

fbshipit-source-id: 3c228b195b0a0d9d7cccc720163898d344a5ed4c
2017-03-13 09:20:05 -07:00
4bd220d91a Travis contbuild scripts and cmake fix.
Summary:
TSIA. Redoing #7 to kick travis.
Closes https://github.com/facebookincubator/gloo/pull/8

Reviewed By: Yangqing

Differential Revision: D4697132

Pulled By: pietern

fbshipit-source-id: d03148aeddb2cf927b4ef3689c97d9ba4f4cdc9d
2017-03-13 08:36:10 -07:00
170d790b66 fix doc of conv3d in conv.py (#989)
the second dimension should be height.
2017-03-13 11:30:13 -04:00
e216f557fd Fixes issue returning strings from a Dataloader with pin_memory=True (#908) 2017-03-13 10:11:07 +01:00
e5858485ca small change to concat layer to make tensor board vis nicer
Summary:
otherwise the blob will be in a different namescope, e.g., `_nested`: https://fburl.com/ntlsaezv.
This makes the TensorBoard visualization ugly.

Reviewed By: dzhulgakov

Differential Revision: D4696946

fbshipit-source-id: 73627feccd7c4896964e6c549b7241bcce4f49a7
2017-03-12 23:01:18 -07:00
6729d81418 Specify which GPUs to use in resnet50 example
Summary:
TSIA

This change also fixes an undefined attribute error after running 20
iterations of the resnet50 example trainer.

Differential Revision: D4692794

fbshipit-source-id: b98efdfeb078c5ba89d2a86837f3c672e1eade5f
2017-03-12 22:33:15 -07:00
997312c233 Add WeightedRandomSampler (#980)
Samples elements from `[0,..,len(weights)-1]` with given probabilities (weights). So far there is no means to introduce sample weights either in loss functions or while sampling from a dataset. This is an attempt to add the functionality for the latter (see the usage sketch below).
2017-03-13 00:27:05 -04:00
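The usage sketch referenced above, written against the sampler as it exists in torch today (the signature at the time of the PR may have differed slightly):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

dataset = TensorDataset(torch.randn(4, 2))
weights = [0.1, 0.1, 0.1, 0.7]   # per-sample weights, need not sum to 1
sampler = WeightedRandomSampler(weights, num_samples=4, replacement=True)
loader = DataLoader(dataset, sampler=sampler, batch_size=2)
for (batch,) in loader:
    pass  # index 3 is drawn far more often than the others
```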
d602b3a834 Allow submodules and parameters to shadow attrs on assignment 2017-03-12 13:31:32 -04:00
f531d98341 Fix memory leak in torch.from_numpy 2017-03-12 13:31:32 -04:00
6bdd5ecaf5 Remove some unnecessary AutoGPU calls 2017-03-12 13:31:32 -04:00
bfbde9d6eb Fix Embedding bug when max_norm was used 2017-03-12 13:31:32 -04:00
b9c816a796 Fix run_test.sh --coverage option. (#983) 2017-03-11 19:26:02 -05:00
2f5c215d34 Update setup.py (#981)
Adding `description` to `setup.py`
2017-03-11 12:14:07 -05:00
01650ac9de add torch.nn.init docs to the source folder (#979) 2017-03-11 10:11:30 -05:00
43b6fcba7d Improve error message from LogFileDB on missing file
Summary: A lot of people get confused if the file can't be loaded.

Reviewed By: rpenggithub

Differential Revision: D4686572

fbshipit-source-id: 519ff68a3d4f04cf8ce893f255f7814e043383b6
2017-03-10 23:31:28 -08:00
a4a136038e more descriptive error message
Summary: pc_load_letter

Differential Revision: D4693855

fbshipit-source-id: af1ce4884570dc60c309c113c698e86c54ed2b93
2017-03-10 22:03:15 -08:00
3f682ca699 Fix to data parallel model blob_to_device mapping
Summary: We needed the InferToDeviceMapping too early; we should also have done it after running the parameter update function, since that can create new blobs like the momentum blobs. This fix is maybe not optimal, but it works and is fast enough.

Differential Revision: D4693450

fbshipit-source-id: 4c4cc2396dad371b3fbcd1d8da51133ea09a57e0
2017-03-10 18:03:58 -08:00
b61aaa90b6 Stop multi_reader if we run out of data before max_examples
Summary:
Before, we didn't propagate the 'out-of-data' signal if splits_per_epoch wasn't specified.

Right now it's a hacky fix (just reuse ReaderWithLimit). azzolini - any suggestions for a more elegant solution? I could create an extra reader that just exports an "is empty" signal.

Overall, I guess we need to turn global_queue into a more sustainable unittest that verifies all possible combinations - I'm still not sure it's correct :-\

Reviewed By: xianjiec

Differential Revision: D4665677

fbshipit-source-id: fe44d10ee82c3383145635e67dea1d9b666e061f
2017-03-10 18:03:57 -08:00
31b72b9004 move reshape out of utility_ops
Summary: move Reshape out into its own individual op

Reviewed By: ajtulloch

Differential Revision: D4690919

fbshipit-source-id: a84859d738039125a4f4122365619b69d5990427
2017-03-10 16:21:50 -08:00
0308910c58 Enable use of Print for LayerModelHelper
Summary: When debugging using LayerModelHelper, adding Print to the model triggers this assert.

Reviewed By: xianjiec

Differential Revision: D4687859

fbshipit-source-id: 6932e38f8dd17ba0b80da18a20943ecdb2e8af0a
2017-03-10 15:26:16 -08:00
ce536aa355 fix example in docs for NLLLoss 2017-03-10 16:48:08 -05:00
a109cbdfb6 fix bug in data_parallel_model stripParams()
Summary: Thanks to shenpan for detecting this bug. The problem is that FinalizeAfterCheckpoint() can be passed a list of strings, not blob references, which fails in stripParam() after the assertion I added in D4649208. It is OK to pass strings to that function as well.

Reviewed By: jhcross

Differential Revision: D4691028

fbshipit-source-id: 0bca80d44a5ab641438cc5b26482bca0b1527d69
2017-03-10 13:17:11 -08:00
fc0af33a18 key only block-wide bitonic sort 2017-03-10 11:50:43 -08:00
0e7e9888f7 Explicitly do MPI prefix for ops before it is too late
Summary: Chatted with pietern today, figured it is an easy change.

Reviewed By: pietern

Differential Revision: D4688275

fbshipit-source-id: a2751f1ff9f192ba6f2bd961be6ad1c693c8b5c6
2017-03-10 10:18:34 -08:00
c7c4778af6 modify docs of broadcast to fix issuse #940 (#970) 2017-03-10 09:54:43 -05:00
adb3f0ec22 add exception for empty shape param
Summary: Following krp's suggestion, check if the shape parameter is empty.

Reviewed By: dzhulgakov

Differential Revision: D4686698

fbshipit-source-id: 3f9fb1e3215dd2a4a726442531201eeb18224bc6
2017-03-10 00:33:59 -08:00
d873077349 Create context from existing MPI communicator
Summary:
This makes it easy to use Gloo transports and algorithms in existing
MPI environments.

Reviewed By: andrewwdye

Differential Revision: D4685999

fbshipit-source-id: cfc7d0e445893512b4e4ed2abe1bb280d83b9c70
2017-03-09 23:06:18 -08:00
0c38827318 Split out rendezvous specifics from context
Summary:
How pairs are setup and connected to one another is specific to
whatever underlying rendezvous mechanism is used. This change moves
the `connectFullMesh` function into a subclass in the `rendezvous`
directory. This prepares for a separate MPI context that can setup
pairs between processes using an existing MPI communicator.

Reviewed By: andrewwdye

Differential Revision: D4684755

fbshipit-source-id: 9eb643b8ba545b3e6f9a36b65642b3b04a5f0077
2017-03-09 23:06:18 -08:00
fb766c00b3 Align async/wait pattern to use wait() naming
Summary: TSIA

Reviewed By: pietern

Differential Revision: D4686783

fbshipit-source-id: ccbdace0d53219bd4b881ea27f7f972b206215b6
2017-03-09 21:20:45 -08:00
e600c9830a Fix up NCCLElement construction in CudaBroadcastOneToAll
Summary: TSIA

Reviewed By: pietern

Differential Revision: D4686520

fbshipit-source-id: 657ca90aa1971be152b037563105a9f490137a69
2017-03-09 20:37:03 -08:00
f93039b9c4 check data is allocated
Summary: enforce data allocation (if we can't allocate, something is broken)

Reviewed By: Yangqing

Differential Revision: D4684460

fbshipit-source-id: bb5cf0a9ddeecc6fa1bfd53a9367adc54506dd6d
2017-03-09 20:32:49 -08:00
73a65cd29f simple ordering fix to avoid gcc warning 2017-03-09 17:10:59 -08:00
965a7daf9b Implement MILSTM in caffe2
Summary:
Created a new function with specifics related to MI LSTM implementation in caffe2
See https://arxiv.org/pdf/1606.06630.pdf for details.
See D4478877 for the implementation of the same in tensorflow

Reviewed By: jhcross

Differential Revision: D4669882

fbshipit-source-id: 095bbcf187dbdac2cd79558ff0c8f9f67d8af639
2017-03-09 16:32:47 -08:00
bde53f61af Caffe2: add scuba logging to benchmark
Summary: Caffe2: add scuba logging to benchmark

Differential Revision: D4667194

fbshipit-source-id: 8e9fca5517d7d40a6bc3e55cd00161e7482cd6f4
2017-03-09 16:32:47 -08:00
57ecd20197 seq2seq open source implementation
Summary:
OSS implementation of the seq2seq model in Caffe2. The script uses the Seq2SeqModelCaffe2 class to build and run the model. It takes training data in the form of a text file with one sentence per line, builds a vocabulary, generates batches based on batch size, and runs the net for a configurable number of epochs. It prints the total scalar loss at the end of each epoch.

All FBLearner and neural_mt type system dependencies have been removed. Unimplemented and unnecessary methods have been removed to make the script simpler.
fblearner/flow/projects/langtech/translation/neural_mt/model_util_caffe2.py has been moved to caffe2/caffe2/python/examples/seq2seq_util.py and remains unchanged

Potential TODOs:
  - Get the model running on GPU. Only GatherOp does not have a corresponding GPU implementation. Try adding CopyGPUToCPU before and CopyCPUToGPU after Gather, and use CUDA DeviceOption.
  - Add evaluation on test data with suitable metric (perplexity? bleu?)

Reviewed By: urikz

Differential Revision: D4653333

fbshipit-source-id: 1c7d970ebc86afe23fad4d48854296bf54eb0f77
2017-03-09 16:18:08 -08:00
b785ed0ac0 Fix Embedding and CosineEmbeddingLoss on non-float CUDA (#965) 2017-03-09 18:04:40 -05:00
b2d077d81d Update _tensor_docs.py (#966) 2017-03-09 18:04:19 -05:00
c5621ded31 Allow use of ReversePackedSegs operator in CUDA context
Summary: ReversePackedSegs operator for CUDA. Input "lengths" (static integers) required to be in CPU memory.

Differential Revision: D4661281

fbshipit-source-id: c800c316c34015ba8e732dcbcaa8c4edaffdfeab
2017-03-09 15:03:55 -08:00
89c08334bb data_parallel_model support for sparse gradients and CPU ops
Summary:
Data parallel model did not support sparse operations, nor gradients computed on CPU ops.

Currently sparse operations are done on CPU, so there is no point in "data parallelizing" them. I had to make a few changes to data_parallel_model to support this:
 1. Model can have params that are added prior to adding the data parallel part. For example, a lookup table of word vectors would be a parameter that is non-parallel.
 2. Thus, when data parallel model is called, it will separate the non-parallel params and avoid working on them. Note: when we add distributed version, we need to explicitly handle them with AllGather!

This works nicely since Caffe2 automatically adds the backward concat-operator when multiple ops gather from the same blob.

I also added support for data-parallel CPU ops, which might be necessary in cases where we don't have a GPU implementation of some op.

Test in data_parallel_model_test validates the correctness of the code by running the same trainer on different number of gpus and checking the end result is same.

Reviewed By: jhcross

Differential Revision: D4649208

fbshipit-source-id: e3b7ae701ead468dc94c52a976eafec5c9831097
2017-03-09 13:48:41 -08:00
4814b0bc09 Recompose NCCLElement of src/dst CudaDevicePointers
Summary: CudaDevicePointer has the information we need for a NCCL op. Refactor NCCLElement as a composition of src and dst CudaDevicePointers. This allows for separate streams for src and dst, and will simplify a future change to use a static set of streams for all NCCL ops.

Reviewed By: pietern

Differential Revision: D4679483

fbshipit-source-id: 75656cc2fa5b5e2a6c096d914d2111769a47291b
2017-03-09 12:26:55 -08:00
b1c2714ad5 Add momentum and centered options to RMSProp (#810)
* add momentum and centered options

Add two options :
 - Momentum (like SGD's momentum)
- Centered RMSprop, as in Graves 2013 ( https://arxiv.org/abs/1308.0850 ) : grad is normalized by running estimation of its variance

* some PEP8

* bug in default

* bug2

* sign mistake

* alloc of momentum & centered only if needed

* add link to docstring

* some pep8 on docstring

* implement __setstate__() for backward compatibility

* correct grammar mistake

* multiply by lr when adding delta to params

* rename momentum variables

* change __init__ params order
2017-03-09 10:04:32 +01:00
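A short usage sketch of the two new options (a minimal example, not from the PR itself):

```python
import torch

model = torch.nn.Linear(10, 1)
# momentum adds an SGD-style velocity term; centered=True normalizes the
# gradient by a running estimate of its variance (Graves 2013).
opt = torch.optim.RMSprop(model.parameters(), lr=1e-2,
                          momentum=0.9, centered=True)
opt.zero_grad()
model(torch.randn(4, 10)).sum().backward()
opt.step()
```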
41a3ec2455 QTensor serialization/deserialization
Summary: Added protobuf style serialization/deserialization w/o chunking for qtensors

Reviewed By: salexspb

Differential Revision: D4622677

fbshipit-source-id: 1f845ad773a61b7ae2c362ec31d8de04e4217f68
2017-03-09 00:01:12 -08:00
5bb5572719 check correct signal counter
Summary: Not sure whether this influences anything.

Reviewed By: azzolini

Differential Revision: D4671128

fbshipit-source-id: 7a018dd54eb68127eb0c151dbc594b94ac4da0ea
2017-03-08 23:49:41 -08:00
84e742ded7 Migrate realtime training workflows to use new metrics.
Summary: This diff is getting rid of the old metrics interface in realtime training.

Reviewed By: xianjiec

Differential Revision: D4649734

fbshipit-source-id: de4af85eb5476df9790ebd3915625bf8beee65af
2017-03-08 23:49:41 -08:00
eeb7279020 compile execution step
Summary:
When the execution step represents things like:
for loop
  execution_step
     net1
  execution_step
     net2
     net3
the preparation cost of the execution step is too high.
This diff moves most of the shared information into the CompiledExecutionStep to save time.

After the change the benchmark result for parameter server handler is as following: (be aware that the first two have some variance)
INFO:__main__:==Summary==
INFO:__main__:Time <function case_if at 0x7f7160c32938> 0.0752924203873
INFO:__main__:Time <function case_loop at 0x7f7160c329b0> 0.0677666187286
INFO:__main__:Time <function case_simple_net at 0x7f7160c32a28> 0.0605396509171
INFO:__main__:Time <function case_one_loop at 0x7f7160c32aa0> 0.0611681699753

Before the change:
INFO:main:==Summary==
INFO:main:Time <function case_if at 0x7f19d079f848> 0.100815701485
INFO:main:Time <function case_loop at 0x7f19d079f8c0> 0.0864136457443
INFO:main:Time <function case_simple_net at 0x7f19d079f938> 0.0614696979523
INFO:main:Time <function case_one_loop at 0x7f19d079f9b0> 0.0598972082138

Reviewed By: azzolini

Differential Revision: D4643926

fbshipit-source-id: 5a4b97230ba778e0ff5cbafc8a216335a191068a
2017-03-08 23:49:41 -08:00
95501a0165 clean old unit test, add sum processor and sqrt pooling
Summary: The sum processor and sqrt pooling are to mimic the DoubleHelix model.

Differential Revision: D4678413

fbshipit-source-id: fc1ccfe3c92c540ce5914dfd8ff1a040805c48db
2017-03-08 23:04:19 -08:00
86e60848c5 use gflags namespace instead of google
Summary:
`google` namespace is deprecated in gflags. Replacing it with `gflags` namespace.

gflags was generated from this diff: P57170122
```
% echo $gflags
google::(RegisterFlagValidator|CommandLineFlagInfo|GetAllFlags|ShowUsageWithFlags|ShowUsageWithFlagsRestrict|\
DescribeOneFlag|SetArgv|GetArgvs|GetArgv|GetArgv0|GetArgvSum|ProgramInvocationName|ProgramInvocationShortName|\
ProgramUsage|VersionString|GetCommandLineOption|GetCommandLineFlagInfo|GetCommandLineFlagInfoOrDie|]
FlagSettingMode|SET_FLAGS_VALUE|SET_FLAG_IF_DEFAULT|SET_FLAGS_DEFAULT|SetCommandLineOption|\
SetCommandLineOptionWithMode|FlagSaver|CommandlineFlagsIntoString|ReadFlagsFromString|AppendFlagsIntoFile|\
ReadFromFlagsFile|BoolFromEnv|Int32FromEnv|Uint32FromEnv|Int64FromEnv|Uint64FromEnv|DoubleFromEnv|\
StringFromEnv|SetUsageMessage|SetVersionString|ParseCommandLineNonHelpFlags|HandleCommandLineHelpFlags|\
AllowCommandLineReparsing|ReparseCommandLineNonHelpFlags|ShutDownCommandLineFlags|FlagRegisterer)

% hg grep -wlE "$gflags" 're:fbcode.*\.(cc|cpp|h)' | xargs perl -pi -e 's,\bgoogle::,gflags::,g if /'"$gflags"'/'
```

Reviewed By: meyering

Differential Revision: D4669201

fbshipit-source-id: 8053ba6fba9acf6eaf6796f0f297a9e07784973f
2017-03-08 22:16:47 -08:00
842ee41999 Fix binary file reading bug for MSC compiler
Summary:
For the MSC compiler, the binary flag needs to be specified.
Closes https://github.com/caffe2/caffe2/pull/191

Differential Revision: D4677511

Pulled By: Yangqing

fbshipit-source-id: 4f80f09bd4bf9b6b6eff352cc67a62163255334f
2017-03-08 20:31:12 -08:00
581e57c244 add AccumulateHistogramOp
Summary: AccumulateHistogramOp, for computing the histogram of all values in input tensors

Differential Revision: D4654417

fbshipit-source-id: dea92346004c772af16e1eb41306287d81dc5a02
2017-03-08 19:37:32 -08:00
e88379ef3a Implement deep function recursion as a loop with a stack instead
Summary: Replace the recursion by using an explicit stack (see the sketch below)

Differential Revision: D4650848

fbshipit-source-id: bd0e3f82cf92e9548b83a495a6fcf187467fcb3d
2017-03-08 19:08:13 -08:00
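The sketch referenced above shows the generic shape of the transformation: traversal depth becomes bounded by heap memory rather than the call stack. Names are illustrative, not from the diff:

```python
def visit_iterative(root, children, visit):
    stack = [root]
    while stack:
        node = stack.pop()
        visit(node)
        stack.extend(children(node))

# Tiny usage example on a nested-list "tree":
out = []
visit_iterative([1, [2, [3]], [4]],
                children=lambda n: n if isinstance(n, list) else [],
                visit=lambda n: out.append(n) if isinstance(n, int) else None)
assert sorted(out) == [1, 2, 3, 4]
```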
8a84d03253 move qtensor to open source
Summary: Releasing to OS

Reviewed By: salexspb

Differential Revision: D4623486

fbshipit-source-id: 714b1cf2137f164d7925eb52d2a6ed4442e4457e
2017-03-08 18:02:39 -08:00
a462edd0f6 Docs(RNN|GRU|LSTM): Note dropout applies to all layers *except* the last layer (#961)
This is an important clarification to make, as otherwise users are misled as to where they may need to add dropout; to clarify the situation they would otherwise need to delve into the backend implementation.
4647f753bc/torch/nn/_functions/rnn.py (L73)
2017-03-08 18:09:11 -05:00
c6a9d7f188 User input (Conv out, etc.)
Summary: Take user inputs for the introspection visualization: convolution output layer activations, filters matched by containing phrases, and the number of samples.

Reviewed By: Mortimerp9

Differential Revision: D4603797

fbshipit-source-id: dc972dcb8ad36e30defab266d710e047b11cff73
2017-03-08 13:49:45 -08:00
046b467c9a added prefix to load op
Summary:
modified load_save_op to work with my training script

- SaveOp now correctly strips specified prefix of the form 'gpu_0/' when saving model blobnames to DB
- when translating DB blobnames to model blobnames, LoadOp can now optionally add prefix of the same form

Reviewed By: Yangqing

Differential Revision: D4664134

fbshipit-source-id: a2512e79f0c5172c5111af3e9b6fd161f268f4df
2017-03-08 12:48:50 -08:00
c2425fc9a1 Fix build warning for C file 2017-03-08 21:28:57 +01:00
4f0e7730a9 Distributed Multi-GPU resnet50
Summary: Use filesystem rendezvous for dist-multi GPU training.

Differential Revision: D4664945

fbshipit-source-id: 7b6767323e94bc4e7fa25ef3eba65b38abb79341
2017-03-08 11:39:29 -08:00
8de1db9eb6 Implement recurrent attention in C2
Summary: Super rough implementation of recurrent attention. Planning to factor out the common code between the two functions, as well as train and eval. I want to get this out and get eyes on it sooner rather than later.

Differential Revision: D4647837

fbshipit-source-id: 54bc4e8ed0df6f04c86c425926decbe89f73b068
2017-03-08 11:21:28 -08:00
f0d78753ae Make ModelExporter.load_from_db() load to specific workspace
Summary: In the case of a distributed task, load_from_db() loads to the wrong workspace (when used inside a Python op). This passes which workspace to use explicitly, so that it loads into the one the Python op is being run in.

Reviewed By: kennyhorror

Differential Revision: D4653692

fbshipit-source-id: 94585c012b05ee38b9ce5e8ef0efdd50aa41dd2b
2017-03-08 09:31:42 -08:00
fbcedf2da2 Merge commit '3d95e13b332e1b31d706b59c3b67f886958ece79' 2017-03-08 09:09:46 -08:00
3d95e13b33 Check event_count before merging blocks 2017-03-08 08:49:04 -08:00
228e1a8696 Add CUDA caching allocator accessor 2017-03-08 08:29:50 -08:00
be0e8c0009 Use sequential slot numbers from context
Summary:
Add a nextSlot() function to the context that increments and
returns a slot number. This enables multiple algorithms to share the
pairs part of a context. The slot numbers were hardcoded before this
change, which prevented reuse (see the sketch below).

After this change, some of the tests can be changed to run multiple
times (or do a parameter sweep) without respawning a new threadpool or
allocating new fixtures.

Also change some internally used variable names for more consistency.

Reviewed By: andrewwdye

Differential Revision: D4668268

fbshipit-source-id: 65cbc8f2666f0b7d2f1c72574b86d913f5855d62
2017-03-08 08:23:03 -08:00
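The sketch referenced above, mirroring nextSlot() in Python; the real context is C++:

```python
import itertools

class Context:
    def __init__(self):
        self._slots = itertools.count()

    def next_slot(self):
        # Monotonically increasing, so algorithms sharing the pairs of
        # one context never collide on a slot number.
        return next(self._slots)

ctx = Context()
assert (ctx.next_slot(), ctx.next_slot()) == (0, 1)
```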
3fa8a3ff46 add implementation of inclusive scan via upsweep-downsweep 2017-03-08 07:34:14 -08:00
e75221e316 Add eval net to two tower workflow
Summary: The evaluation part of the two tower workflow is missing; this diff completes it. Some of the newly added functions can be used for other workflows, e.g., feed. As the eval parts of different workflows will overlap, a generic eval workflow will be added in a separate diff.

Reviewed By: kennyhorror

Differential Revision: D4646880

fbshipit-source-id: 4d6eb35df10f6f613533d442f2a04dc0332386f8
2017-03-07 21:03:00 -08:00
8de2027d9b Add gradient operator for SumElements
Summary: Add gradient support for Caffe2 operator SumElements (for use in Translation RNN training pipeline).

Differential Revision: D4669036

fbshipit-source-id: 502760a2a624b20b3241e83a2f208f450b6ff36f
2017-03-07 20:03:07 -08:00
83437853ad refactor and modulize optimizers
Summary:
The current optimizer code in c2/python has the following issues:
(1) the optimizers in sgd.py cannot be configured per param blob;
(2) sgd.py is a bad file name. optimizer.py is a better name;
(3) layer_model_helper.py has another set of optimizer code (which supports per-param-blob optimizers)

This diff does the following:
(1) creates optimizer objects so that we can configure per-param-blob optimizers, which are also compatible with the existing optimizer code
(2) makes the new optimizer code much more modular
(3) moves the optimizer code to a file with a better name (optimizer.py)
(4) replaces the optimizer imports in the existing code

Will do in next diffs:
(1) optimizers with structured parameters for dper2
(2) get rid of the optimizer code in layer_model_helper.py

Reviewed By: salexspb

Differential Revision: D4609013

fbshipit-source-id: 2e2d6dfa8685d10498f89069157453d9feca3f27
2017-03-07 18:46:47 -08:00
235a95f09a Fix LengthsToRanges docs
Summary:
Fixing docs to reflect the implementation - `LengthsToRangesOp::RunOnDevice()`
assumes the input is an `int32_t` tensor (see the reference sketch below).

Reviewed By: ender-wieczorek

Differential Revision: D4626504

fbshipit-source-id: 5249a57efc6f62748c3c2ecdfaca61843830c44e
2017-03-07 18:46:46 -08:00
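The reference sketch mentioned above, assuming the op packs each segment as an (offset, length) pair with offsets accumulated left to right:

```python
def lengths_to_ranges(lengths):
    ranges, offset = [], 0
    for n in lengths:          # lengths are int32 values per the fixed docs
        ranges.append((offset, n))
        offset += n
    return ranges

assert lengths_to_ranges([2, 3, 1]) == [(0, 2), (2, 3), (5, 1)]
```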
b5e5001426 update detailed build info
Summary:
Fun on the plane. This basically reveals the per-platform build status on the README.md file.
Closes https://github.com/caffe2/caffe2/pull/188

Differential Revision: D4668460

Pulled By: Yangqing

fbshipit-source-id: 242b916cca0a46f8d797c6430c1875d6ffaae7ce
2017-03-07 15:46:33 -08:00
4647f753bc Merge commit '0f872ed02fbaf5b326f235b3f18724171b061416' 2017-03-07 14:45:01 -08:00
ed693b1c6a add EnsureDense Op in MTML MLP
Summary:
1. Allow EnsureDense Op to do either an in-place pass or a copy
2. In MTML, add EnsureDense Op before gather
3. Change the unittest values (adding another operator changes the random seed,
  which causes the model initialization to change as well)

Reviewed By: xianjiec

Differential Revision: D4625219

fbshipit-source-id: b3c748c3651d1dedd75420912a9698b7e46187c5
2017-03-07 14:03:49 -08:00
b599910f3a Use new metric intefaces in trainer workflows.
Summary: This diff is migrating existing DPER workflows to use new metric abstractions in training.

Reviewed By: xianjiec

Differential Revision: D4656576

fbshipit-source-id: 1b3b16b390fc0757480e41df1c4214c11cd76e8a
2017-03-07 12:46:52 -08:00
6830d56103 CodeMod: google::ProgramUsage to gflags::ProgramUsage
Summary:
CodeMod: `google::ProgramUsage` to `gflags::ProgramUsage`.

For gflags, `namespace google` is deprecated in favor of `namespace gflags`.

Automated with:

```lang=bash
hg grep -lw google::ProgramUsage | xargs perl -pi -e 's,\bgoogle(::ProgramUsage)\b,gflags\1,g'
```

Reviewed By: igorsugak

Differential Revision: D4665851

fbshipit-source-id: 9790c74f1c42d74043b94ee356f8e3cc3622f132
2017-03-07 11:47:35 -08:00
1741fd839f Re-apply windows diff D4657831
Summary:
(Note: previous revert was due to a race condition between D4657831 and
D4659953 that I failed to catch.)

After this, we should have contbuild guarding the Windows build both with
and without CUDA.

This includes a series of changes that are needed to make Windows build,
specifically:

(1) Various flags that are needed in the cmake system, especially dealing
with /MD, /MT, cuda, cudnn, whole static linking, etc.
(2) Contbuild scripts based on AppVeyor.
(3) For Windows build, note that one will need to use "cmake --build" to
build stuff so that the build type is consistent between configuration and
actual build. see scripts\build_windows.bat for details.
(4) In logging.h, ERROR is already defined by Windows. I don't have a good
solution now, and as a result, LOG(ERROR) on windows is going to be
LOG(INFO).
(5) variable length array is not supported by MSVC (and it is not part of
C++ standard). As a result I replaced them with vectors.
(6) sched.h is not available on Windows, so akyrola 's awesome simple
async net might encounter some slowdown due to no affinity setting on
Windows.
(7) MSVC has a bug where it does not work very well with template calls inside
a templated function call, which is a known issue that should be fixed in
MSVC 2017. However, for now this means changes to conv_op_impl.h and
recurrent_net_op.h. No actual functionalities are changed.
(8) std host function calls are not supported in CUDA8+MSVC, so I changed
lp_pool (and maybe a few others) to use cuda device functions.
(9) The current Scale and Axpy have heavy templating that does not work
well with MSVC. As a result I reverted azzolini 's changes to the Scale
and Axpy interface, moved the fixed-length version to ScaleFixedSize and
AxpyFixedSize.
(10) CUDA + MSVC does not deal with Eigen well, so I guarded all Eigen
parts to only the non-CUDA part.
(11) In conclusion, it is fun but painful to deal with visual c++.

Differential Revision: D4666745

fbshipit-source-id: 3c9035083067bdb19a16d9c345c1ce66b6a86600
2017-03-07 11:02:12 -08:00
d8588d8007 CUDA version of elementwise power + rename to Pow + gradient
Summary: Renamed ElementwisePower to Pow for better discoverability. Added CUDA version and Gradient + tests.

Reviewed By: kennyhorror

Differential Revision: D4665550

fbshipit-source-id: dd33d8ad3917d71504e363ab397af50d38a63b1f
2017-03-07 10:20:40 -08:00
7ba5e7cea1 fix VolumetricMaxPooling test instability (#952) 2017-03-07 10:55:46 -05:00
9b626a8047 Fix documentation - replace 'matrix' with 'vector' (#951) 2017-03-07 10:40:18 -05:00
bd0e9a73c7 Fix some simple build error on MacOS (#949)
Issue #948

Signed-off-by: Zhou Chang <achang.zhou@gmail.com>
2017-03-07 09:47:49 -05:00
695ea6c7a1 SumElementsOp
Summary: Add a simple op to sum the elements, with optional averaging. This is basically a copy of AverageLossOp, which we should alias to this. Maybe develop this towards a generic norm op (see the reference sketch below).

Reviewed By: jhcross

Differential Revision: D4664591

fbshipit-source-id: 0e0c0efe9e415e2ad2feecfa42b03db2c83bee70
2017-03-07 05:23:53 -08:00
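The reference sketch mentioned above: summing (optionally averaging) all elements, with the gradient broadcasting dY back to every element. A numpy illustration, not the C++ operator:

```python
import numpy as np

def sum_elements(x, average=False):
    return float(x.mean() if average else x.sum())

def sum_elements_grad(x, dy, average=False):
    g = np.full_like(x, dy, dtype=float)   # dY broadcast to every element
    return g / x.size if average else g
```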
8fab453863 Sqr op and gradient
Summary: Due to popular demand, added an op to compute element-wise square + gradient for it (just for the fun of it).

Reviewed By: Yangqing

Differential Revision: D4664797

fbshipit-source-id: 0a29c7c249fdc72f51412bebd6ae352a7801cf05
2017-03-07 03:03:07 -08:00
560572910c Add task outputs and stop signals to net_printer
Summary: Useful for debugging multi_reader.

Reviewed By: kennyhorror

Differential Revision: D4664954

fbshipit-source-id: ba7a307db444b61a7e520992ee44c35237906068
2017-03-07 01:21:40 -08:00
9f588aa8a2 Add Inference for Flatten
Summary: Implementing shape inference for the Flatten operator and adding unit tests (see the sketch below).

Differential Revision: D4664073

fbshipit-source-id: c54a269fc7633908fe4197682d27076ef97d9c22
2017-03-07 01:21:40 -08:00
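The sketch referenced above: Flatten's output shape is a pure function of the input shape, so inference never needs runtime data. Dims before `axis` collapse into dim 0, the rest into dim 1 (default axis=1):

```python
from functools import reduce
from operator import mul

def flatten_shape(in_shape, axis=1):
    prod = lambda dims: reduce(mul, dims, 1)
    return (prod(in_shape[:axis]), prod(in_shape[axis:]))

assert flatten_shape((2, 3, 4)) == (2, 12)
assert flatten_shape((2, 3, 4), axis=2) == (6, 4)
```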
7bddd586f7 Change PrefixStore to take a Store reference
Summary:
Taking ownership of a std::unique_ptr is a bit awkward. It's actually
useful to reuse the underlying store and create multiple prefix stores
against it.

Reviewed By: andrewwdye

Differential Revision: D4662354

fbshipit-source-id: eaf62f7d5a97d6ee848252ff3124c28da349f6f2
2017-03-06 22:19:49 -08:00
da10450535 Allow multiple input pointers to broadcast algorithms
Summary:
This changes the constructor prototype of the broadcast algorithms.
They now take the rank of the root process and the rank of the root
pointer. The root process now also broadcasts locally, among the
specified pointers, in addition to broadcasting to its peer processes.

The broadcast tests are made more robust to use a different value at
every index for every buffer, like the allreduce tests. To accomodate
multiple input buffers for CPU side algorithms, I added a Fixture
helper, and renamed the existing Fixture class to CudaFixture.

The broadcast tests contain a few TODOs since they don't vary the root
process or root pointer yet. I anecdotally verified this does work,
but didn't want to include the necessary changes to do so in this
commit (it requires some changes in rendezvous and NCCL code). A fix
for this is forthcoming.

Reviewed By: andrewwdye

Differential Revision: D4661635

fbshipit-source-id: c069e0d4e8f676a63efd74b15ea1156adcc09477
2017-03-06 22:19:49 -08:00
039c3cf0ba Revert D4657831: [caffe2][PR] Changes for Windows build to pass.
Summary: This reverts commit 070ded372ed78a7e3e3919fdffa1d337640f146e

Differential Revision: D4657831

fbshipit-source-id: 3a0fb403936a9257776d637ce3ba5dbd81e1119f
2017-03-06 21:02:36 -08:00
2b1cd919ce Update extending.rst (#933) 2017-03-06 23:23:14 -05:00
7b8c7b11d2 Changes for Windows build to pass.
Summary:
After this, we should have contbuild guarding the Windows build both with
and without CUDA.

This includes a series of changes that are needed to make Windows build,
specifically:

(1) Various flags that are needed in the cmake system, especially dealing
with /MD, /MT, cuda, cudnn, whole static linking, etc.
(2) Contbuild scripts based on AppVeyor.
(3) For Windows build, note that one will need to use "cmake --build" to
build stuff so that the build type is consistent between configuration and
actual build. see scripts\build_windows.bat for details.
(4) In logging.h, ERROR is already defined by Windows. I don't have a good
solution now, and as a result, LOG(ERROR) on windows is going to be
LOG(INFO).
(5) variable length array is not supported by MSVC (and it is not part of
C++ standard). As a result I replaced them with vectors.
(6) sched.h is not available on Windows, so akyrola 's awesome simple
async net might encounter some slowdown due to no affinity setting on
Windows.
(7) MSVC has a
Closes https://github.com/caffe2/caffe2/pull/183

Reviewed By: ajtulloch

Differential Revision: D4657831

Pulled By: Yangqing

fbshipit-source-id: 070ded372ed78a7e3e3919fdffa1d337640f146e
2017-03-06 20:03:37 -08:00
8e46a15605 add docs for set_printoptions to sphinx (#945) 2017-03-06 21:52:37 -05:00
2333ccadfb MaxOp for CUDA
Summary: Simple elementwise Max implementation for CUDA. Given N inputs, it will do N-1 pairwise maxes. I am not sure if it would be much better to iterate through all the inputs in the kernel, since this has better locality. We can also optimize later.

Reviewed By: Yangqing

Differential Revision: D4659953

fbshipit-source-id: 3a23b7fb3dbdf1d43bf3134ece03af4a791844dd
2017-03-06 16:46:53 -08:00
3e54601bab New approach to metrics.
Summary:
This diff modifies the way we specify metrics: from a reporter that has to know in advance all the blobs it should access, to a reporter that is connected through the schema.

This diff also reports an arbitrary number of learning curves to Flow and provides a really flexible way to specify all the metrics we care about.

TODO: Modify the model helper to allow providing intermediate results for reporting
TODO: Add an evaluation net (instead of a prediction net).
TODO: Move all other places in DPER 2.0 to use these abstractions instead.
TODO: Get rid of LogScoreEstimator in favor of a metric that really suits our needs.

Reviewed By: azzolini, dzhulgakov, kittipatv

Differential Revision: D4577548

fbshipit-source-id: 3515bd41e0f92263ff90ce2f7207abf65d01b1f7
2017-03-06 14:48:16 -08:00
f747bbec2e move the dper 1.0 utils to c2 or fb utils
Summary: so that the utils can be used by a wider audience.

Reviewed By: xianjiec

Differential Revision: D4637462

fbshipit-source-id: f0695f430902aef26360efa511069b3755eaf52a
2017-03-06 14:31:45 -08:00
15a9fbdedb Merge pull request #881 from colesbury/parallelize_backwards
Parallelize autograd backwards
2017-03-06 16:57:19 -05:00
6336300880 Fix bug where adding a hook could replace an existing hook.
We were keying hooks by RemovableHandle id. However, we don't hold onto
handles, and ids of dead objects can be reused. This replaces id(handle)
with a global counter.
2017-03-06 12:47:53 -08:00
5073132837 Implement 'pre' and 'post' hooks at the C++ autograd level 2017-03-06 12:47:53 -08:00
65b66264d4 Improve broadcast/reduce performance by coalescing tensors 2017-03-06 12:47:53 -08:00
7472631e7f fix bug in Mean pooling
Summary: simple fix

Reviewed By: xianjiec

Differential Revision: D4655469

fbshipit-source-id: 6dbcfcd2f3f7f7bd74aca88af4f60c6ddffb9138
2017-03-06 11:31:10 -08:00
0f872ed02f Add THCCachingAllocator_recordStream()
This is similar to THCCachingHostAllocator_recordEvent() but on CUDA
allocations. It's useful for overlapping copies with computation. The
workflow is approximately:

  0. allocate dst tensor on copy stream
  1. copy from CPU to GPU on copy stream
  2. synchronize the main stream with the copy stream via
     cudaStreamWaitEvent
  3. THCCachingAllocator_recordStream(dst, main_stream)

The recordStream() call is necessary to prevent the dst tensor from
being reused on the copy stream before the main stream finishes work.

Previously, you would need to insert a second cudaStreamWaitEvent before
dst is freed to force the copy stream to wait on the main stream.
2017-03-06 10:50:19 -08:00
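A modern PyTorch analog of the workflow above (Tensor.record_stream exposes this allocator-level call); assumes a CUDA device and a pinned source tensor:

```python
import torch

cpu_tensor = torch.empty(1024, pin_memory=True)
copy_stream = torch.cuda.Stream()
main_stream = torch.cuda.current_stream()
with torch.cuda.stream(copy_stream):
    dst = cpu_tensor.to('cuda', non_blocking=True)  # steps 0-1: copy stream
main_stream.wait_stream(copy_stream)                # step 2: sync streams
dst.record_stream(main_stream)                      # step 3: mark in-use
```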
c61a7ca777 Make counts datatype int. Used as index.
Summary:
To avoid Numpy warning: using a non-integer number instead of an integer will result in an error in the future
Closes https://github.com/caffe2/caffe2/pull/64

Differential Revision: D4658348

Pulled By: Yangqing

fbshipit-source-id: 3a1b33cbb27849bc167b08147d078e8d487567f4
2017-03-06 10:46:36 -08:00
9ef35f4a0b Add validation checks to load op
Summary: Added validation for load op when doing load_all by refactoring validation logic for loading specific blobs.

Reviewed By: kennyhorror

Differential Revision: D4641986

fbshipit-source-id: e0075a12188ca09d7628add72c143b40d5d9f382
2017-03-06 09:46:35 -08:00
761d6799be code syntax error in document (serialization.rst) (#937) 2017-03-06 10:06:04 -05:00
81d5461973 cuda check -> enforce
Summary:
In the past we have moved most of the CHECKs to CAFFE_ENFORCE (exceptions).
However, we kept the name "*_CHECK" for cuda calls, and that caused some
confusion especially in the destructor calls: while our destructors are not
written to handle exceptions, these CUDA_CHECKs could actually throw some
exceptions.

As a result, this diff

(1) Renames all cuda related "*_CHECK" to "*_ENFORCE"
(2) Explicitly marked the destructor of core Caffe2 classes as noexcept
(3) Added proper, really-CHECK cuda check macros, and used those in the
corresponding destructors.

This should not change any of existing functionality.

Reviewed By: dzhulgakov

Differential Revision: D4656368

fbshipit-source-id: 32e3056e66c0400156c5ca0187b6151cf3d52404
2017-03-05 22:46:22 -08:00
4030fbf535 Add _aligned_free if defined _MSC_VER in context.h
Summary:
On Windows, it is necessary to use `_aligned_free` instead of `free` when `_aligned_malloc` was used for the allocation.
Closes https://github.com/caffe2/caffe2/pull/184

Differential Revision: D4657929

Pulled By: Yangqing

fbshipit-source-id: 476a9b702a1ee37d5e16483087be2ccdc7bf4259
2017-03-05 21:17:53 -08:00
35f0c0b0fb Fix gflags build
Summary:
Our internal update of gflags in b0e325ce69 called for this change.
Closes https://github.com/caffe2/caffe2/pull/185

Differential Revision: D4657928

Pulled By: Yangqing

fbshipit-source-id: bdf9fdc63a16dafc28b690598463ec72e3c50f40
2017-03-05 21:17:53 -08:00
0d179aa8db Updated datasets.rst, combined all commits (#931)
Added MNIST in the docs

Updated incomplete cifar doc

Updated the datasets.rst to include all datasets
2017-03-05 17:38:28 -05:00
5b171ad7c2 remove misleading guide for BCELoss (#924) 2017-03-05 14:31:01 -05:00
ac9245aeb3 import numpy before setting dlopen flags (#928) 2017-03-05 14:30:13 -05:00
60736bdf99 fix corner case in kwargs for DataParallel (#930) 2017-03-05 14:27:52 -05:00
7d58765cee docs: Fixed example code bug in extending module doc. 2017-03-05 12:09:08 -05:00
76f7d749e4 bump version 2017-03-05 08:49:52 -08:00
0b7374eb44 add THCS to build_all flags 2017-03-05 11:32:43 -05:00
6fff764155 replace old select_compute_arch.cmake with new 2017-03-05 11:32:43 -05:00
8ced72ccb8 link THPP to THCS when CUDA available 2017-03-05 11:32:43 -05:00
b1ae7f90d5 Added functionality for data parallel table (#843) 2017-03-05 02:35:46 +01:00
aef75ca5dd Strip prefix of strip_prefix in blob names before save and load.
Summary:
- Replaces the strip_regex implementation in SaveOp. It deletes the prefix of blob names up to a given substring.
- Adds the same functionality to LoadOp. Needed for loading checkpoints that are stored using the strip_prefix feature.
Closes https://github.com/caffe2/caffe2/pull/129

Differential Revision: D4512234

Pulled By: Yangqing

fbshipit-source-id: d926c1c5adcc7a711365cede11f21421bb7d4138
2017-03-04 15:46:47 -08:00
59ebbfb2bd cpu memory allocation reporter
Summary:
This allows one to report the CPU memory allocation over a Caffe2 run.
To enable, use --caffe2_report_cpu_memory_usage in the commandline arguments.
This has to happen before any Caffe2 allocation has taken place.

Reviewed By: salexspb

Differential Revision: D4641353

fbshipit-source-id: 13a4315f63154edad9e925bb5c276cad4fe78c70
2017-03-04 15:46:47 -08:00
8b61ee522e Merge commit 'aec182ae72d51dad0f46cdfe7ff9a41380d7da35' 2017-03-04 08:58:21 -08:00
76ca3eb191 Merge commit 'fea50a51ee2d9af15c42f785ab2232469357b557' 2017-03-04 08:58:02 -08:00
fea50a51ee reintroduce USE_AVX* for files which dont have -mavx* set 2017-03-04 08:55:43 -08:00
51e589ed73 fix critical bug in adds SSE implementation 2017-03-04 08:39:19 -08:00
2e87643761 remove fastmath for everything except simd/convolve 2017-03-04 08:16:47 -08:00
8caa7cec8d CUDA version of Log
Summary: As in the title. Simple registration issue.

Reviewed By: Yangqing, jhcross

Differential Revision: D4655691

fbshipit-source-id: 661e4d5f1226ec05e099c84f4454aa07c6be4449
2017-03-04 00:32:03 -08:00
ba9a85f271 fix bug introduced in #952 2017-03-03 21:00:05 -08:00
a22fd7194e More assertions for state change in TCP transport
Summary:
I have seen a stress run crash with unexpected state. Adding these
assertions will give more information when it happens again.

```
terminate called after throwing an instance of 'gloo::EnforceNotMet'
  what():  [enforce fail at gloo/transport/tcp/pair.cc:407] false. Unexpected state: 5
```

Reviewed By: andrewwdye

Differential Revision: D4652216

fbshipit-source-id: e787f4097f5ab32367dd9fa5a336d0389b97e955
2017-03-03 14:20:07 -08:00
0714d7a3ca set AVX/AVX2 flags only for specific files 2017-03-03 12:17:14 -08:00
fb7bafdd0f Update README.md
Summary:
Fix styling in README
Closes https://github.com/facebookincubator/gloo/pull/4

Differential Revision: D4651501

Pulled By: pietern

fbshipit-source-id: e2d4384ac94972f6c4fc03467564460ea4ce5c85
2017-03-03 11:40:02 -08:00
34ce58c909 Parallelize backwards 2017-03-03 11:26:00 -08:00
c238ee3681 Fix issues with lazy grad initialization (#912) 2017-03-03 14:23:51 -05:00
e1d7eaf7d8 Latency optimization tips
Summary: Closes https://github.com/facebookincubator/gloo/pull/3

Differential Revision: D4651203

Pulled By: pietern

fbshipit-source-id: 202afcbe26ec77ea93e48e72fea0d36f18b1b026
2017-03-03 11:05:17 -08:00
f5338a1fb8 compile AVX and AVX2 intrinsic code in separate files. Cleanup use of USE_AVX and USE_AVX2 macros in favor of __AVX__ and __AVX2__ 2017-03-03 10:30:18 -08:00
d96ad41191 cleanup TH CMakeLists and THGeneral.h of unused flags 2017-03-03 09:48:26 -08:00
f17cfe4293 sparse tensor operations (#735) 2017-03-03 18:37:03 +01:00
aec182ae72 Support half precision in baddbmm 2017-03-03 16:15:39 +01:00
8dff5a87f3 Change the type of content in BlobProto from string to bytes
Summary: We are converting MetaNetDef from Thrift to protobuf. Protobuf uses binary encoding, and since bytes is a superset of string, change the field to bytes so that no warning is generated when compiling caffe2.

Reviewed By: Yangqing

Differential Revision: D4635581

fbshipit-source-id: 916b799e1fb9466658e1dd198bfb5c6928f22488
2017-03-03 07:15:34 -08:00
c93c884ee2 Add negative dimension to transpose and tests (#792) 2017-03-03 09:31:22 -05:00
c42a2d4d24 Fix dimension check for cat (#959)
* Use TH_INDEX_BASE when verifying dimension for cat

* Adding tests for cat when no dimension is specified.

- Also renamed ldimension to cat_dimension to be more specific.
2017-03-03 09:05:06 -05:00
f89252c336 Merge pull request #719 from twitter-forks/cat-fix
Fixes to cat
2017-03-03 09:04:06 -05:00
490c15fae9 Fix slicing with step (#905) 2017-03-03 09:00:14 -05:00
cdce8f0e52 update gflags
Reviewed By: yfeldblum

Differential Revision: D4646271

fbshipit-source-id: 5d21407e815588ae2b016001b859a4816851ab00
2017-03-03 00:47:24 -08:00
8c4310ac16 minor fix for _add_net_to_dict
Summary: fix a check if the net is net_dict

Reviewed By: kennyhorror

Differential Revision: D4647493

fbshipit-source-id: e0a62fc5847c99c85857c5635b4e39d59c66d5ce
2017-03-02 23:31:27 -08:00
7e3b572ca7 Document algorithm semantics
Summary: TSIA

Reviewed By: andrewwdye

Differential Revision: D4647587

fbshipit-source-id: a804e7479e6e2f511bfa59712b4b4a88bdf657e3
2017-03-02 21:35:28 -08:00
6c9105447c support fill bool tensors in GivenTensorFill
Summary:
The existing code uses vector<T> to store the given tensor and then copies it to the output.
If T=bool, vector<bool> stores the data as bits, so the copy does not work.
We use TensorCPU to store it instead.
Also adds a unittest.

Reviewed By: kennyhorror

Differential Revision: D4622325

fbshipit-source-id: 95c27b5d1cfbc836d2419d01cacde5a3172f4d7e
2017-03-02 20:18:59 -08:00
b6fbc708f5 Verify InferShapesAndTypes() in operator unittests
Summary:
Verify shape and type inference in op unittests via assertReferenceChecks(). For now catch exceptions from InferShapeAndTypes() and log a warning.

TBD: Determine if there are existing inference/output mismatches, and if so, change test asserts to warnings until they are resolved.

Differential Revision: D4639343

fbshipit-source-id: 605e72f53198e1a100fe7ba18b72c34c9ddbb727
2017-03-02 20:18:59 -08:00
5fbcd88102 Rename public member fields on gloo::Context
Summary:
The fields are public so their names should not end with an
underscore.

Reviewed By: andrewwdye

Differential Revision: D4645038

fbshipit-source-id: c12b47affbe511383a4722717a06abb61918473b
2017-03-02 19:49:45 -08:00
f2d72ba10f Revert "make handles to be thread-local"
This reverts commit 0720ba53b344809ce3d0bdfb1ea561afa5fe0646.
2017-03-02 17:48:24 -08:00
2108b42b92 Fix bug in cat when dimension is not specified.
- Code was using the specified dimension, which was negative
- Changed the cat_dimension variable to be more explicit
- Fixed code to use the cat_dimension variable
2017-03-02 16:14:09 -08:00
bae8df62d3 Add missing THCudaCheck around cudaMemcpy 2017-03-02 16:13:39 -08:00
a2b2880cc2 Remove underscores from public fields in NCCLContext
Summary: Remove underscores from public fields in NCCLContext

Reviewed By: pietern

Differential Revision: D4645857

fbshipit-source-id: 2c28a1c23d31097d685c0768dad9b99bbef7b171
2017-03-02 16:05:15 -08:00
70fc15c05c More documentation
Summary: TSIA

Reviewed By: andrewwdye

Differential Revision: D4644734

fbshipit-source-id: 50f5fadd2c5cd04e06a025f5538187ed852e669a
2017-03-02 15:50:37 -08:00
e9c0671132 Convnet benchmark cudnn_ws
Summary:
- Do not set a default for cudnn_ws; use the default set by the cuDNN ops.
- Do not use cudnn_ws for MLP.
- Do not run the benchmark if the required args are not set. Previously it tried to run and errored out.
Closes https://github.com/caffe2/caffe2/pull/177

Differential Revision: D4633143

Pulled By: Yangqing

fbshipit-source-id: e89a7d01984e599d92a330d0ee4ba106feba65b8
2017-03-02 15:32:37 -08:00
98775b6bb4 Merge pull request #718 from killeent/templatize-scan
genericize PrefixSum --> PrefixScan via binary operator template parameter
2017-03-02 17:50:56 -05:00
b7cc2a501f genericize PrefixSum --> prefixScan 2017-03-02 14:31:27 -08:00
0720ba53b3 make handles to be thread-local 2017-03-02 11:10:49 -08:00
ff5fa11129 make mkl link to threaded version with GCC (#958) 2017-03-02 13:37:25 -05:00
837023bb4f Change benchmarks to support multiple input buffers
Summary:
The NCCL code used in CUDA-aware allreduce does local reduction of N
buffers prior to putting anything on the wire. Supporting this in the
benchmark tool to measure the impact under various configurations.

Other minor tweaks in this change:
* Specify sub-second iteration time
* Templatize allreduce benchmarks (the algorithms share a constructor
  prototype)

Reviewed By: andrewwdye

Differential Revision: D4639517

fbshipit-source-id: f7417d3e9f79278a3b1eca48d779f48b77e5260c
2017-03-02 10:16:39 -08:00
e88d241757 Cuda algorithms should return asynchronously if device streams are passed in
Summary: Cuda algorithms take an optional set of device streams to sequence operations. If streams are provided, the algorithms should enqueue final output buffer operations on the associated stream and return asynchronously. Destructors that allocate streams/events should synchronize before tearing down.

Reviewed By: pietern

Differential Revision: D4636447

fbshipit-source-id: 32ec2adc214c83b0b4bc0fff8993ab196459117b
2017-03-02 10:16:38 -08:00
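
A sketch of the asynchronous-return contract described above, assuming the caller passed in a stream; the function and parameter names are illustrative:

    #include <cuda_runtime.h>

    // With a caller-provided stream, the final output copy is enqueued on that
    // stream and the function returns immediately; the caller sequences any
    // further work on the same stream or synchronizes explicitly.
    void finishOutput(float* dst, const float* src, size_t bytes,
                      cudaStream_t callerStream) {
      cudaMemcpyAsync(dst, src, bytes, cudaMemcpyDeviceToDevice, callerStream);
      // No cudaStreamSynchronize here: return asynchronously.
    }
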
ecb37e4439 Update tests to cover potential reordering problems
Summary:
With this change, every buffer gets assigned a different
value at every index. This means reordering of segments (e.g. in the
chunked algorithm) would surface as test errors.

Reviewed By: andrewwdye

Differential Revision: D4636368

fbshipit-source-id: 464eb1515d1590e12481961d427a92e2ebb3be82
2017-03-02 10:16:38 -08:00
0c88194807 CUDA documentation
Summary: CUDA documentation detailing high-level support for CUDA in gloo algorithms, usage of streams, and synchronizing memory management.

Reviewed By: pietern

Differential Revision: D4633120

fbshipit-source-id: d88e230c8dc82fe48cda0f401b61758fa4f07f2e
2017-03-02 10:16:38 -08:00
50e73a8313 Support synchronous mode in ibverbs transport
Summary:
Synchronous mode means using the calling thread instead of the device
thread for completion handling. Since this saves a context switch in
the critical path, this is very beneficial for low latency algorithms.

For example: the p99 of a 4-way barrier drops from 17us to 4us.

Reviewed By: andrewwdye

Differential Revision: D4626948

fbshipit-source-id: 013b1680497589fe5ad0bca38600bce6a410200b
2017-03-02 10:16:38 -08:00
fc7f026980 Refactor ibverbs transport to prepare for sync mode
Summary:
All pairs created by a device would use the same completion queue.
Supporting sync mode that way is difficult, as there is no way to
filter completions for a particular pair. This change refactors this
to use a single completion queue per pair so that this is no longer an
issue. This change is a preparation for supporting synchronous mode
(where the calling thread itself will poll the ibv library for
completions instead of the device thread).

This change also includes a refactoring of the way transient memory
regions are handled so that they are properly deregistered and
deallocated when no longer needed.

Reviewed By: andrewwdye

Differential Revision: D4625146

fbshipit-source-id: 21bf5ab321534fbd5c03f12049c10fc67da68944
2017-03-02 10:16:38 -08:00
9f18f83375 Downcase setMutex
Summary: TSIA

Reviewed By: andrewwdye

Differential Revision: D4626965

fbshipit-source-id: 2d32b07182202f65e673795aefacc6cc991d3c7c
2017-03-02 10:16:38 -08:00
9c114e6f1c Fix compile error
Summary: std::atomic was not defined for cuda.cu.

Reviewed By: andrewwdye

Differential Revision: D4624611

fbshipit-source-id: 973bba10026e065667d6a576055d00505ee02d62
2017-03-02 10:16:38 -08:00
0e78a59610 add mutex getter/setter to synchronize CUDA and NCCL ops
Summary: Allow gloo consumers to assign a mutex to synchronize CUDA malloc/free and NCCL operations.

Reviewed By: pietern

Differential Revision: D4622135

fbshipit-source-id: 60acd7c01a677a0df5415fe38e6ef5a2e7c8606a
2017-03-02 10:16:38 -08:00
5e7f5db332 add subset samplers (#888) 2017-03-02 09:26:10 -05:00
b5f7592140 boolean mode in module.train 2017-03-02 09:18:05 -05:00
f366e5fc81 Support int16 numpy conversions
issue #891
2017-03-02 09:15:57 -05:00
48f087f6ce C99 cleanup broke MSVC (#952)
* __pragma for MSVC.
2017-03-02 08:57:28 -05:00
73db5f902e Fbsync cudnn rnn fix
Summary:
Update cuDNN RNN interface (mostly fixing the ordering of arguments). Set a seed so that the test passes consistently
Closes https://github.com/caffe2/caffe2/pull/62

Reviewed By: Yangqing

Differential Revision: D4348966

fbshipit-source-id: f9b56be37739e5bffabec130e3407492b2aef656
2017-03-02 05:31:21 -08:00
ec56737190 fix shape inference for spatial softmax with loss
Summary: The shape inference did not check for spatial mode.

Reviewed By: andrewwdye

Differential Revision: D4638218

fbshipit-source-id: f15419738587013dea39e04a3da086890938c4e2
2017-03-01 19:32:32 -08:00
642b5a863f Adding changes that enable MSVC build
Summary:
MSVC 2015 has known bugs with template functions, so these changes aim to fix them - no functional differences introduced.
Closes https://github.com/caffe2/caffe2/pull/179

Reviewed By: ajtulloch

Differential Revision: D4635241

Pulled By: Yangqing

fbshipit-source-id: a282a96e1e626e9440c1e3f3cb15b5b1fa710887
2017-03-01 16:47:58 -08:00
7fef264bfa Bumping version to 1.3.3 2017-03-01 16:44:27 -08:00
8996811936 Only enable peer access for ring neighbors.
This enables support for systems with more than 9 GPUs attached to a single PCIe root complex.
2017-03-01 16:42:38 -08:00
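
A sketch of the idea, assuming devices laid out in a ring; since CUDA caps the number of simultaneous peer mappings per device, enabling access only for the two ring neighbors keeps the count constant regardless of system size:

    #include <cuda_runtime.h>

    // devs: ordered devices of the ring; rank: this process's position in it
    void enableRingPeerAccess(const int* devs, int rank, int n) {
      int prev = devs[(rank + n - 1) % n];
      int next = devs[(rank + 1) % n];
      cudaSetDevice(devs[rank]);
      // Only the two ring neighbors, not all n-1 devices.
      cudaDeviceEnablePeerAccess(prev, 0);
      cudaDeviceEnablePeerAccess(next, 0);
    }
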
c219a183d0 Fix copy/paste typo in error message 2017-03-01 16:42:38 -08:00
8e1d6f9b60 Fix crash in Reduce when non-root ranks have invalid recvbuff 2017-03-01 16:42:38 -08:00
6cb63df704 Default LocalSession to current workspace.
Summary:
At the moment LocalSession creates a new workspace if none is provided. As a
result, anything that has been executed in a local session is not going to be
available to the external caller, i.e. everything that uses SingleRunner can
only observe side effects and not actually access intermediate blobs.

This diff modifies LocalSession to run in the current workspace instead (unless
this has some really weird effects because we rely on the privateness of the
workspace, it should work).

Differential Revision: D4634743

fbshipit-source-id: 975bed154c7ca215dc3fc0d60f05a7c092711482
2017-03-01 16:03:18 -08:00
7ad948ffa9 fix tests to not sys.exit(), also fix fatal error on THC initialization 2017-03-01 17:37:04 -05:00
3277d83648 Add Nesterov Momentum (#887) 2017-03-01 20:49:59 +01:00
2cddbc719c Euthanize a process with timeout
Summary: vigneshr has been randomly experiencing cases where the process does not exit in the end. We don't know what causes this, so this will help in two ways: (1) by putting timeout_guard.EuthanizeIfNecessary(600) at the end of the operator, you ensure that the process is killed in 10 minutes, allowing for retry; (2) this killing will cause Python stack traces to be dumped, helping debug the real issue.

Differential Revision: D4635781

fbshipit-source-id: b558418c80671c00effdd514e4ddc01e935c95df
2017-03-01 11:38:11 -08:00
2f68632a32 Add SparseNN workflow for feed.
Summary: Add SparseNN workflow for feed. I haven't fully thought about the changes needed for ads, as I added a property called 'preproc_output_schema' to LayerModelHelper.

Reviewed By: xianjiec

Differential Revision: D4585796

fbshipit-source-id: 060d08f4beb928e7e7863f2e563f612c358951fb
2017-03-01 11:02:38 -08:00
1487278fdf Allow backprop through cuDNN RNN in eval mode
Handling of dropout descriptors has been improved too.
2017-03-01 19:42:39 +01:00
977630bc15 Handle duplicate backward roots in autograd 2017-03-01 19:42:39 +01:00
12efd53dba ConstantPad2d and F.pad (#856) 2017-03-01 19:39:44 +01:00
37e05485d9 added initialization schemes in torch.nn.init (#833) 2017-03-01 19:34:13 +01:00
c76770f40e Merge commit 'dfca8dfdc5988813ed5673589ffa4fdd1c4f3d2d' 2017-03-01 09:29:51 -08:00
da725830c2 Add support for variable length sequences in RNNs (#873) 2017-03-01 17:36:32 +01:00
fc6fcf23f7 Lock the cudaFree mutex. (#880)
Prevents NCCL calls from overlapping with cudaFree() which can lead to
deadlocks.
2017-03-01 11:29:25 -05:00
aa3156c235 Remove use of logging module and np.random.randint() due to deadlocks with forks
Summary: See http://bugs.python.org/issue6721. Since everstore loaders use ProcessPoolExecutor, which is based on forks, and there was perhaps an update of the numpy library or some unrelated library, we started getting subprocesses stuck at np.random.randint(). Also changed logging to prints, since logging is known to have issues with multiprocessing. See https://www.prod.facebook.com/groups/fbpython/permalink/1438647216176641/

Differential Revision: D4633725

fbshipit-source-id: ae948a1827c71a3a2119d6a3248706728984df31
2017-03-01 03:32:56 -08:00
b190f1b5bc Add another pinned memory test.
Checks that pinned memory freed on a different GPU from which it was
allocated isn't re-used too soon.
2017-03-01 12:22:31 +01:00
02937903cc add inference for gradient ops + a couple of missing shape inference functions + fix to scalars
Summary:
A bit too much stuff in one diff, so sorry:

1. Add inference for gradient types by using the fact that x_grad is the gradient of x and must be of the same shape. It is kind of awkward to use string matching, but in addition I rely on the operator actually being a gradient op.
2. dzhulgakov was right, scalar shape is () and not (1). Sorry, my earlier claim was #fakenews.
3. Added inference functions for MakeTwoClass, MomentumSGDUpdate and cross entropy ops.

Reviewed By: dzhulgakov

Differential Revision: D4569758

fbshipit-source-id: 0db13f33819777fdddefe21d4b1ebf906fcaf98c
2017-02-28 23:33:32 -08:00
f84e5360cc LSTM benchmark (Caffe2 RNN based)
Summary: Just generate some random data and put it through LSTM (Caffe2 RNN based), using its own output as the gradient value for benchmark purposes. With default parameters it fits my dev GPU memory. With the default parameters provided in this diff I got 300k entries per second processed. These entries are split into blocks of seq_length * block_size. Each entry is of size hidden_dim; LSTM takes in hidden_dim-sized input and produces output of the same size.

Reviewed By: salexspb

Differential Revision: D4605815

fbshipit-source-id: dd529302a0a93e8711784c67e4c777c8d6a8cdf4
2017-02-28 23:17:26 -08:00
8a0ebed4c9 Caffe2: Tile operator
Summary: Caffe2: Tile operator

Differential Revision: D4630698

fbshipit-source-id: 1aa5c3c9d7fcfc17f78c80fd4b752595280266a0
2017-02-28 23:17:26 -08:00
bdd542d087 backup functions for non-cuda cases
Summary: This fixes the error introduced in cudnn v6 diff.

Reviewed By: ajtulloch

Differential Revision: D4633113

fbshipit-source-id: 454cd4b3e52b8de01c1914e66d25310d7ecb13aa
2017-02-28 22:07:54 -08:00
69fa85be26 Fix some typos
Summary:
Found while reading through d522693cc8
Closes https://github.com/caffe2/caffe2/pull/176

Differential Revision: D4630275

Pulled By: Yangqing

fbshipit-source-id: 0a8e85d317d427a39467ebcb5e9a70594075bae2
2017-02-28 18:36:12 -08:00
fbf47a8825 Cudnn v6
Summary:
Add cudnn v6 support, including testing support for dilated convolution.
Add a check to ensure that the versions of cuDNN used to compile Caffe2 and run it are compatible
Closes https://github.com/caffe2/caffe2/pull/85

Reviewed By: bwasti

Differential Revision: D4387690

Pulled By: Yangqing

fbshipit-source-id: 312960134398dd4afe6ee0c01cdc160046c904e8
2017-02-28 17:46:33 -08:00
dfca8dfdc5 ensure valid index in multinomial 2017-02-28 14:48:48 -08:00
b46d5e0b04 Fix NN bindings 2017-02-28 14:35:38 -08:00
f19a11a306 Merge commit '8e8022b7351401911e10b94aeb5ae35d32907705' 2017-02-28 14:35:20 -08:00
cfcf69703f Merge commit '80429ad9f7c4775f7f88344a2cf037e499f060b8' 2017-02-28 14:35:00 -08:00
e22b8e0d17 Merge commit '3cc89afde68a831434f3abe9e3af2ac0b134215e' 2017-02-28 14:34:44 -08:00
fbfba6bdca Merge commit '6ff77503645da59eeca5be473a1902e523c4adb3' 2017-02-28 14:34:29 -08:00
3cc89afde6 Merge pull request #713 from killeent/multinomial-indexing-fix
fix indexing bug in sampleMultinomialOnce
2017-02-28 17:13:44 -05:00
1e4aee057c Merge pull request #712 from killeent/multinomial-fixes
Fix sampleMultinomialOnce to better handle large distribution values
2017-02-28 17:12:48 -05:00
8dfcf7e35a Merge pull request #709 from colesbury/pinned_memory
Fix bug where pinned memory event could be recorded on incorrect device
2017-02-28 16:56:21 -05:00
76de151ddd Fix bug where pinned memory event could be recorded on incorrect device 2017-02-28 13:48:56 -08:00
2676cc46c2 fix indexing bug in sampleMultinomialOnce 2017-02-28 13:40:15 -08:00
1c92e85dae Added editDistance helper to caffe2 operators
Summary: Added editDistance helper to caffe2 operators

Differential Revision: D4622152

fbshipit-source-id: 4d6246b8226c1283d5883edfaa27e8f7748fdc4c
2017-02-28 13:31:56 -08:00
1bf7bc9768 refactor sampleMultinomialOnce to use <real, accreal>, assertion for sum overflow 2017-02-28 12:46:12 -08:00
3c41c9fe46 Add AutoGPU RAII that doesn't depend on Python API (#875)
Separates out non-Python part of AutoGPU. This also compiles without
CUDA which is useful for generic tensor code.

Also fixes a bug where THCPAutoGPU may not always switch the device:

  THCPAutoGPU guard(-1);
  guard.setDevice(0);
  guard.setDevice(1);
  guard.setDevice(0);  // would not switch back to 0
2017-02-28 14:39:20 -05:00
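
A minimal sketch of the RAII pattern described, using the CUDA runtime API; this is illustrative, not the actual AutoGPU implementation:

    #include <cuda_runtime.h>

    struct DeviceGuard {
      int prev = -1;
      explicit DeviceGuard(int device) {
        cudaGetDevice(&prev);
        if (device >= 0 && device != prev) {
          cudaSetDevice(device);
        }
      }
      // Restore the original device on scope exit, however the scope is left.
      ~DeviceGuard() { cudaSetDevice(prev); }
    };
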
000db87bc7 Half-floats support for the rest of segment ops
Summary:
previously the fp16 type was supported in the SparseLengthsSum operator; now it
works in all other segment operators as well.

Reviewed By: dzhulgakov

Differential Revision: D4624312

fbshipit-source-id: c9d72110e3762167270bb088405eaf9c56e88493
2017-02-28 11:19:15 -08:00
6ff7750364 add TH_TENSOR_APPLY variants for optimized redux (+refactor) 2017-02-28 10:30:31 -08:00
4d25c3d048 address comments and add tests 2017-02-28 10:23:36 -08:00
267b7ade50 Speed up reductions on non-contiguous dimensions 2017-02-28 10:23:36 -08:00
e30e94cb71 Made CNMEM optional and added a few cmake components
Summary:
(1) Since cub seems to be a better memory pool I made cnmem optional.
(2) Added MKL testing since Intel now provides an apt source, but that doesn't seem to work right now.
(3) Added cmake file for nervana gpu.
Closes https://github.com/caffe2/caffe2/pull/175

Differential Revision: D4627056

Pulled By: Yangqing

fbshipit-source-id: 9676fa32fce2a29574c0bf7e9d31660b5535cb51
2017-02-28 10:16:49 -08:00
b732f347ba Fix minor bug related to pinned memory allocator
Summary: Came across while reading something. Missing return statement.

Reviewed By: pietern, dzhulgakov

Differential Revision: D4626160

fbshipit-source-id: 4811b9c720510c76d3aadd93cee00f342f6552de
2017-02-28 09:32:21 -08:00
80429ad9f7 THVector_(add) -> THVector_(adds) 2017-02-28 12:20:44 -05:00
5ca6516ecb THVector_(add),(mul),(div) -> (adds),(muls),(divs) 2017-02-28 12:10:47 -05:00
ffa2f77a82 Remove vectorization TODOs where not needed
Summary: Remove TODOs where vectorization with Eigen is not needed, based on D4565679 feedback.

Reviewed By: ajtulloch

Differential Revision: D4623239

fbshipit-source-id: c949ee9bc295e87a87c333d68d958f0abfa71fd4
2017-02-28 08:36:14 -08:00
a3726759c6 Add a way to describe layers in a more ad hoc manner.
Summary:
This diff is trying to address one of the concerns that Xianjie has had - the requirement to create a layer for all operators and to pass shapes and other info around.

The basic idea of the diff:
1. Try to create a layer with a given name, but if it's not available, fall back on an operator with that name (which is expected to have no parameters).
2. For all operators that we're adding through this functional style of creation, try to use the C2 shape/type inference logic to get the output type. If that fails, it just returns an untyped record and expects the user to annotate it when it's really needed.

Reviewed By: xianjiec

Differential Revision: D4408771

fbshipit-source-id: aced7487571940d726424269970df0eb62670c39
2017-02-27 23:30:39 -08:00
851cb7059d changed StringfyProto to StringifyProto
Summary: Closes https://github.com/caffe2/caffe2/pull/155

Reviewed By: dzhulgakov

Differential Revision: D4621607

Pulled By: Yangqing

fbshipit-source-id: ec7f45132260fbb6d36ef61ffbf5bf6466f237eb
2017-02-27 23:05:04 -08:00
d85ca8c6df Do not initialize BN params if init_params is false.
Summary:
If init_params is False, the parameters should not be initialized.
This is particularly important when testing a model that provides values for these BN parameters.
Closes https://github.com/caffe2/caffe2/pull/174

Differential Revision: D4621791

Pulled By: Yangqing

fbshipit-source-id: 518443925990a12c1d5729b0971ebe19ba5d8998
2017-02-27 20:19:03 -08:00
7b0126381c Share queue + reduce logging
Summary: It is better for the workers to share the python-side queue, since I saw a case where workers assigned to one GPU were lagging behind the others. Also, reduced logging as requested by rpenggithub.

Differential Revision: D4620487

fbshipit-source-id: 73353f9570b07788c8cd71c9fec9308cd93a44dd
2017-02-27 19:38:45 -08:00
67f94557ff Expose torch.HalfTensor 2017-02-27 19:35:47 -05:00
61bd5a0643 [Lint] Address F811 2017-02-27 19:33:00 -05:00
748d011c8b [Lint] Address F812 2017-02-27 19:33:00 -05:00
5d5cfe2e57 [Lint] Address E731 2017-02-27 19:33:00 -05:00
7cbe255296 [Lint] Use flake8 instead of pep8 2017-02-27 19:33:00 -05:00
4ef303698c Merge pull request #711 from gchanan/getDeviceAllocator
Add getter for cuda device allocator.
2017-02-27 19:29:39 -05:00
e2acf0f95b Vectorize rmsprop_update using Eigen
Summary: Replace for loop with Eigen operations in method rmsprop_update

Reviewed By: ajtulloch

Differential Revision: D4620691

fbshipit-source-id: 89cd570ecdf56a1255be4a0959ee711addc9696b
2017-02-27 16:03:14 -08:00
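
A sketch of the kind of rewrite described, assuming a standard RMSProp-with-momentum update; the actual Caffe2 signature and semantics may differ:

    #include <Eigen/Core>

    void rmsprop_update(int n, const float* grad, float* mean_squares,
                        float* momentum_buf, float* out, float decay,
                        float momentum, float epsilon, float lr) {
      Eigen::Map<const Eigen::ArrayXf> g(grad, n);
      Eigen::Map<Eigen::ArrayXf> ms(mean_squares, n);
      Eigen::Map<Eigen::ArrayXf> mom(momentum_buf, n);
      Eigen::Map<Eigen::ArrayXf> o(out, n);
      // Element-wise array expressions replace the explicit for loop and let
      // Eigen vectorize the arithmetic.
      ms = decay * ms + (1.0f - decay) * g.square();
      mom = momentum * mom + lr * g / (ms + epsilon).sqrt();
      o = mom;
    }
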
83e8b3f6c3 Add getter for cuda device allocator. 2017-02-27 15:44:44 -08:00
502ebed796 Fix one more reference cycle and ensure correct flag propagation (#868) 2017-02-27 18:38:29 -05:00
68ff58d771 Expose a mutex that is held around cudaFree() calls.
NCCL can deadlock if cudaFree() is called while it's launching kernels.
This exposes a mutex that can be held to prevent cudaFree() calls in the
caching allocator.
2017-02-27 15:08:30 -08:00
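
A sketch of the usage pattern; getCudaFreeMutex() is an assumed accessor name, and in reality the caching allocator takes the same lock around its cudaFree() calls:

    #include <mutex>

    std::mutex& getCudaFreeMutex() {
      static std::mutex m;  // in practice, shared with the caching allocator
      return m;
    }

    void launchNcclKernels() {
      // Hold the lock so the allocator cannot call cudaFree() while NCCL is
      // launching kernels, which is the deadlock scenario described above.
      std::lock_guard<std::mutex> lock(getCudaFreeMutex());
      // ... NCCL collective launches go here ...
    }
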
969c1602e6 Add Tensor::copy() to THPP
For now, this only supports copying from the same type. We can add
polymorphic copying in the future.
2017-02-27 21:33:40 +01:00
07623e24c9 Implement shape inference function for Im2Colop
Summary: Inference function for the Im2ColOp: caffe2/caffe2/operators/im2col_op.cc.

Differential Revision: D4608663

fbshipit-source-id: d26ffb403c2acb7a5ead5f58f044ee3340c8311a
2017-02-27 10:46:54 -08:00
1f537fe7d6 Vectorize ElementWiseDivide using Eigen
Summary: Replace for loop with Eigen operations in method ElementWiseDivide

Reviewed By: Yangqing

Differential Revision: D4602516

fbshipit-source-id: 6b19de8190d5e29ffe52359d0cd0c27cf03c52e2
2017-02-27 10:46:54 -08:00
88b7f8ffd5 Fix memory pool implementation
Summary:
The memory pool implementation was written back in the days when I only had
one GPU, and as a result I overlooked the fact that:

(1) CNMEM needs to have the same current device for the allocation and
deallocation to take place correctly.
(2) cub needs the device id of the pointer passed in for proper deallocation.

As a result, since C2 right now switches contexts very frequently, I added a
global map to keep a record of the pointer affiliations, and use that for
deallocation when we are in another context.

I have not tested the speed but assuming that std::unordered_map is not too bad
this should be fairly fast.

Differential Revision: D4617300

fbshipit-source-id: e8bb366616cd93504e7d68b7f999011cd49caba5
2017-02-27 10:46:54 -08:00
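
A minimal sketch of the bookkeeping described, with plain cudaMalloc/cudaFree standing in for the CNMEM/cub pool calls:

    #include <cuda_runtime.h>
    #include <mutex>
    #include <unordered_map>

    static std::unordered_map<void*, int> g_device_of;  // pointer affiliations
    static std::mutex g_map_mutex;

    void* poolAlloc(size_t nbytes) {
      int device = 0;
      cudaGetDevice(&device);
      void* ptr = nullptr;
      cudaMalloc(&ptr, nbytes);
      std::lock_guard<std::mutex> lock(g_map_mutex);
      g_device_of[ptr] = device;  // remember where this pointer came from
      return ptr;
    }

    void poolFree(void* ptr) {
      int device = 0;
      {
        std::lock_guard<std::mutex> lock(g_map_mutex);
        device = g_device_of.at(ptr);
        g_device_of.erase(ptr);
      }
      int prev = 0;
      cudaGetDevice(&prev);
      cudaSetDevice(device);  // free on the device the pointer belongs to
      cudaFree(ptr);
      cudaSetDevice(prev);
    }
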
4b52cbe636 turn off deprecation warning if glog needs so
Summary:
This addresses #162 for thatguymike
Closes https://github.com/caffe2/caffe2/pull/172

Differential Revision: D4620982

Pulled By: Yangqing

fbshipit-source-id: df3ef45f2c95418c538baa65d5dde3755cb25d1c
2017-02-27 10:07:58 -08:00
449f8997ab close blobs queues when stopping + test
Summary:
Mysterious deadlocks after an epoch has finished have occurred randomly but quite frequently recently for myself, vigneshr and others. Looking at a stack trace of vigneshr's job (P57129798), I noticed a couple of threads were calling BlobsQueue.blockingWrite (or something like that). That call blocks when the caffe2/C++ side queue is at capacity (we use a capacity of 4 with data workers). So in cases where this call was just being made while the script was about to terminate, the thread did not close and the whole process did not close either (not completely sure why that is, since the thread is a daemon thread, but this might be a flow-related issue since we run inside a flow container).

This is quite easy to fix: just call CloseBlobsQueue() when terminating the process. I modified coordinator.stop() and wait_for_finish() to return a status code based on whether the threads that were joined actually closed within the 1.0sec timeout. This allowed creating a unit test for this issue. Before my change, the unit test failed.

Reviewed By: pietern

Differential Revision: D4619638

fbshipit-source-id: d96314ca783977517274fc7aadf8db4ee5636bdf
2017-02-27 10:07:57 -08:00
2d4d3b18dd Use NCCL operations in AllreduceChunked
Summary: The AllReduceChunked algorithm currently performs the local reduce/broadcast of local device buffers in host memory. This diff updates the algorithm to execute the local reduce/broadcast steps using NCCL operations before copying a single device buffer to/from host memory.

Reviewed By: pietern

Differential Revision: D4587441

fbshipit-source-id: 4de689f59a6cf898b8eecd3c3b9f57f77124c0e3
2017-02-27 09:59:29 -08:00
97f95bb247 mpi const cast
Summary: This fixes https://github.com/caffe2/caffe2/issues/160

Reviewed By: pietern

Differential Revision: D4617278

fbshipit-source-id: 6fbc7727d62915cfe0426b528d707756580e7b78
2017-02-27 09:46:31 -08:00
d0e1f5f344 fix summarize op
Summary: This aims to fix https://github.com/caffe2/caffe2/issues/168

Reviewed By: pietern

Differential Revision: D4617272

fbshipit-source-id: 1b4952757f73d9a6cbab7c372d8ba84c9741b124
2017-02-27 09:31:49 -08:00
5e1d6a3691 Update functional.py (#862)
Fixed documentation error in conv3d
2017-02-27 10:42:02 -05:00
533cfc0381 Minor fix of docs of ModuleList and ParameterList (#861) 2017-02-27 10:09:54 +01:00
2b23712dc3 Improve autograd memory usage (#859) 2017-02-26 22:37:26 -05:00
88275da5e8 CUDA documentation tweaks (#858) 2017-02-26 20:37:43 +01:00
bd7a5ad6f0 Make Optimizer.load_state_dict use __setstate__ 2017-02-26 20:02:42 +01:00
1f6f82dbcf Fall back to indexing compatible with numpy 2017-02-26 20:02:42 +01:00
1f8939937a Allow using expand to broadcast tensors 2017-02-26 20:02:42 +01:00
b3d41a5f96 Add docs for ModuleList and ParameterList 2017-02-26 20:02:42 +01:00
fec2d493a9 Reshape grad_output in basic ops 2017-02-26 20:02:42 +01:00
86ee75f63f Fix for Long and Byte tensor indexing of Variables 2017-02-26 20:02:42 +01:00
31941918cf Prevent creation of reference cycles with leaf Variables that don't require grad
Also, raise an error immediately, if a leaf that requiers_grad is
modified in-place. Some comments were updated too.
2017-02-26 20:02:42 +01:00
19a65d2bea Expose stateless methods for torch.cuda.HalfTensor 2017-02-26 20:02:42 +01:00
819d4b2b83 Add finite differences gradcheck (#851) 2017-02-26 08:35:24 -05:00
b87c113cf4 CUDA documentation enhancement and docs versioning (#848)
* Add more detail to CUDA documentation

Also adds better cross-linking to the pages that discuss relevant topics.

* Adds recommendation to torch.save docs

* Make the version numbers for the docs dynamic

Might need tweaks for beta, 1.0, etc.
2017-02-26 08:33:26 -05:00
b25182971f readme change for getting clarity on binaries 2017-02-26 07:52:13 -05:00
1ee2c47e37 Correcting the description of LSTM attributes (#854) 2017-02-26 13:30:55 +01:00
21c40c1a3c Provide ability to specify more types for ConstantFillOp
Summary:
It looks like for most of the types there is no way we can get them (except
as the result of an operation on top of some other tensor), which was pretty
unfortunate for cases where we want to do partial type inference (I was trying
to do so in D4408771).

This diff adds more possible types for ConstantFillOp. Please let me know
if I'm missing anything. The only part that worries me a bit is a possible
GetArgument with types that support only a subset of the range (but it looks
like that can happen even now for i32 vs i64).

Reviewed By: dzhulgakov

Differential Revision: D4611482

fbshipit-source-id: 77917fd5e1d18a1b860e022ede4518143d0f3f26
2017-02-25 22:48:36 -08:00
2dc563f1f1 Fix indexing when passing only an Ellipsis 2017-02-25 23:34:09 +01:00
04d02632e9 instance norm test fix
Summary:
Reduce test input size to instance norm gradient check.  Larger size is currently timing out on stress tests.
e.g. failed: Timeout: Ran out of time before finding a satisfying example for test_instance_norm_gradients. Only found 2 examples in 125.39s.

Reviewed By: Yangqing

Differential Revision: D4608828

fbshipit-source-id: ce17a3ad28752d808efcbf79f1ea4238e63fb005
2017-02-25 14:31:42 -08:00
15ba71a275 Rebase fixes 2017-02-25 17:14:52 +01:00
e5b3fc49d6 Implementation of the 3rd set of tensor functions 2017-02-25 17:14:52 +01:00
ae1766951d Link TH and THPP to THD (#57)
* Fix THD library build

* THPP dependency added

* Minor cleanup; Fix build on OSX
2017-02-25 17:14:52 +01:00
02d08dafd9 Add support for IPv6 in Data Channel TCP (#53) 2017-02-25 17:14:52 +01:00
13a5090695 Added a size change in MaxPool1d module and improved tests (#771) (#832)
Backend is SpatialDilatedMaxPooling, so change 3D input (N*C*L)
to 4D size (N*C*1*L). Then output indices will range from 0 to L.
This range will not cause UnMaxPool1D error.

Signed-off-by: Zhou Chang <achang.zhou@gmail.com>
2017-02-25 08:53:30 -05:00
1d26baa0fc use CMAKE_SYSTEM_NAME instead of LINUX
Summary: Closes https://github.com/caffe2/caffe2/pull/170

Differential Revision: D4617063

Pulled By: Yangqing

fbshipit-source-id: cec9bc3f2f7324fd0281e92fab3d96e2cd4ed9e7
2017-02-24 19:47:41 -08:00
8e32e4c04c make wrap_generic_function importable 2017-02-24 14:27:54 -08:00
cf991310c3 c++ virtual function fix 2017-02-24 13:22:44 -08:00
8ab13eea6f delete redundant comment lines.
Summary: delete redundant comment lines.

Differential Revision: D4600596

fbshipit-source-id: 4bb619f9ff99d6f799e87970b6b6d5ea7de02c98
2017-02-24 11:04:36 -08:00
a8e7d922a6 increase QPS to 470K (from 250K or so)
Summary:
(Stacked with D4553941). Using the new net type increases QPS to 470K, close to the Torch numbers (there are other optimizations that need to be done, particularly the log-estimator). Previously, QPS was close to 250K. This was with reuseData=true.

Includes a small bug-fix to the new net type.

Differential Revision: D4594704

fbshipit-source-id: 21e7b0ca4173b036f45d3ba95c218792b31e7398
2017-02-24 10:46:51 -08:00
938706099e adding environment flags to disable SIMD codepaths 2017-02-24 07:35:11 -05:00
b257fd8e83 Other places that may need NameScope
Summary:
For code in the layer model helper and layers, it's intentional not to have a NameScope by default.

This looks like another place that may need a default NameScope.
https://fburl.com/wdwtxp0m

Reviewed By: kennyhorror

Differential Revision: D4606971

fbshipit-source-id: b560bf59d3242e3f9443cd5aeda5c7e2e4e89079
2017-02-23 21:16:35 -08:00
aa875869dc Added more summary information for debugging python versions
Summary: Closes https://github.com/caffe2/caffe2/pull/167

Reviewed By: Yangqing

Differential Revision: D4610416

Pulled By: bwasti

fbshipit-source-id: 0f56941bed2a75105787e518a71638916e4d503f
2017-02-23 19:46:39 -08:00
9eeeb8407f use CUDA version of AccuracyOp with top_k=1
Summary: D4348953 added support for accuracy with top_k>1, which is only supported on CPU, requiring data to be copied from CUDA. But that diff did not take into account that we have a top_k=1 version of AccuracyOp for CUDA. This diff ensures we use the CUDA version for top_k=1.

Differential Revision: D4607767

fbshipit-source-id: 8becda23890343043eb79ad04e4c6196e9010f0c
2017-02-23 19:02:53 -08:00
182c168285 Add group collector limit and add option for enable sum loss
Summary: As title. Add a limit on the number of examples for group collect. Add an option for enabling sum loss in BatchLRLoss.

Reviewed By: xianjiec

Differential Revision: D4602311

fbshipit-source-id: 5b2a244f1f0e9f1ab0f4590e94828fd18d018d8d
2017-02-23 15:03:22 -08:00
cd4ea42048 Allowing creation of random odd length arrays in RandGaussian
Summary: curandGenerateNormal can only generate arrays whose lengths are multiples of 2. The MSRAFill and GaussianFill operators use the RandGaussian utility method, which in turn uses curandGenerateNormal. This is a test that runs the operators on both devices to generate odd-sized random arrays.

Differential Revision: D4602819

fbshipit-source-id: e65f5c731e925886cfa14afff482f7053bd020a0
2017-02-23 15:03:22 -08:00
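
A sketch of the usual workaround for the even-length restriction; the rounding logic is illustrative, and the caller is assumed to have capacity for one extra padding element when n is odd:

    #include <curand.h>

    void randGaussian(curandGenerator_t gen, float* ptr, size_t n,
                      float mean, float stddev) {
      // curandGenerateNormal requires an even element count, so round an odd
      // n up by one; ptr must have room for the extra padding element.
      size_t even_n = (n % 2 == 0) ? n : n + 1;
      curandGenerateNormal(gen, ptr, even_n, mean, stddev);
    }
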
0a060dae50 better killing after timeout, cleanup
Summary:
This at least partly fixes a recurring problem when using everstore data input (or any other data input with multiprocessing): if the main process dies violently, the child processes are not killed. One cause for this was the TimeoutGuard(), as it called os._exit(1), which prevents any cleanup from happening. I changed it to send a SIGINT signal to the PID, and if the process is still alive after 10 secs, to call os._exit(1). In my tests, this works well.

Did some other cleanup:
- improved logging of inputs/sec in data_workers
- removed redundant atexit() handling as the multiprocessing pool does it itself

Differential Revision: D4602550

fbshipit-source-id: 64d4526a2a3625d163d23f078286e719d56998f4
2017-02-23 13:16:19 -08:00
3330287dc7 Update dataloader.py (#837) 2017-02-23 14:38:41 -05:00
8a85d6bd34 support vectors with different dims in for DotProductOp.
Summary:
Add two arguments to the DotProductOp operator: `force_same_dim` (1 if we want
DotProductOp to only accept two tensors of equal dimension, 0 otherwise) and
`pad_value` (only useful when force_same_dim = 0; pads the smaller tensor to
the same size as the other one).

Differential Revision: D4502619

fbshipit-source-id: 46f7da710c6f6365f76a7af6234c34c7f656be62
2017-02-23 11:09:07 -08:00
38c8520adf adding unsqueeze to docs 2017-02-23 12:13:25 -05:00
4a53ab3cb6 LSTMWithAttention implementation in Caffe2
Summary:
Implementation of ##LSTMWithAttention##

Still TBD:
1. There are problems with back propagation, because the gradient is not implemented for ops with broadcasting
2. I need to make initial_recurrent_state be of shape [dim] rather than [1, batch_size, dim], so one doesn't need to provide batch_size to LSTMWithAttention

Differential Revision: D4298735

fbshipit-source-id: 8903fcff4d6a66647ee6d45a6ef28803fc3091e5
2017-02-23 04:08:34 -08:00
492e1746af Fix THFree in THTensorApply 2017-02-23 06:01:13 -05:00
91a8109cfd Use C99 for openmp cleanup 2017-02-23 06:01:13 -05:00
161490d34a Add memcpy copy 2017-02-23 06:01:13 -05:00
9c302852eb comments fix 2017-02-23 06:01:13 -05:00
8654fcfd60 THVectorDefault style fix 2017-02-23 06:01:13 -05:00
b3d527d9a0 Tab style fix 2017-02-23 06:01:13 -05:00
4d495218c9 THTensorApply3 contiguous optimizations 2017-02-23 06:01:13 -05:00
13a041284c THTensorApply2 copy optimization 2017-02-23 06:01:13 -05:00
c60c1a003d TH_TENSOR_APPLY2 contiguous optimization 2017-02-23 06:01:13 -05:00
97add1a5ea comment fix 2017-02-23 06:01:13 -05:00
ca02930e47 Fill bug fix 2017-02-23 06:01:13 -05:00
20d5e95077 THTensorApply3 compress counter 2017-02-23 06:01:13 -05:00
eb4a7dc11d THTensorApply change dims to sizes 2017-02-23 06:01:13 -05:00
f722498b72 THTensorApply2 counter compress 2017-02-23 06:01:13 -05:00
aadfb6fe83 THTensorApply reduce memory overhead 2017-02-23 06:01:13 -05:00
6c273594c9 THTensorApply Counter compress 2017-02-23 06:01:13 -05:00
e475c82fa1 Add isTransposed judge and enable multithread of fill functions 2017-02-23 06:01:09 -05:00
0c2e6665df Add AVX copy 2017-02-23 05:50:34 -05:00
6295e6e94b Rebase master 2017-02-23 05:50:34 -05:00
670a4aa708 Fix AVX2 bugs 2017-02-23 05:50:34 -05:00
1bdc2e64ed Add fma cadd 2017-02-23 05:50:34 -05:00
c587be1e50 Add THVector Fill 2017-02-23 05:50:34 -05:00
bd481596f5 optimize THVector add mul div 2017-02-23 05:50:34 -05:00
a504d56b43 Fix THVector cmul AVX bug 2017-02-23 05:50:30 -05:00
91c4dfccea Use THVector cadd AVX 2017-02-23 05:46:44 -05:00
27f618c44d Add THVector Fill AVX 2017-02-23 05:46:44 -05:00
a14482a1df Add THVector cadd AVX 2017-02-23 05:46:40 -05:00
aa50c5734b Add THVector AVX cmul 2017-02-23 05:46:07 -05:00
293001a4fe Add THVector SSE div cdiv 2017-02-23 05:46:07 -05:00
638cfdf150 Add SSE add 2017-02-23 05:46:07 -05:00
5f80a14525 Separate SSE and AVX 2017-02-23 05:46:07 -05:00
1342fd3975 Remove THTensorMathSIMD THTensorMathDispatch 2017-02-23 05:46:07 -05:00
8d4af38489 Add THVector div cdiv 2017-02-23 05:46:07 -05:00
575a064e66 Remove THVector diff 2017-02-23 05:46:07 -05:00
3ab21a3c4f Merge THVector mul AVX 2017-02-23 05:46:07 -05:00
2f592e6c7d Remove THVector scale 2017-02-23 05:46:07 -05:00
5661ffb766 Merge THVector mul 2017-02-23 05:46:03 -05:00
9b74503daa Merge THVector cmul 2017-02-23 05:40:33 -05:00
24848f1cd8 Change THVector mul to cmul 2017-02-23 05:40:33 -05:00
a31a07ede9 Merge THVector add 2017-02-23 05:40:33 -05:00
c8c4c9b23d Change THVector add to cadd and fix NEON 2017-02-23 05:40:33 -05:00
e1ed9303f0 Add multi-thread add 2017-02-23 05:40:33 -05:00
a43aab13c2 Fix THTensorMath.c style 2017-02-23 05:40:33 -05:00
c698b4a45e Add Dispaches for div and mul 2017-02-23 05:40:29 -05:00
c6a0ffab50 Add AVX single float and double float add 2017-02-23 05:40:24 -05:00
8ba7cc30d1 Add THTensorMathSIMD.c 2017-02-23 05:32:34 -05:00
61bf08ca24 Fix compilation for simd tensor add 2017-02-23 05:32:28 -05:00
6ada3c0c16 Fast floating point add kernel in intrinsics (11x speedup over default for 10k elements) 2017-02-23 05:11:44 -05:00
60061fbe79 Fixed up CPU dispatch and tested. Can begin implementing kernels 2017-02-23 05:11:44 -05:00
46e7042add SIMD helper header, modified add in THTensorMath to check dispatch 2017-02-23 05:11:44 -05:00
d0c182773b First commit for dynamic CPU dispatch: general framework in place (need to create dispatch tables and stubs for all functions and make impls have hidden linkage) 2017-02-23 05:11:44 -05:00
b6f60585b5 fix AVX2 detection bugs 2017-02-23 05:00:55 -05:00
4b0e3ee219 Merge pull request #699 from twitter-forks/bitops
Bitwise operations
2017-02-23 04:15:35 -05:00
838842d4b2 fix documentation error. [issue #790](https://github.com/pytorch/pytorch/issues/790) (#831) 2017-02-23 08:59:29 +01:00
17d27d4882 Enable Reshape to handle scalars
Summary:
The context here is that we want fblearner predictor to handle float features (D4601334).
Since predictor processes a single example at a time, it makes sense to specify a single
float feature as a float scalar tensor.
But if the Caffe2 net has a SigridTransforms operator, it expects everything to have an
additional dimension so it can be called with multiple examples.

Being able to Reshape a scalar into a 1-d tensor will enable us to mix SigridTransforms
with other native Caffe2 operators.

Reviewed By: ender-wieczorek

Differential Revision: D4602675

fbshipit-source-id: 8b33876bf47bc341385fd7ac19cd1fd7f67a7ccf
2017-02-22 23:30:25 -08:00
95262032d8 Char RNN bug fix for batching
Summary:
It could be that only the first item
in the batch was really used, in case the rest of the memory was 0. Or if
the memory there had a big positive integer, then the whole sequence was used. So we used the rest of the batch depending on our luck :)

Reviewed By: Yangqing

Differential Revision: D4599569

fbshipit-source-id: ae89cee796bbcbc232e4abcab71dee360b0d8bc6
2017-02-22 17:34:30 -08:00
312821d36c Allow in-place instance norm.
Summary:
In-place is a ~30% speedup, but needs a change to torch2caffe
or a graph rewrite on the client.

Differential Revision: D4577582

fbshipit-source-id: c31bf8ba97f4fa4cedf355cf2475eb7bab48b304
2017-02-22 14:03:55 -08:00
e71cf20192 improved serialization (no tar copy) (#713) 2017-02-22 22:24:20 +01:00
c7ed091633 Added model downloader
Summary: Closes https://github.com/caffe2/caffe2/pull/156

Reviewed By: Yangqing

Differential Revision: D4574588

Pulled By: bwasti

fbshipit-source-id: a0f2da0b13358157c7d7322257a9c4f1c61aae12
2017-02-22 12:47:15 -08:00
b0148a7c7d Use ws_nbytes_limit (called cudnn_ws in args).
Summary:
cudnn_ws args was already there. This PR only uses that args when model is created.
Closes https://github.com/caffe2/caffe2/pull/164

Differential Revision: D4598443

Pulled By: Yangqing

fbshipit-source-id: c2e83f73059360ecf2fedf2c62be7cacbb4034ca
2017-02-22 12:19:16 -08:00
aed3aabc7f model and preprocessor can handle empty dense inputs
Summary: we may not need dense feature inputs in some models (e.g., double helix).

Reviewed By: dzhulgakov

Differential Revision: D4568755

fbshipit-source-id: 6850508f86fafb53f81783b2a2a38776be5455d7
2017-02-22 11:19:15 -08:00
45e1905722 add support of fp16 to SparseLengthsSum and SparseLengthsMean
Summary: Another part of making DPER compatible with half-floats. This diff adds support for fp16 to the segment reduction operators used in DPER.

Reviewed By: dzhulgakov

Differential Revision: D4587560

fbshipit-source-id: 0ae10648a7286a820bffaee802464dd9464584bc
2017-02-22 11:05:55 -08:00
b2cf0fad15 Convert SparseLookup layer's embedding to fp16 blobs for predictor
Summary:
First part of adding half-float support to DPER 2.0. Let's add an option use_half_floats to enable converting some weights of the model from fp32 to fp16 before saving it to the predictor model parts. For now it's for the SparseLookup layer's embeddings. All conversion is done after training is finished, and saved models are ready to be used on remote predictors as-is (they will be stored compacted in memory). New fp16 blobs are saved to the model instead of the original ones, under the same names, so we don't modify MetaNetDef at all.

Next steps:
1) support on delivery side -- operators working with these blobs should support both float and float16 input types
2) benchmark performance to make sure there is no regression
 a) of serialization
 b) of delivery
3) support realtime training (I'm thinking about adding new pre-publishing net which will be executed each time the realtime trainer stops to publish a new snapshot)

Depends on D4567304

Reviewed By: kennyhorror

Differential Revision: D4571710

fbshipit-source-id: 19967a17d3bd84878d66e8c0ed8c5342bf38d979
2017-02-22 11:05:49 -08:00
64419a928d Implement EnsureDenseOp and EnsureDenseGradientOp.
Summary:
This operator always outputs dense gradients regardless of
the input gradients. For the forward pass, it passes inputs to outputs in place.

Reviewed By: xianjiec

Differential Revision: D4582511

fbshipit-source-id: 7eb2c5d2142aa05d373f06cab1e7f89d8b747d34
2017-02-22 07:16:26 -08:00
47b65b6d8d Add a create your own dataset tutorial
Summary:
bwasti - will follow up via email.
Closes https://github.com/caffe2/caffe2/pull/166

Differential Revision: D4596858

Pulled By: Yangqing

fbshipit-source-id: 6d088ccf1604e0dc9b94cbf0a75b51587e734d95
2017-02-22 03:31:47 -08:00
59f0454621 Gather perf counters for distributed jobs
Summary: Set up a server node that periodically gathers the values of all nodes' perf counters, allowing them to be published at once.

Reviewed By: dzhulgakov

Differential Revision: D4555116

fbshipit-source-id: 8e49ac8353b52b2be82aedf305762478e7fa687a
2017-02-21 22:06:25 -08:00
ba1d592b5f New 40% faster net-type for MLP on GPUs
Summary:
This diff introduces a new net type 'singlethread_async', which is based on my investigation of DPER/hogwild MLP bottlenecks.
It only uses one CPU thread, but multiple CUDA streams on each GPU. This is implemented by having each Net submit its list of operators to a central GPU-specific executor queue and a thread that executes them asynchronously. This executor takes all tasks in the queue, executes them on separate CUDA streams, and then waits on them at the end. This solution can achieve >95% GPU utilization on 8 GPUs when a sufficient number of workers is used.

FYI: I also tried fancier solutions such as using cudaStreamCallbacks(), but they did not perform as well.

Improved the dper bench by adding the MomentumSGDUpdate operations and adding speed-test capabilities. During my testing I also noticed that the startup costs for initializing CUDA streams and contexts are high, so it is important to do a warm-up.

Reviewed By: Yangqing

Differential Revision: D4553941

fbshipit-source-id: bb00524bef653d75de026dd64097b8d9b7a0acb3
2017-02-21 21:40:15 -08:00
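
A rough sketch of the per-GPU executor idea as described; the types and names are illustrative, not the actual Caffe2 implementation:

    #include <cuda_runtime.h>
    #include <functional>
    #include <mutex>
    #include <vector>

    struct GpuExecutor {
      std::mutex mutex;
      std::vector<std::function<void(cudaStream_t)>> tasks;  // submitted by nets
      std::vector<cudaStream_t> streams;

      // Take everything in the queue, fan the tasks out across the streams,
      // then wait on all streams in the end, as the summary describes.
      void drain() {
        std::lock_guard<std::mutex> lock(mutex);
        for (size_t i = 0; i < tasks.size(); ++i) {
          tasks[i](streams[i % streams.size()]);
        }
        for (cudaStream_t s : streams) {
          cudaStreamSynchronize(s);
        }
        tasks.clear();
      }
    };
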
6ff05fd49d Fix issues pickling jobs
Summary:
We were running into a problem where a Job could not be pickled. It needs to be pickled in order for the master flow operator to execute it using the session.
This creates the concept of a "compiled" Job that pretty much only stores protobufs with the Jobs to be executed, avoiding any issues with pickling.

Reviewed By: dzhulgakov

Differential Revision: D4554799

fbshipit-source-id: 2ee9877ca49a796d51925e5ec917436e3d930984
2017-02-21 20:47:27 -08:00
8fa156d082 Improve "reporter net" design
Summary:
Previously we had several limitations for a reporter net:
 - needed to be a net, not an execution step
 - only one allowed per execution step, with a single interval

Now, "reporter nets" become repoter steps and multiple of them can be specified with different timeouts.

Reviewed By: dzhulgakov

Differential Revision: D4583686

fbshipit-source-id: ad7266e16f96e7829fd24dcc1f165f39e9db573d
2017-02-21 20:17:40 -08:00
7a65736e46 Fix some python version issues with cmake
Summary:
This script will attempt to determine files that will be useful for building with the correct python version.  Currently on macOS with various python installations CMake fails to determine the correct location of python libraries.
Closes https://github.com/caffe2/caffe2/pull/163

Reviewed By: Yangqing

Differential Revision: D4594954

Pulled By: bwasti

fbshipit-source-id: c2b750ee9608a02fad4ce2f2293f5fa54dc7011c
2017-02-21 17:46:57 -08:00
26be1977bf fix CrossEntropyOp bug for batch input
Summary: This fixes a bug in the Eigen implementation when calculating cross entropy.

Reviewed By: salexspb

Differential Revision: D4582078

fbshipit-source-id: 4c92047e9dbbe219fcbef618a45c584c2fbfaad5
2017-02-21 17:34:31 -08:00
183e158642 Remove Model API (unused)
Summary: Removed Model API because no one {seems to,should} be using it

Reviewed By: Yangqing

Differential Revision: D4575126

fbshipit-source-id: 174d39e9aa46750f1fae8295f7e1e5452559af33
2017-02-21 17:19:05 -08:00
04eccb8ebe Performance counters
Summary:
- Key-value store for counters.
- Counters are updated via macros that also export USTD probes.
- Counter values can be exported using caffe2 operators.
- Snapshot mechanism for tracking time-window counter values.

Reviewed By: dzhulgakov, pietern

Differential Revision: D4553761

fbshipit-source-id: 25a1a91a3168dcff2159c6fba7b357d3fd3aa9bf
2017-02-21 16:31:24 -08:00
adb4cb2b5b contiguous view backward (#816) 2017-02-21 19:09:36 -05:00
478d7446ef CMake fixes
Summary: Adds script to populate third-party directory.

Differential Revision: D4591509

fbshipit-source-id: 28934feb536a9f3a066d8c40988337f3dddffaed
2017-02-21 15:06:45 -08:00
7f4d5e9900 Add feed label parser operator.
Summary: Add feed label parser operator, this layer depends on D4520993.

Reviewed By: kennyhorror

Differential Revision: D4538797

fbshipit-source-id: 8efcd7b2f6962c30023c7464a13c125ba1a99dc4
2017-02-21 14:17:00 -08:00
ea9f4da368 fix typo in TextFileReader
Summary: as title

Reviewed By: bwasti

Differential Revision: D4591870

fbshipit-source-id: 01912ee75b036335402c7b4a5b147f20a50ce95b
2017-02-21 14:02:48 -08:00
2d8784ce55 Add python-protobuf install instructions
Summary:
Fixes #158
Closes https://github.com/caffe2/caffe2/pull/159

Differential Revision: D4592503

Pulled By: Yangqing

fbshipit-source-id: 9398ee30e507fd6958818fe407786a7895792775
2017-02-21 12:03:00 -08:00
df68230351 README and docs skeleton
Summary: TSIA

Differential Revision: D4591755

fbshipit-source-id: fa435f4ad6b97453c3c9516b4bfc9f8f0fb2e4f1
2017-02-21 10:52:04 -08:00
6073f9b46c update table in README.md
it removes the empty top row
2017-02-21 12:58:04 -05:00
8e8022b735 Merge pull request #418 from ruotianluo/adaptiveAverage
Add SpatialAdaptiveAveragePooling.
2017-02-21 09:15:12 -05:00
da82d2dd70 Merge pull request #434 from bottler/master
VolumetricFractionalMaxPooling like spatial
2017-02-21 09:13:59 -05:00
82176473a5 Merge pull request #442 from twitter-forks/half-fixes
Convert real to accreal in libTHCUNN
2017-02-21 09:12:56 -05:00
2d269a9a72 Merge pull request #1137 from twitter-forks/half-fixes
Using accreal instead of real in the API
2017-02-21 09:12:32 -05:00
240372a991 Fixed topk documentation for largest=True 2017-02-21 04:38:24 -05:00
5b10411c8c Fixed some mistakes in examples
Fixed mistakes in LSTMCell and GRUCell examples.
2017-02-21 04:17:28 -05:00
4c474a9939 Improve prodall CUDA test 2017-02-20 23:28:31 -08:00
7ea6ae57c8 Support numpy arrays in default_collate 2017-02-20 23:28:31 -08:00
42633f8986 Fix misspelling and add support for weights in NLLLoss2d 2017-02-20 23:28:31 -08:00
84248690a9 Add support for indexing with None and slices with positive steps 2017-02-20 23:28:31 -08:00
53409ca0fb Fix a warning in THPP 2017-02-20 23:28:31 -08:00
c2c1710047 Add clip_grad_norm 2017-02-20 23:28:31 -08:00
876202503f Support multiple inputs in data parallel 2017-02-20 23:28:31 -08:00
946a7d9bc3 Make input contiguous only once in backward of cuDNN RNN 2017-02-20 23:28:31 -08:00
608bcd3b15 Return correct number of gradients from cuDNN RNN 2017-02-20 23:28:31 -08:00
632b02a477 Add checks for reward type and size in StochasticFunction 2017-02-20 23:28:31 -08:00
0db9c63300 Use library_dirs in setup.py 2017-02-20 23:28:31 -08:00
873ed4e6b6 Add better error message for conversion of CUDA tensors to numpy 2017-02-20 23:28:31 -08:00
01bd43037d add docs to torch/cuda/random 2017-02-20 20:43:47 -05:00
68c9e3f232 Fixed typo in GRUCell example 2017-02-21 01:37:04 +01:00
a25c8555eb Fixed paper references 2017-02-21 00:27:18 +01:00
d6ca3820aa Optionally specify stream for pointers in CUDA algorithms
Summary:
Work may be queued on CUDA streams for asynchronous execution. The
memory backed by pointers passed to any algorithm can therefore be
mutated after constructing an algorithm instance. By also passing in
the streams these mutations happen on, the algorithms can synchronize
with these mutations to ensure no invalid data is used.

By passing in these streams, any work done by these algorithms will
*also* be queued, which effectively removes a single synchronization
step from any algorithm run.

Differential Revision: D4589394

fbshipit-source-id: 0c8cd6ba9c9018f33d6f4c55a037083fc4164acb
2017-02-20 14:15:53 -08:00
dfd1dff383 Merge commit '4ca26fbc1b7be4e369f84e95df16431bb2f1dcb7' 2017-02-20 08:05:19 -08:00
8f391d4d51 Merge commit 'ee43cd7adca3b24a2071ce6c55dcd3a95a2b6ff6' 2017-02-20 07:55:46 -08:00
2a6b7685ae Merge commit 'f6c1bbfa483ad19c500dc94838baaa69f02d240b' 2017-02-20 07:55:19 -08:00
eb9573107d Merge commit '34b7fed802db1fda6322a70b648dcc4947858719' 2017-02-20 07:54:51 -08:00
ee43cd7adc Do SpatialClassNLLCriterion sizeAverage in a separate kernel 2017-02-20 06:54:23 -08:00
4ca26fbc1b Remove averaging from prodall 2017-02-20 11:37:53 +01:00
c165226325 Print a readable error message when arguments are on different GPUs 2017-02-20 11:35:50 +01:00
ea6273e048 Fix search gcc5 build
Reviewed By: philippv, luciang

Differential Revision: D4536085

fbshipit-source-id: 2eb950cee137db16ec632b669a209b0f4419a6d3
2017-02-17 20:31:26 -08:00
0722775ca3 AllreduceRingChunked/CudaAllReduceTest should use the chunked algorithm
Summary: I was mistakenly calling the non-chunked algorithm for the chunked test.

Reviewed By: pietern

Differential Revision: D4580160

fbshipit-source-id: 9d62a68e9e86cc6e596d90ff8854c585a0e8855c
2017-02-17 19:17:44 -08:00
23602488cc Fix ProtoBuf.cmake to use PROTOBUF_LIBRARY as well
Summary: Closes https://github.com/caffe2/caffe2/pull/152

Reviewed By: Yangqing

Differential Revision: D4577524

Pulled By: bwasti

fbshipit-source-id: 019ced46dc474c413ba00a98b8fdeb7230a28b55
2017-02-17 19:16:47 -08:00
49295ebe54 Add sequential to documentation 2017-02-18 08:42:43 +05:30
5bc3d2ef03 Add ReduceFront GPU Op's
Summary: Add GPU implementation for ReduceFront{Sum|Mean} Ops

Differential Revision: D4577270

fbshipit-source-id: 697f498531af6b9da4a0138d2a9beb39234f9756
2017-02-17 16:46:42 -08:00
455038e470 Use a more stable formula for spatial LogSoftMax 2017-02-17 13:05:45 -08:00
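
The usual stabilization behind such a fix (a CPU sketch for a single vector; the commit itself targets the spatial CUDA kernel): logsoftmax(x_i) = x_i - max_j x_j - log(sum_j exp(x_j - max_j x_j)), which avoids overflow in exp().

    #include <algorithm>
    #include <cmath>

    void logSoftMax(const float* x, float* y, int n) {
      float mx = *std::max_element(x, x + n);
      double sum = 0.0;
      for (int i = 0; i < n; ++i) sum += std::exp(x[i] - mx);  // all args <= 0
      float logsum = mx + static_cast<float>(std::log(sum));
      for (int i = 0; i < n; ++i) y[i] = x[i] - logsum;
    }
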
ca7f02ea0c Add shape checks for SpatialClassNLLCriterion 2017-02-17 13:01:56 -08:00
af027d5025 add assert for labels for spatial case
Differential Revision: D4570726

fbshipit-source-id: fe73c7f0dfa3b5d5ad50b2a1ed651f520e609985
2017-02-17 11:49:56 -08:00
b8f6ff1a5d Make Shape GPU supported.
Summary: Fix hard coded CPUContext and add CUDA support for shape function

Differential Revision: D4577053

fbshipit-source-id: b515e52c39c02aa1600ccb1c3e559c9a5a0b718c
2017-02-17 11:30:27 -08:00
04aba1caec Fix cuDNN dropout desc for multi-gpu (#772) 2017-02-17 19:16:12 +01:00
ba7fad53b5 Support for sample softmax
Summary:
This diff adds the ability to train a multiclass classifier on a sampled subset of classes. This basically implements what is described in https://arxiv.org/abs/1412.2007, without the sampling probability correction. Since this implements uniform sampling, the sampling probabilities cancel out in the softmax anyway.

The trick to make this work is to have 2 different nets for prediction and training, both sharing parameters. The model is built normally until the last layer. If sampling is needed, then we do the following:

The class sampling works as follows:

Reviewed By: xianjiec

Differential Revision: D4512859

fbshipit-source-id: ab537bcac81d5e5877a8795045e8682c8064da68
2017-02-17 09:31:54 -08:00
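
A one-line sketch of why the correction term vanishes for uniform sampling: sampled softmax would normally use corrected logits s(c) = z(c) - log q(c) for each sampled class c, and with uniform q(c) = 1/|S| that correction is the same constant for every class; since softmax is invariant to adding a constant to all logits, it cancels.
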
420488349f Implement CUDA-aware allreduce chunked
Summary:
First pass at a CUDA-aware allreduce chunked implementation. For now the algorithm runs on the CPU and is mostly copy/paste from allreduce_ring.h. A subsequent pass will offload to the GPU.

Serialize cuda test to avoid intermittent failures due to memory contention.

Reviewed By: pietern

Differential Revision: D4576959

fbshipit-source-id: e1f292a05b88ff24c33e549d4a52e770a21f85d2
2017-02-17 09:06:05 -08:00
f6c1bbfa48 Merge pull request #1105 from ruotianluo/adaptiveAvg
Add SpatialAdaptiveAveragePooling
2017-02-17 10:52:33 -05:00
4e2c8c6db5 Merge pull request #1123 from bottler/master
VolumetricFractionalMaxPooling like Spatial...
2017-02-17 10:42:21 -05:00
1a5cae7340 Add busy-poll option in TCP transport
Summary: Ideally we would want the driver to busy-poll for us. In the absence of driver support, spinning with the MSG_DONTWAIT flag seems to help a lot too. Of course, we pay the price of burning one core for polling. Sigh.

Reviewed By: pietern

Differential Revision: D4576242

fbshipit-source-id: 85d9e1b786fbb6053864fba80f3e5ecc80fe221d
2017-02-17 07:31:32 -08:00
c26b9c0a5e Update rnn.py
Based on the https://github.com/pytorch/pytorch/blob/master/torch/backends/cudnn/rnn.py#L302 line, the output is returned in a (0,1)-transposed version if the batch_first argument is set to true.
2017-02-17 14:37:14 +01:00
aaf41c61a6 Fix Engine::compute_dependencies 2017-02-17 18:28:51 +05:30
945e75bd3a Remove openmp parallel for in caffe2
Summary: Task L1 items: Replace CAFFE2_OMP_PARALLEL_FOR with TODO and remove macro definition.

Reviewed By: ajtulloch, Yangqing

Differential Revision: D4565679

fbshipit-source-id: 8185af2b77b230159058c0a756a0da25ebcf3d0f
2017-02-16 22:05:10 -08:00
dd844f741b Fix previous_functions when it contains Variables 2017-02-17 11:03:46 +05:30
4dd19988c3 Add benchmark option to display nanoseconds
Summary:
Latency optimization is going well and I've seen the odd case of <10us
measurements. This option makes the benchmark tool display nanos
instead.

Differential Revision: D4575925

fbshipit-source-id: 98dbd3b39e31cbcdd4c146613f6630e721187e1e
2017-02-16 21:16:26 -08:00
7117a9012e Fix flaky non-contig test 2017-02-17 10:40:08 +05:30
1bdc28161a Add torch.__version__ 2017-02-17 10:40:08 +05:30
5e150caf38 Fix a bug in Engine::compute_dependencies 2017-02-17 10:40:08 +05:30
c0c62d099a Make detach() actually remove the creator 2017-02-17 10:40:08 +05:30
b9ece39685 Make torch.Size methods return torch.Size, not tuple 2017-02-17 10:40:08 +05:30
8949abe10b more clear about supported output dimension
Summary: Do I understand correctly? It must be of size 1 for sigrid

Reviewed By: kennyhorror

Differential Revision: D4576541

fbshipit-source-id: 92fa8dc62e36ff095e14cceeb80b03c0028f5695
2017-02-16 21:01:52 -08:00
d8b7166251 Move build_ftrl to open source directory
Summary:
Move the open source version of build_ftrl to the open source directory.
build_ftrl can use several engines, and the SIMD engine is fb-specific,
so we also keep a build_ftrl in the fb/optimizers/sgd.py file.
If the caller only uses the open source engine, it can import the
open source build_ftrl. If the caller may use the SIMD engine, it needs
to import the fb-specific build_ftrl.
Also move the tests to the python directory.

Reviewed By: salexspb

Differential Revision: D4560384

fbshipit-source-id: 84fc915d3bbe42fd19503ef132d3277088f6fab3
2017-02-16 18:02:15 -08:00
15ef008877 Using accreal instead of real in the API
- This reverts commit 7a07afe545b4deae5919d9dc268bfac3d37398c7.
- Includes fixes for TemporalRowConvolution
2017-02-16 17:34:11 -08:00
b14d6318f8 Convert real to accreal in libTHCUNN
- This reverts commit 0d85922d116879448485ef88ae21e83a9255a0b0.
- Includes fixes for TemporalRowConvolution
2017-02-16 17:33:03 -08:00
d0621a2449 NextScopedBlob with well-defined behavior and respect namescope
Summary:
Remove the use of `NextName` in the layer model helper, so that the same function returns a `model_helper` that constructs an identical `Net` when under the same NameScope.

`NextScopedBlob` should only take effect when there is a real name conflict; otherwise it returns a ScopedBlobReference.

This is critical for parameter blobs. In the long run, we need to be able to specify parameter blobs more explicitly (kennyhorror is working on this). This solution works in the short term for, e.g., two-tower sparse nn models.

Reviewed By: kennyhorror

Differential Revision: D4555423

fbshipit-source-id: 2c4b99a61392e5d51aa878f7346466a8f14be187
2017-02-16 17:16:36 -08:00
b436788b16 LSTMUnit: pass through H values
Summary:
Pass through the h-value recurrent output unchanged at each LSTM step beyond the valid part of a sequence (computed based on seqLengths, allowing batching of sequences of different length). This enables using the final-step output of each sequence as the output when one vector is desired for the entire sequence. Gradient also passed back unchanged.

Also made some cosmetic changes to recurrent_network_test.py (seq_lengths offset corrected, should be in [1, T] rather than [0, T-1]).

Reviewed By: urikz

Differential Revision: D4540307

fbshipit-source-id: 73a9f6326069d713dcb0cdc8d17869317c6dbe96
2017-02-16 15:31:38 -08:00
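
A minimal sketch of the pass-through behavior under an assumed batch-major layout; this is illustrative, not the actual LSTMUnit kernel:

    // Beyond a sequence's valid length, emit the previous hidden state
    // unchanged, so the last valid step survives to the final timestep.
    void passThroughH(int t, const int* seqLengths, const float* h_new,
                      const float* h_prev, float* h_out, int batch, int dim) {
      for (int b = 0; b < batch; ++b) {
        const bool valid = t < seqLengths[b];
        for (int d = 0; d < dim; ++d) {
          const int i = b * dim + d;
          h_out[i] = valid ? h_new[i] : h_prev[i];
        }
      }
    }
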
93002720eb Extract CudaDevicePointer for reuse across CUDA-aware algorithms
Summary:
The CudaDevicePointer optionally takes an existing stream on
which it runs any operation associated with the pointer (for now just
memcpy's, but this likely will includes kernel execution in the
future).

Differential Revision: D4574035

fbshipit-source-id: ddd7972a3874012059f1fde1b341fd6edd69102d
2017-02-16 14:05:52 -08:00
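
A sketch of the abstraction as the summary describes it; the member names and the float specialization are assumptions:

    #include <cuda_runtime.h>
    #include <cstddef>

    struct CudaDevicePointer {
      float* ptr = nullptr;
      std::size_t count = 0;
      cudaStream_t stream = nullptr;  // optional caller-owned stream

      // Any operation associated with the pointer runs on its stream, so the
      // caller can sequence or overlap work as it sees fit.
      void copyFromHostAsync(const float* src) {
        cudaMemcpyAsync(ptr, src, count * sizeof(float),
                        cudaMemcpyHostToDevice, stream);
      }
      void wait() { cudaStreamSynchronize(stream); }
    };
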
9a33786dc0 Split out large gradient ops at a file level
Summary: We don't use these ops on mobile, so this saves ~150kb.

Reviewed By: Yangqing

Differential Revision: D4569599

fbshipit-source-id: c6f9d702773c64a395e87afa4cfb5b2992dba230
2017-02-16 12:32:51 -08:00
0c03c8fca5 Add name_overrides argument to SaveOp
Summary:
In the current implementation of SaveOp we always use the blob names from the
current workspace. But there is a use case for replacing names in the saved model:
for example, to use half-floats in the prediction model but keep full floats for
the training model, we might want to save a blob "w_fp16" as "w".

Differential Revision: D4567304

fbshipit-source-id: 87bc84fa6a45d8bfa33edb55ac1fb1cff542dbe3
2017-02-16 12:32:51 -08:00
5429031917 Adding SoftmaxWithLoss operator to Shape Inference
Summary: This diff adds shape inference for the SoftmaxWithLoss Operator

Differential Revision: D4565835

fbshipit-source-id: 1c2db398524c765977ec4d8a22c9b986bf9faf82
2017-02-16 12:32:51 -08:00
6b0545d764 Implemented logging of inputs per second
Summary: Every time data is put into the logger, it checks if a second has passed. If so, it displays how many inputs were put in the last second.

Differential Revision: D4527148

fbshipit-source-id: f197eb975ed81111449705e0719d1e56f385fd8d
2017-02-16 12:02:05 -08:00
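A rough sketch of that logging pattern with assumed names (an illustration of the idea, not the code from the diff):

```
import time

class RateLogger:
    """On every insert, report throughput once at least a second has passed."""
    def __init__(self):
        self._window_start = time.time()
        self._count = 0

    def log(self, num_inputs=1):
        self._count += num_inputs
        now = time.time()
        if now - self._window_start >= 1.0:
            print("inputs/sec: %d" % self._count)
            self._count = 0
            self._window_start = now
```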
7c44506441 allow DataParallel to have tuple inputs on a single GPU 2017-02-16 19:07:17 +01:00
70fb22f5be update sparse_to_dense op ENFORCEs
Summary: Hope to catch that weird error

Reviewed By: rayleichen

Differential Revision: D4571699

fbshipit-source-id: 7f9654286b968df527bb3f13a088c3c0725e2412
2017-02-16 09:12:29 -08:00
937ba581d7 Improve nn.legacy compatibility with Torch7 (#738) 2017-02-16 21:17:12 +05:30
2ae54f1194 setup.cfg -> tox.ini (#761) 2017-02-16 21:13:13 +05:30
a3e24b2b5f Fix LabelCrossEntropyOp to ENFORCE >= 0 as well.
Summary: As desc.

Differential Revision: D4570755

fbshipit-source-id: 665326da7a057e357a31736a0c196dccc83f4ccc
2017-02-16 06:11:29 -08:00
d4b1d347e9 minor: make cmake cuda ready
Summary: Closes https://github.com/caffe2/caffe2/pull/153

Differential Revision: D4571506

Pulled By: Yangqing

fbshipit-source-id: 4e887071774749fb84d34cab114dad4587d36ff1
2017-02-16 06:11:29 -08:00
7ee9984556 Added local build and apple fix for generating .so files
Summary: Closes https://github.com/caffe2/caffe2/pull/147

Reviewed By: bwasti

Differential Revision: D4564024

Pulled By: JoelMarcey

fbshipit-source-id: 526a5ab700f9356a3c93a6c64dc38e44a173559c
2017-02-16 06:11:28 -08:00
7d9a0a41fd Allow forcing single-threaded execution at runtime.
Summary: Might be useful for the EXC_RESOURCE / CPU issues.

Reviewed By: salexspb

Differential Revision: D4565494

fbshipit-source-id: 74ac9edeba6334a46ee6799a93ca96eb68216439
2017-02-16 06:11:27 -08:00
40534de705 Gradient for Copy operator
Summary:
One can find the reason why I need a gradient for CopyOp in this post - https://fb.facebook.com/groups/1405155842844877/permalink/1639683782725414/

The gradient for CopyOp is trivial when the device is the same (CPU, or the same GPU), but gets a little harder when the copy is made across two different GPUs.
I introduce a new operator, CopyOnDeviceLike, which takes an additional second input. The op copies the first input to the same device as the second one. The default implementation is exactly the same as CopyOp, but I specialize it for CUDAContext.

Please let me know if I'm doing anything wrong here! This is my first caffe2 diff, related to operator definitions.

Reviewed By: Yangqing

Differential Revision: D4557258

fbshipit-source-id: 9494be589cc1e5696bbbfe25b7622aaa4c9efe4a
2017-02-16 06:11:27 -08:00
797720225d refactor LoadImageOp and dbreader
Summary:
- updated image pre-processing to avoid detectable differences in re-sizing for different angles
 - refactored utility functions into dbreader and image_input
 - fixed an issue in image_input where the crop assert was firing because it was testing the pre-resized image

Reviewed By: seansnyder

Differential Revision: D4550365

fbshipit-source-id: 6461e24a26367c8f6af5e2682beb2b3acd67842b
2017-02-16 06:11:26 -08:00
d9d6f1e905 Fix misplaced semi-colon 2017-02-15 20:41:05 -08:00
cb91078e01 Support synchronous mode for TCP transport
Summary:
In synchronous mode, it is not the device thread that is responsible
for handling I/O, but the user thread itself. Calling waitRecv on a
buffer will trigger the read function on the pair to be called. This
eliminates the context switch necessary if the device thread is
handling all I/O. For benchmarks with small numbers of elements this
reduces latency by as much as 20%.

Reviewed By: plapukhov

Differential Revision: D4549998

fbshipit-source-id: ab718ba090c06d7c7aa4065cc9f92bd96b9e4a35
2017-02-15 17:31:06 -08:00
5db768a60f Improve InstanceNorm NCHW performance (~30% on iOS style transfer)
Summary:
Refactors some of the vectorization and accumulation.

Parallelization is a TODO, I'm not sure how Android goes and it's just an
incremental ~10% or so.

Reviewed By: Yangqing

Differential Revision: D4568850

fbshipit-source-id: aa9db5a364bb738f492085772dc82b94885eb4d6
2017-02-15 17:16:14 -08:00
56d3748b26 Merge branch 'master' of http://github.com/caffe2/caffe2 2017-02-15 17:08:04 -08:00
93841acc1a added docs for learning_rate_op and tweaked docs formatting 2017-02-15 17:07:54 -08:00
3098bef94e Getting things in sync with the internal repo
This file was moved.
2017-02-15 16:03:01 -08:00
c7c4b00a50 windows build: getting there
Summary:
This clears up a bunch of windows build errors, but there are still 12 errors mostly relating to
- template keywords
- initializer list
- pthreadpool

that are not readily available on windows. Also, the cuda build is being disabled right now.

Current error can be found here: https://ci.appveyor.com/project/Yangqing/caffe2-w2ucm
Closes https://github.com/caffe2/caffe2/pull/151

Reviewed By: bwasti

Differential Revision: D4564591

Pulled By: Yangqing

fbshipit-source-id: adacad5fa2d6d52d586700947972e3674e3b6e60
2017-02-15 16:00:45 -08:00
81d932b161 Add LeakyReluOp to caffe2
Summary: Adds LeakyRelu to caffe2 with a test.

Reviewed By: bwasti

Differential Revision: D4511970

fbshipit-source-id: a7189c691ec1813b304bf04f2b73f1c61acd08e2
2017-02-15 16:00:45 -08:00
50a6897e80 Shape inference for ImageInput, NHWC2NCHW and StopGradient
Summary: As in headline. I had missed these originally.

Reviewed By: kennyhorror

Differential Revision: D4560255

fbshipit-source-id: e69458e8a2574b981e40e915d87c8e16dadee7d6
2017-02-15 16:00:45 -08:00
63901e9aca allow recurrent network gradient op to receive gradient on any combination of network output blobs
Summary:
(Caffe2) Modified the RecurrentNetworkGradient operator so that training is possible with any of the output blob(s) receiving gradient during the backward pass. This is realized through a new argument for the RecurrentNetwork op, outputs_with_grads, which takes a list of the indices of the output blobs which will receive gradient. The default behavior (only receiving gradient from the first output blob) is unchanged.

New unit test covers the case where outputs_with_grads = [1, 2] using Python LSTM wrapper.

Reviewed By: urikz

Differential Revision: D4518516

fbshipit-source-id: 5c531582b20f3cf727d1aa91239b4d5a2b8a7c1f
2017-02-15 16:00:45 -08:00
cb3c41b9a9 PiecewiseLinearTransformOp transform binary predictions specially
Summary:
The existing op transforms the input in a general way. It needs M transform mappings to transform an NxM input tensor.
But for binary predictions X (an Nx2 tensor), we know that X[:, 0] = 1 - X[:, 1].
So we just need one mapping for X[:, 1]. After it is transformed, we can compute X[:, 0].
This diff handles that case.

Differential Revision: D4550441

fbshipit-source-id: 42d8c6e88d830c97628ee930b543740a32acf904
2017-02-15 16:00:44 -08:00
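A small numpy illustration of the identity the diff above exploits (the transform function here is a made-up stand-in for one piecewise linear mapping):

```
import numpy as np

X = np.array([[0.7, 0.3],
              [0.1, 0.9]])                  # Nx2 binary predictions, rows sum to 1

def transform_pos(p):                       # stand-in for one piecewise linear map
    return np.clip(0.5 + 2.0 * (p - 0.5), 0.0, 1.0)

pos = transform_pos(X[:, 1])                # a single mapping for X[:, 1]
out = np.stack([1.0 - pos, pos], axis=1)    # X[:, 0] recovered as 1 - X[:, 1]
print(out)
```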
718786add7 UniqueUniformFillOp
Summary: This is like `UniformIntFill` but guarantees unique elements in the output, excluding the optional set of elements to avoid.

Reviewed By: xianjiec

Differential Revision: D4511814

fbshipit-source-id: 5dc98ee580616e60e46ee74ebb3f5ddd29a09965
2017-02-15 16:00:44 -08:00
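A hedged numpy sketch of the op's contract as described above (not the C++ implementation): draw n distinct integers uniformly from a range, excluding an optional set of values to avoid.

```
import numpy as np

def unique_uniform_fill(n, lo, hi, avoid=()):
    pool = np.setdiff1d(np.arange(lo, hi + 1), np.asarray(avoid, dtype=int))
    assert len(pool) >= n, "not enough candidates to guarantee uniqueness"
    return np.random.choice(pool, size=n, replace=False)

print(unique_uniform_fill(4, 0, 9, avoid=[2, 5]))   # 4 distinct values, no 2 or 5
```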
93795406c5 Adapt NLU proj code for Caffe2 RecurrentNetworkOp changes
Summary: Updates the function revise_recurrent_network_op(), which supports cloning recurrent networks by adding a blob-name prefix to string arguments to maintain correspondence. Previously it relied on many hard-coded indices referring to the positions of arguments and inputs of RecurrentNetworkOp and its corresponding gradient operator, and therefore broke when the implementation changed. This fix should make it more general and robust.

Differential Revision: D4559768

fbshipit-source-id: fb85b0b1ffb1393dc84760d6ae5dc473e8b764b0
2017-02-15 16:00:44 -08:00
fc0be229b6 add mutex locks for pinnedcpuallocator to avoid nccl-deadlocks
Summary: D4438796 (https://github.com/caffe2/caffe2/pull/95) introduced locks to avoid concurrent cudaFree and NCCL calls. Unfortunately, the locks were not put into PinnedCPUAllocator, causing deadlocks in certain cases (like using the Hive reader).

Reviewed By: Yangqing

Differential Revision: D4563752

fbshipit-source-id: 0f95051621282e742f03feb76ebc30662285fb8e
2017-02-15 16:00:44 -08:00
a8d70f3552 Try to improve serialization speed for SparseNN.
Summary:
Created a simple benchmark to test model saving speed, plus a few possible
optimizations on top of it.

Since we don't ever want to have a partial LogFileDB, it makes sense to
commit the transactions only after we've finished serialization.

As a result, serialization time in my dummy test drops from
480 seconds to:
Serialization time: 52.5134651661
Deserialization time: 60.5741639137

One more really scary thing that I've found:
it looks like load_op with load_all might actually load corrupted DBs (if they are truncated), so we really do need to fix it (save all the blobs we have in the DB, or even better, a checksum).

Reviewed By: dzhulgakov

Differential Revision: D4558216

fbshipit-source-id: 4145c07f29b9dda527a2e57842f3abd8023d71a3
2017-02-15 16:00:44 -08:00
fb7c9108d9 get parameter blobs of a model
Summary: to verify that a model only used a subset of the parameters of another model (e.g., the model doing training).

Differential Revision: D4557787

fbshipit-source-id: bd8ac96f5e78e05f6f56086db6e6ddcda36c1d37
2017-02-15 16:00:44 -08:00
31ca9d57b6 Remove args in Grad
Summary: Removed Def().arg() in the backward computation since the args have already been included in the forward pass.

Differential Revision: D4563600

fbshipit-source-id: bb6ee25e7c8da99977b82963670267392893fcde
2017-02-15 16:00:44 -08:00
6fabf8ed1a Documentation generation to wiki
Summary: Generates a fair amount of documentation from the operators. Also provides a framework for later documentation generation and custom syntax.

Reviewed By: dzhulgakov

Differential Revision: D4168311

fbshipit-source-id: 89ae9d023ad883623cdc1879c11e10b202b68793
2017-02-15 16:00:44 -08:00
571539aa5d implement CNN optical flow calculator
Summary: addition of cnn optical flow calculator in another diff

Reviewed By: Yangqing

Differential Revision: D4549616

fbshipit-source-id: c444b8e7fb74d348436bc50a4432698c12ba0aec
2017-02-15 16:00:43 -08:00
a217fefee1 Update rnn.py
Fixed a problem with outputting the RuntimeError if arguments are incorrect in cudnn/rnn.py
2017-02-15 21:49:42 +01:00
34b7fed802 Fix gcc 4.4.7 build. 2017-02-15 09:06:25 -08:00
5221745c21 add test for bias=False for 3d convolution 2017-02-15 04:26:44 -08:00
000ca44b16 Merge commit '797544c47a4e9bdff02137a127f883a6df9b3dfe' 2017-02-15 04:24:14 -08:00
8f3d44033b Merge commit '0426f2f3ec2b932cb83d64101081244c2a1451b1' 2017-02-15 04:23:50 -08:00
7cc14c595a Merge commit '07f5b21ef1bd29d1451c616062dcbfc3f8fd7c6a' 2017-02-15 04:23:18 -08:00
797544c47a implementation of bias=False for VolConv.cu 2017-02-15 04:18:17 -08:00
0426f2f3ec implementation of bias=False for VolConv.c
Used .c file changes from 7318e2de13 as a starting point. All changes to .c files (except for whitespace details) are present here.
However, the required .h files were not present in that PR.
2017-02-15 04:16:09 -08:00
336eeee895 kernel_size as the default stride for avg_pool1d (#744)
Following the documentation, let stride be kernel_size if stride is not provided.
2017-02-15 13:12:18 +05:30
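The defaulting rule from the commit above, as a one-line sketch:

```
def effective_stride(kernel_size, stride=None):
    return stride if stride is not None else kernel_size

assert effective_stride(3) == 3      # stride omitted -> defaults to kernel_size
assert effective_stride(3, 2) == 2   # an explicit stride still wins
```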
593f867e3e Fixed a simple compile error on macOS #745. (#746)
Signed-off-by: Zhou Chang <achang.zhou@gmail.com>
2017-02-15 12:19:03 +05:30
385913be1c Fix class torch.nn.ConvTransposeNd documentation (#739)
There is no `dilation`.
The `output_padding` doc was missing.
2017-02-15 10:37:20 +05:30
6aaa14f5fe Fix LSTMCell Doc Typo (#743) 2017-02-15 08:29:17 +05:30
07f5b21ef1 Merge pull request #702 from gchanan/conservativeAllocator
Improve THCCachingHostAllocator performance by making it reclaim less aggressively
2017-02-15 08:26:48 +05:30
ee52f89772 Implement CUDA BroadcastOneToAll algorithm
Summary:
Implement CUDA BroadcastOneToAll algorithm for GPU addresses. Refactor cuda.h into cuda_private.h to allow inclusion of <cuda.h> in public headers without polluting the namespace.

Port broadcast tests to GPU variants.

* this revision is based on Peter's revision D4546932

Differential Revision: D4547382

fbshipit-source-id: 3d294ad8862b04fb783ba22e5c925b8d7cbc8a8d
2017-02-14 18:46:56 -08:00
e454870396 Free set of stored streams and handle NULL streams. 2017-02-14 15:41:47 -08:00
2de4b8840d Added MatMul operator inference
Summary: The MatMul operator now performs shape inference

Differential Revision: D4515770

fbshipit-source-id: 237b527cce306b4858452d430c8ecc8a79537aff
2017-02-14 15:32:14 -08:00
8d72c6016a Move tests of build_sgd, build_adagrad, and build_adam to python directory
Summary:
build_sgd, build_adagrad, and build_adam are now in the open source python
directory.
Move the tests to the same directory.
Extract TestBase to test_util.py so that TestFtrl can still refer to it.
Depends on D4552227

Reviewed By: salexspb

Differential Revision: D4554549

fbshipit-source-id: 35aed05b82c78530808ef623a25bb7532b2abbae
2017-02-14 15:32:14 -08:00
74f1796a34 Split out elementwise_mul_op.cc
Summary: tsia

Reviewed By: Yangqing

Differential Revision: D4555255

fbshipit-source-id: 0e6e4549b73b53e425dca4d60c05f59d6c09222b
2017-02-14 14:17:12 -08:00
a62866cc94 Compilation fix in FC op shape inference
Summary: There's a bug here as well (should be X[:axis] + N instead of [M, N]), but that can wait.

Differential Revision: D4555244

fbshipit-source-id: cf07ffe925bd592b4e2159750b6ebd859cfe0e5e
2017-02-14 14:17:12 -08:00
9871ed4258 Migrate build_adam to python directory
Summary:
The change migrates the build_adam function to the open source python directory.
Depends on D4551871

Reviewed By: salexspb

Differential Revision: D4552227

fbshipit-source-id: 2b6bef183ecfd645d0f26215a784846d8841b845
2017-02-14 14:17:12 -08:00
5fb5fd9de9 NetBuilder: Allow to call hasattr(x, ops) out of context
Summary:
hasattr(x, ops) should always work, regardless of whether you're inside or outside a NetBuilder context.
There's no ideal solution here. I think this is sensible enough.

Reviewed By: kennyhorror

Differential Revision: D4557228

fbshipit-source-id: 4b1c1db5c8b11e4ccbf977b3f82c63b2c3e6e7db
2017-02-14 13:47:10 -08:00
2822013437 Fix flaky tests 2017-02-14 21:28:50 +01:00
72c1982734 Add some more asserts to cuDNN RNN 2017-02-14 21:28:50 +01:00
0de2ea305a Support retain_variables in cuDNN RNN 2017-02-14 21:28:50 +01:00
d899385a3d Raise error when too small input is given to conv 2017-02-14 21:28:50 +01:00
c6d6cbe8a6 Check that all tensors are on the same GPU in cuDNN bindings 2017-02-14 21:28:50 +01:00
85e82e85d8 Fix bug in zero_grad, when some parameters didn't require grad 2017-02-14 21:28:50 +01:00
a1534cc37d Fix auto-gpu in cat 2017-02-14 21:28:50 +01:00
8c8dc791ef Load half and double THCUNN backends 2017-02-14 21:28:50 +01:00
63edca44f2 Add tests for non-contiguous inputs and gradients 2017-02-14 21:28:50 +01:00
b9f4977be9 Fix git URL in README
Summary:
URL was changed in f516943841
Closes https://github.com/caffe2/caffe2/pull/127

Differential Revision: D4559705

Pulled By: Yangqing

fbshipit-source-id: 72b653924d85763ac3e26081f275d699f16b494f
2017-02-14 11:48:09 -08:00
524bc07973 Change the schema of IndexLoad & IndexFreeze so that state change is captured by the framework
Summary: These operators update the state of the instance and therefore should have the instance in the output list.

Reviewed By: xianjiec

Differential Revision: D4554773

fbshipit-source-id: 556d484fcf58878308aa6b0f7cd7ea2446d3f29e
2017-02-14 10:05:12 -08:00
7aef4b2662 Migrate build_adagrad to python directory
Summary:
The change migrates the build_adagrad function to the open source python directory.
Depends on D4547016.

Reviewed By: salexspb

Differential Revision: D4551871

fbshipit-source-id: cb68d9b2a723b0f069c8a24cfa3062f1e676c016
2017-02-14 07:16:46 -08:00
7bdd8737cb Fix to dagnet execution & dependency pruning
Summary:
Matt uyt reported (1) a very infrequent assertion failure in the net.cc worker function. This was caused by an operator that was not part of a chain being scheduled in the job queue. This could happen since our DAGnet operator graph is a graph of operators, not of chains. The dependency pruning that I introduced last week exposed this problem since it removed some "middle-to-chain" dependencies when computing the chains. (It is a bit hard to explain.)

This diff attempts to fix the problem by only allowing scheduling of chains. In addition, I added an extra check to confirm that all parents of all nodes were indeed executed before starting the next round. This adds additional safety and a breakpoint to see if there are still problems.

I also fixed a bug in the operator graph pruning that made pruning less effective.

(1) Matt's report:
https://www.prod.facebook.com/groups/1405155842844877/permalink/1639428779417581/

Reviewed By: dzhulgakov

Differential Revision: D4531424

fbshipit-source-id: 80fa7def6e8aff6910ebf0d9d5fef15ff20e0aec
2017-02-13 23:47:45 -08:00
e52676b272 Delete SerializeToString() call in class Model(), workspace.py
Summary:
In the tutorial, I found that calling Model() was not correct. After this change, it works.
Closes https://github.com/caffe2/caffe2/pull/148

Reviewed By: bwasti

Differential Revision: D4556894

Pulled By: Yangqing

fbshipit-source-id: 949a8d0496861f19869436908ffe1ef1a0f853b1
2017-02-13 23:18:03 -08:00
e865c940a5 initial version of windows build
Summary:
This is essentially https://github.com/caffe2/caffe2/pull/146/ but shipit
failed to trigger the task determinator.

Reviewed By: bwasti

Differential Revision: D4557698

fbshipit-source-id: b0e6777957e76df4e23671371098c2c6fe83b55c
2017-02-13 23:02:40 -08:00
8f1f7e0dc2 Mini-optimization to AccuracyOp
Summary: For top-k accuracy, if the correct prediction does not make it into the k-sized priority queue, it is not going to be in the top-k, so we can short-circuit.

Reviewed By: Yangqing

Differential Revision: D4555637

fbshipit-source-id: 7f07787f853f1c6b4024e279dcc6920d28bdde3d
2017-02-13 22:33:35 -08:00
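A Python sketch of that short-circuit (illustrative only; names and tie-breaking are assumptions, not the op's actual code): keep a size-k min-heap of the best scores, and stop as soon as the correct label either fails to enter the heap or is evicted from it.

```
import heapq

def in_top_k(scores, label, k):
    """True iff scores[label] ranks among the k largest (ties broken by index)."""
    heap = []                                   # min-heap of the best k (score, idx)
    for i, s in enumerate(scores):
        if len(heap) < k:
            heapq.heappush(heap, (s, i))
            continue
        if (s, i) <= heap[0]:
            if i == label:
                return False                    # short-circuit: label can't enter the heap
            continue
        evicted = heapq.heapreplace(heap, (s, i))
        if evicted[1] == label:
            return False                        # short-circuit: label fell out of the top k
    return any(i == label for _, i in heap)

print(in_top_k([0.1, 0.5, 0.3, 0.05], label=2, k=2))   # True: 0.3 is 2nd largest
```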
6aa8c932fc Benchmark for CUDA-aware algorithms
Summary:
Separate benchmark build target for CUDA-aware algorithms.

This is needed to keep CUDA an optional dependency.

Differential Revision: D4546932

fbshipit-source-id: b73176ae9067233f883d51ba3ab4efbb13a6f86f
2017-02-13 21:32:58 -08:00
8821f4aba6 Fix race in benchmark tool
Summary: TSIA

Reviewed By: plapukhov

Differential Revision: D4549105

fbshipit-source-id: 61c8966e429e0701677f441aeaaf27fdc5e669e7
2017-02-13 21:32:58 -08:00
5e06634f7e Implement initial CUDA-aware allreduce
Summary:
This CUDA-aware ring allreduce is based on the regular ring allreduce.
It runs the reduction algorithm on the CPU and is therefore most
suited for smaller buffers.

Both the device-to-host memcpy's at the start of the algorithm and the
host-to-device memcpy's at the end of the algorithm are kicked off
asynchronously in an attempt to parallelize as much as possible.

Reviewed By: Yangqing

Differential Revision: D4542816

fbshipit-source-id: 101dfad276ca79703e37ff93fb1b6d467295f66b
2017-02-13 21:32:58 -08:00
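A toy, single-process sketch of the flow the summary above describes, with numpy arrays standing in for both device and host buffers (the real implementation uses asynchronous cudaMemcpys and a ring-structured reduction; this only shows the copy-reduce-copy shape of the algorithm):

```
import numpy as np

def cuda_aware_allreduce_sketch(device_bufs):
    # 1) device-to-host copies (kicked off asynchronously in the real code)
    host_bufs = [np.array(b) for b in device_bufs]   # stand-in for cudaMemcpyAsync
    # 2) reduction on the CPU (the ring structure is collapsed to one sum here)
    total = np.sum(host_bufs, axis=0)
    # 3) host-to-device copies of the reduced result back to every buffer
    for b in device_bufs:
        b[:] = total
    return device_bufs

bufs = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
print(cuda_aware_allreduce_sketch(bufs))             # both become [4.0, 6.0]
```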
b82c4b3d38 Split benchmark code into multiple files
Summary:
The CUDA benchmark suite will be a separate build target, so the
runner should be reused.

Reviewed By: Yangqing

Differential Revision: D4545092

fbshipit-source-id: 6ccf2d30f5d35c74fc59851b25416bfe6863d62c
2017-02-13 21:32:58 -08:00
b7783a1976 Make ContextManager thread-safe
Summary: ContextManager was thread local. This caused issues because the context registration needs to be global. What needs to be thread local is the current context.

Reviewed By: jhcross

Differential Revision: D4556050

fbshipit-source-id: 5de1c0d9fd0a778c4cb1eadef01f9a1ab488f603
2017-02-13 19:45:35 -08:00
9cf830fca5 Don't install the full CUDA toolkit
Summary:
Should speed up the build process slightly since far fewer packages will be installed.

Matches the Caffe1 builds: https://github.com/BVLC/caffe/blob/rc4/scripts/travis/install-deps.sh#L79-L109
Closes https://github.com/caffe2/caffe2/pull/141

Reviewed By: bwasti

Differential Revision: D4551924

Pulled By: Yangqing

fbshipit-source-id: 10d3d2fe5b8f6c0ad75afa59cc9bc5d5f1c8273d
2017-02-13 18:50:20 -08:00
8d90ab2d9b compile with cudart (#737) 2017-02-14 06:40:35 +05:30
e75a8d24bf Fix compiler complaint
Summary: gcc didn't like not returning a value

Reviewed By: Yangqing

Differential Revision: D4553052

fbshipit-source-id: 68ec2df35cf097be2d9338fcd8901a5fac6292c3
2017-02-13 17:01:53 -08:00
bd5303010d Refactor autograd package to separate Python dependencies. (#662)
The core autograd Variable, Function, and Engine no longer depend on the
Python API. This lets us implement functions in C++. In the future, we
can also multithread the engine and release the GIL for most of the
non-Python backward passes.
2017-02-13 16:00:16 -08:00
16d2c3d7b3 make networks converted with loadcaffe loadable 2017-02-13 23:53:46 +01:00
fc2b6e8ed6 Migrate build_sgd to python directory
Summary:
Currently build_sgd is in a facebook-specific directory. Need to move it to the python directory so that
the open source world can use it.

Reviewed By: salexspb

Differential Revision: D4547016

fbshipit-source-id: d699b7b1ab8051afdeadedb4d247ec2a04a7a3e7
2017-02-13 13:31:37 -08:00
2b4ec53fcb translator fix to solve Aaron's issue
Summary: TSIA. This is actually https://github.com/caffe2/caffe2/pull/135

Reviewed By: bwasti

Differential Revision: D4552417

fbshipit-source-id: 184c085af91b87f88203c565167f66c66f17c05f
2017-02-13 11:19:13 -08:00
60be25f4cd Added shape inference to padding operator for tensors
Summary: Can now infer the shape of the tensor

Differential Revision: D4529339

fbshipit-source-id: 33553611fd3ecd7fde4b7b432c7720255ddda8be
2017-02-13 11:04:13 -08:00
54fc123610 Halfway into windows port
Summary: There is still a lot to clean up, but this is a starting change.

Reviewed By: bwasti

Differential Revision: D4543980

fbshipit-source-id: 757fc49db230b56996f02d5de9b69030ebbf3b77
2017-02-13 09:46:18 -08:00
407a92dc26 std::min() requires same type (#732)
* std::min() requires same type

* cast buffer instead

* declare buffer_size as int64_t
2017-02-13 18:06:05 +01:00
0db5817290 Break the DagNet* code into net_dag.cc
Summary: Unneeded for mobile, should go from 90kb to ~30kb or so.

Differential Revision: D4545466

fbshipit-source-id: 47945493895a8f72d17de684b0429c2c7b5564ed
2017-02-13 07:32:11 -08:00
f09bd84137 GivenTensorFill breakup
Summary:
We don't need all the ~dozen filler ops - should reduce from
~60kb to 20kb.

Reviewed By: Yangqing

Differential Revision: D4545452

fbshipit-source-id: 7ed1a6ba5a2c180f37c3163bfb40844160882749
2017-02-13 07:32:08 -08:00
7134f0183e Elementwise ops trim
Summary:
We only need Add right now, so split things up.

Can take it from ~260kb to ~20kb.

Reviewed By: salexspb

Differential Revision: D4545441

fbshipit-source-id: 96e58fb4d8b2a4f120ae7d34e86cefca146ec14e
2017-02-13 07:32:03 -08:00
0a893abc7b fix serialization bug for large files 2017-02-12 19:13:02 +01:00
34fa5e0dc7 Update docstrings for testing object type
Add docstring for `is_storage()` and `is_tensor()`
2017-02-12 09:21:01 +05:30
712686ce91 Add cat, contiguous, squeeze, and unsqueeze to THPP
Use unsqueeze and view from TH/THC
2017-02-11 17:49:31 +01:00
7ca1c0e405 Add two data_loaders and refactor code
Summary:
(1) Add two dataloaders, everstore and squashfs
(2) Refactor code

Differential Revision: D4500365

fbshipit-source-id: f70fb40ca29cdbfb46da5f3f6322f2d953c01903
2017-02-11 02:13:36 -08:00
518864a7e0 Fix bug in legacy NN updateGradParameters (#714) 2017-02-11 11:04:18 +05:30
d918d77747 caffe2/caffe2/contrib/torch/torch_op.h: avoid shadowing warnings
Summary:
Fix warnings exposed by gcc-4.9.x's -Wshadow-compatible-local
I plan to enable this for all of fbcode, soon.
See t13698406 for justification.

Rename inner "err" to "err2".

This avoids the following errors:

  caffe2/caffe2/contrib/torch/torch_op.h:263:47: error: declaration of 'err' shadows a previous local [-Werror=shadow-compatible-local]
  caffe2/caffe2/contrib/torch/torch_op.h:263:11: error: declaration of 'err' shadows a previous local [-Werror=shadow-compatible-local]

Reviewed By: Yangqing

Differential Revision: D4544812

fbshipit-source-id: b15467ba9af7ec7f391db59f706b0442cdb664c4
2017-02-10 19:34:43 -08:00
3d0b717abc caffe2/caffe2/operators/text_file_reader_utils_test.cc: avoid shadowing warnings
Summary:
Fix warnings exposed by gcc-4.9.x's -Wshadow-compatible-local
I plan to enable this for all of fbcode, soon.
See t13698406 for justification.

Rename inner "i" to "j", twice.

This avoids the following errors:

  caffe2/caffe2/operators/text_file_reader_utils_test.cc:56:14: error: declaration of 'i' shadows a previous local [-Werror=shadow-compatible-local]
  caffe2/caffe2/operators/text_file_reader_utils_test.cc:47:14: error: declaration of 'i' shadows a previous local [-Werror=shadow-compatible-local]
  caffe2/caffe2/operators/text_file_reader_utils_test.cc:41:12: error: shadowed declaration is here [-Werror=shadow-compatible-local]

Reviewed By: Yangqing

Differential Revision: D4544810

fbshipit-source-id: 089d73466f48a7a28b2a516117a12389c3ad54d2
2017-02-10 16:45:49 -08:00
14a9ce432d caffe2/caffe2/binaries/core_overhead_benchmark.cc: avoid shadowing warnings
Summary:
Fix warnings exposed by gcc-4.9.x's -Wshadow-compatible-local
I plan to enable this for all of fbcode, soon.
See t13698406 for justification.

Remove declaration of unused outer "stream".

This avoids the following errors:

  caffe2/caffe2/binaries/core_overhead_benchmark.cc:28:27: error: declaration of 'stream' shadows a previous local [-Werror=shadow-compatible-local]
  caffe2/caffe2/binaries/core_overhead_benchmark.cc:26:25: error: shadowed declaration is here [-Werror=shadow-compatible-local]

Reviewed By: Yangqing

Differential Revision: D4544811

fbshipit-source-id: c94e8a6e6d59705c86bc654f05d4de1ae4213eac
2017-02-10 14:35:51 -08:00
c0dd3b9744 caffe2/caffe2/mpi/mpi_test.cc: avoid shadowing warnings
Summary:
Fix warnings exposed by gcc-4.9.x's -Wshadow-compatible-local
I plan to enable this for all of fbcode, soon.
See t13698406 for justification.

Rename outer "rank,size" to "rank0,size0" (to avoid shadowing another "rank" and "size" just below).

This avoids the following errors:

  caffe2/caffe2/mpi/mpi_test.cc:124:9: error: declaration of 'rank' shadows a previous local [-Werror=shadow-compatible-local]
  caffe2/caffe2/mpi/mpi_test.cc:112:7: error: shadowed declaration is here [-Werror=shadow-compatible-local]
  caffe2/caffe2/mpi/mpi_test.cc:126:9: error: declaration of 'size' shadows a previous local [-Werror=shadow-compatible-local]
  caffe2/caffe2/mpi/mpi_test.cc:115:7: error: shadowed declaration is here [-Werror=shadow-compatible-local]

Reviewed By: Yangqing

Differential Revision: D4544808

fbshipit-source-id: fdc53ab8763eb342302b94d82d1ac046f2af7d33
2017-02-10 14:35:51 -08:00
b0ff960301 caffe2/caffe2/mpi/mpi_gpu_test.cc: avoid shadowing warnings
Summary:
Fix warnings exposed by gcc-4.9.x's -Wshadow-compatible-local
I plan to enable this for all of fbcode, soon.
See t13698406 for justification.

Rename outer "rank" to "rank0" (to avoid shadowing another "rank" just below).
Also rename outer "size" to "size0" for the same reason.

This avoids the following errors:

  caffe2/caffe2/mpi/mpi_gpu_test.cc:132:9: error: declaration of 'rank' shadows a previous local [-Werror=shadow-compatible-local]
  caffe2/caffe2/mpi/mpi_gpu_test.cc:120:7: error: shadowed declaration is here [-Werror=shadow-compatible-local]
  caffe2/caffe2/mpi/mpi_gpu_test.cc:134:9: error: declaration of 'size' shadows a previous local [-Werror=shadow-compatible-local]
  caffe2/caffe2/mpi/mpi_gpu_test.cc:123:7: error: shadowed declaration is here [-Werror=shadow-compatible-local]

Reviewed By: Yangqing

Differential Revision: D4544806

fbshipit-source-id: 4cfa412dd672919174d487e60aa503a32125da03
2017-02-10 14:19:19 -08:00
7721cba906 caffe2/caffe2/mpi/mpi_common.cc: avoid shadowing warnings
Summary:
Fix warnings exposed by gcc-4.9.x's -Wshadow-compatible-local
I plan to enable this for all of fbcode, soon.
See t13698406 for justification.

Rename inner "new_intra_comm" to "comm".

This avoids the following errors:

  caffe2/caffe2/mpi/mpi_common.cc:167:16: error: declaration of 'new_intra_comm' shadows a previous local [-Werror=shadow-compatible-local]
  caffe2/caffe2/mpi/mpi_common.cc:162:14: error: shadowed declaration is here [-Werror=shadow-compatible-local]

Reviewed By: pietern

Differential Revision: D4544805

fbshipit-source-id: c703c3f35c71f08b4daae8491ea2518572fc8013
2017-02-10 13:01:11 -08:00
2727317384 char-rnn: add comments
Summary: Just some comments

Reviewed By: pietern

Differential Revision: D4544518

fbshipit-source-id: b517023bf5e9712a2bf96ae15a709553e5ee6032
2017-02-10 12:20:58 -08:00
72fd605b01 Fix std::accumulate
Summary:
Testing pull request again.
Closes https://github.com/facebookincubator/gloo/pull/2

Reviewed By: pietern

Differential Revision: D4542327

Pulled By: Yangqing

fbshipit-source-id: 5bd66c32c7249f1327225117815bef64b8708722
2017-02-10 10:12:37 -08:00
98f66fd282 Char-rnn : fix batching
Summary:
Inputs have to be arranged in such a way that the j-th example of
batch i comes right before the j-th example of batch i+1 in the text.

Reviewed By: urikz

Differential Revision: D4519553

fbshipit-source-id: 9dd80658e0c4d9ff0f97a7904cbb164f267fe39f
2017-02-10 10:07:32 -08:00
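A numpy sketch of the layout constraint above: with batch size B, a reshape-then-transpose puts the j-th example of batch i immediately before the j-th example of batch i+1 in the original text, so each stream's hidden state carries over between batches.

```
import numpy as np

text = np.arange(12)               # stand-in for a tokenized text of length 12
B = 3                              # batch size (number of parallel streams)
batches = text.reshape(B, -1).T    # batches[i, j]: j-th example of batch i
print(batches)
# [[ 0  4  8]
#  [ 1  5  9]
#  [ 2  6 10]
#  [ 3  7 11]]
# batches[i+1, j] == batches[i, j] + 1: consecutive characters in the text.
```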
5c007be804 add soft label functionality to softmax with loss op
Differential Revision: D4527240

fbshipit-source-id: 548bf943857adb8f198348cc5b17ec52dc65bd2e
2017-02-10 09:01:53 -08:00
e676f4411b GPU support for RecurrentOp + Char RNN example
Summary: With a batch size of 32 and other default parameters I get 70 iterations per second, vs. 40 on CPU. Batching still doesn't produce a good loss; I am going to work on that in a separate diff.

Reviewed By: urikz

Differential Revision: D4516566

fbshipit-source-id: d0611534747beb2cd935a8607a283369378e4a6c
2017-02-09 22:54:53 -08:00
335b73221c Unify train_local and train_with_distributed_readers
Summary:
Outline of changes:

- add single-operator support to Caffe2-Flow integration (based on Alisson's suggestions)
- because of the above support we can move graph construction to the main workflow body and pass the job to the Flow operator that does the running, similarly to the distributed case
- after that it's easy to unify code even more
- there's some trickery required to make sure model exporting doesn't pollute Cluster info (as TaskGroup.to_task() creates new tasks)

Important: this diff changes train_local behavior by introducing a queue between preprocessing and the trainer (before, we did everything on the trainer thread). It doesn't seem to impact perf much (even slightly positive), so I guess it's fine. It also allows for better unification.

I'll follow up with a separate diff that moves max_examples gating to multi_reader (including train_local) and then we can enable checkpointing.

Reviewed By: xianjiec

Differential Revision: D4526079

fbshipit-source-id: 8c44044f45e7738e9b13e5b3acfbb994bc5a3d72
2017-02-09 20:46:35 -08:00
750fb5cc73 Fixes to support short and char tensors for bitwise operations 2017-02-09 18:52:59 -08:00
0f4749907a Adding bitwise operations
- lshift, rshift, bitand, bitor, bitxor
2017-02-09 18:11:58 -08:00
bd2dc63ef6 Adding bitand, bitor and bitxor 2017-02-09 17:06:04 -08:00
3e08beb75e implement Float16EncodeOp and Float16DecodeOp
Summary: casting between fp16 and fp32

Reviewed By: dzhulgakov

Differential Revision: D4526415

fbshipit-source-id: ebffb00ae12c6bcba79096b13e84ce55ef3f02bb
2017-02-09 17:03:43 -08:00
039ac56a68 Better names for nets, steps and tasks
Summary:
- NetBuilder now honors its name
- When Nets are created in the context of a NetBuilder, they take the NetBuilder's name as a prefix
- When a NetBuilder is created in the context of a Task, it takes the Task's name.
- pipe() now tries to find a good name based on its processor's, output's, or input queue's name.
- RPC tries to find a name from its handler's name.
- Better names in DataStream
- net_printer prints the name of Tasks and Steps
- net_printer optionally factors out common prefixes from blob names.

Differential Revision: D4527578

fbshipit-source-id: 5d3d1237c186e9576313c5aa01cc8800a9051217
2017-02-09 16:33:54 -08:00
19a8795450 Changes to shift operations
- renaming lsh -> lshift, rsh -> rshift
- adding componentwise functions
2017-02-09 15:41:07 -08:00
d9dccfdd71 Fix for non-contiguous grad_output in cuDNN conv 2017-02-10 00:25:59 +01:00
b993a2abe4 Dump data for DocNN visualization
Summary:
- Dump instance activations, some statistics about each neuron for model introspection visualization in flow
- It is a part of minsuk's summer intern project. See the following link for high-level details: https://www.dropbox.com/s/m89rwpoomqkc9jb/aml-talk-nnvis-minsuk.pptx?dl=0
- Will combine the following two visualizations: https://our.intern.facebook.com/intern/fblearner/c2graphvis/13795371/ and https://our.intern.facebook.com/intern/fblearner/model-introspection-nn/11910201/

Differential Revision: D4303679

fbshipit-source-id: eeac699891b17cea0b29324d584937460a8d7a25
2017-02-09 13:47:07 -08:00
ed0024a82c SparseToDenseOp and GatherDense
Summary:
1. The existing Gather op outputs gradients in sparse format. We add GatherDense that does the same thing
   as Gather but outputs gradients in dense format. This relies on the SparseToDenseOp.
2. SparseToDenseOp converts sparse representation (indices, values) into a dense format (missing values are
   filled with zeros). There is an existing SparseToDenseMaskOp. It is mainly for converting sparse features
   into dense format. Modifying it to achieve our purpose is too complicated and messy. Better to create a new one.

Reviewed By: dzhulgakov

Differential Revision: D4508879

fbshipit-source-id: f4a50efa1c08586d94040f93195661c41cd414da
2017-02-09 13:33:06 -08:00
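A minimal numpy rendering of the SparseToDense semantics described in point 2 above (an illustration, not the op's code): scatter (indices, values) into a zero-filled dense buffer.

```
import numpy as np

def sparse_to_dense(indices, values, dense_len):
    out = np.zeros((dense_len,) + values.shape[1:], dtype=values.dtype)
    np.add.at(out, indices, values)    # accumulate at indices; zeros elsewhere
    return out

print(sparse_to_dense(np.array([1, 3]), np.array([5.0, 7.0]), 5))
# [0. 5. 0. 7. 0.]
```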
7547a06c4f Avoiding duplicated unsigned as it causes an error on gcc. 2017-02-09 13:29:05 -08:00
8929b75795 Added shift operations. 2017-02-09 13:28:36 -08:00
4d37ef878c Remove view on data and target tensors of dim 1 in TensorDataset (#609) 2017-02-09 22:06:39 +01:00
efd8998690 Import gloo
Summary:
In the GitHub repository this directory will be mirrored similar to
folly, such that the repository has a single top level directory
called "gloo". This allows for versioning or renaming of the
project root, without having to mangle the include paths; they will
always use the "gloo" prefix.

fbshipit-source-id: 24502e4185fc7cbe19b5249f83609e2b8118e9d7
2017-02-09 12:33:54 -08:00
126e77d5c6 Merge commit 'e9b05c71b4acf210fad719f4da8bb58a425dd00b' 2017-02-09 12:31:58 -08:00
53eec78bea Merge commit 'ac9312e9f8002227b267a82e224a5a99c7a7e734' 2017-02-09 12:31:40 -08:00
a4edaec81a Merge commit 'aeb7a72620be47c0e6a8928a9cb6df49c06902a0' 2017-02-09 12:31:16 -08:00
92481b59d3 Merge commit '73d232ee454ca25de5552d347a2b06820f30d193' 2017-02-09 12:30:39 -08:00
f2b3f0ab5c remove decode()
Summary: This should not be needed any more since we use pybind. It will help python3 migration.

Reviewed By: salexspb

Differential Revision: D4535490

fbshipit-source-id: a47615f73b5c35b940d21bb2d5d55060fa0850be
2017-02-09 10:08:13 -08:00
8ca1b3baea import_array python3 compatibility
Summary: TSIA

Reviewed By: salexspb

Differential Revision: D4535571

fbshipit-source-id: 61ce724d4fc3c79fac551e8622a2d45cda67f80a
2017-02-09 10:08:13 -08:00
6c77fa9121 Changes in RNNBase and Embedding for compatibility with DataParallel (#660) 2017-02-09 22:36:26 +05:30
aeb7a72620 Merge pull request #693 from colesbury/view
Add code for 'view' to THC
2017-02-09 12:09:28 +05:30
73d232ee45 Merge pull request #926 from colesbury/view
Add code for 'view' to TH
2017-02-09 12:08:57 +05:30
c0c65bf915 Merge pull request #696 from colesbury/unsqueeze
Add unsqueeze to THC
2017-02-09 11:08:20 +05:30
f6cee952af Merge pull request #929 from colesbury/unsqueeze
Add unsqueeze1d to TH
2017-02-09 11:07:47 +05:30
53817feb3a Optimize computation for top-K accuracy using heaps
Summary: Per the task request, replace the original partial_sort solution with a heap-based one.

Differential Revision: D4529118

fbshipit-source-id: 3dc01fc3a552ad020a0370f8d26cbc8be58bca6b
2017-02-08 20:46:37 -08:00
306fde233a Accept optional blob map for InferShapesAndTypes
Summary:
Shape inference allows Caffe2 to compute shapes of blobs without running a model. Update InferShapesAndTypes() to accept an optional blob:dimensions map so that external input blobs do not need to be part of the workspace.

InferShapesAndTypes() in workspace.py conditionally calls the ...from_workspace or ...from_map bindings. Note I favored a small amount of code duplication here for the sake of readability. InferShapesAndTypes() in operator.cc has been refactored into mirrored entry points, invoking a common helper.

Other minor changes to address linter warnings.

Reviewed By: dzhulgakov

Differential Revision: D4524873

fbshipit-source-id: 56f863b759c016d7f23523f06fda3aa5bba22357
2017-02-08 15:04:24 -08:00
e74184f679 Make THCCachingHostAllocator less aggressive.
In cases where copyAsync is a large percentage of the work,
processing events in recordEvent can cause a large bottleneck.

Here, we relax the constraint that we reclaim blocks as fast as possible
(i.e. in copyAsync); instead, we only check that a block can be re-allocated
in malloc and free.
2017-02-08 14:44:24 -08:00
e34e1b1b7b added updated loss function, changed cv interp filters to AREA
Summary:
Updated training for the breaking change of loss_scale.
Noticed that for large downscale factors opencv INTER_AREA did a better job of avoiding aliasing, so I changed to this filter.

Reviewed By: seansnyder

Differential Revision: D4528909

fbshipit-source-id: 692894812701854dd5eb8da932505f465fed3590
2017-02-08 14:20:59 -08:00
3884d36176 Add unsqueeze to THC 2017-02-08 13:49:32 -08:00
e6a18d2e9a Added TransposeOp Inference
Summary: TransposeOp shape inference is now implemented

Differential Revision: D4517155

fbshipit-source-id: fb2b11c27231043f87a4c128b0eb3cbb60ab2c0c
2017-02-08 10:29:31 -08:00
e7c6886a00 Add unsqueeze1d to TH
Unsqueeze inserts a singleton dimension. Unlike view, it doesn't require
the tensor to be contiguous.
2017-02-08 09:52:50 -08:00
024d1e2678 Merge pull request #69 from cwhipkey/master
Qualify nullptr_t with std::
2017-02-08 09:17:50 -08:00
ed8e92f63d Expose rawSet and rawResize as resizeNd and setStorageNd 2017-02-08 09:00:22 -08:00
fb97df5d65 Expose rawSet and rawResize as resizeNd and setStorageNd
These methods are useful from C because they don't require constructing
THLongStorages to wrap the sizes and strides, which can lead to leaked
memory in case of an error. Instead the sizes and strides can be
represented on the stack using standard C long arrays.
2017-02-08 08:56:04 -08:00
e9b05c71b4 Use THCTensor rather than THCudaTensor in THCUNN.h definition of
GatedLinearUnit.
2017-02-08 07:54:10 -08:00
5eab428294 Qualify nullptr_t with std::. 2017-02-08 07:06:31 -08:00
849fc7ba68 check that parameter is int
Summary: One trainer passed (10,) as the max_buffer_size parameter, causing the internal queue to grow out of bounds, as qsize == (10,) was never true. This adds an assertion on the type of the parameter.

Reviewed By: prigoyal

Differential Revision: D4527649

fbshipit-source-id: 492a824700b8fc69c484b80773b1f1f5aee39071
2017-02-08 03:04:04 -08:00
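Why the bug above was silent, plus the kind of assertion the diff adds (sketched with assumed names): comparing an int-valued queue size against the tuple (10,) is always False in Python, so the bound never triggered.

```
qsize, max_buffer_size = 10, (10,)   # the accidental tuple from the report
print(qsize == max_buffer_size)      # False -> the queue grows without bound

def set_max_buffer_size(n):
    assert isinstance(n, int), "max_buffer_size must be an int, got %r" % (n,)
    return n

set_max_buffer_size(10)              # ok
# set_max_buffer_size((10,))         # would now fail fast with an AssertionError
```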
41007ce07b More comprehensive benchmark tool
Summary:
* Use Eigen for reduction math so that processor extensions are properly
  used and timing is as close as possible to real use cases.
* Optionally run over multiple data sizes in sequence.
* Maintain all timing samples so we can report latency percentiles.

Example of benchmark output (2 nodes, tcp transport, chunked allreduce):

```
   elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
          1         70        150        210        262      14880
          2         95        149        211        276      10624
          5         92        146        209        287      11573
         10         89        149        212        269      14141
         20         74        151        216        264      14254
         50         90        149        211        279      15236
        100         94        149        202        264      12390
        200         71        129        166        234      16343
        500         74        130        167        224      19473
       1000         93        140        171        227      12151
       2000        100        143        167        209      13873
       5000         97        156        199        258       9888
      10000        107        177        233        310      13549
      20000        132        197        252        312       9518
      50000        181        276        414        616       4514
     100000        273        534        687       1231       2958
     200000        405        745       1165       2333       1679
     500000        805       1321       2490       3787        704
    1000000       2040       2902       3433       6214        693
    2000000       3337       4006       5295      12809        177
    5000000      10321      12529      15760      20903         98
```

Differential Revision: D4500374

fbshipit-source-id: 1999142d6a5b235d32886354986cdee17edc9fee
2017-02-07 14:48:10 -08:00
7137c565d7 Add all-to-one barrier
Summary:
This enables a real RTT measurement, since it's not possible
for peers to 'pre-fill' the notification buffers as is the case for
the all-to-all barrier.

Differential Revision: D4523543

fbshipit-source-id: 3f6467cdc66b1062ada92deed581e9360003d629
2017-02-07 14:48:09 -08:00
1709664a43 Add debug mode to transport buffer
Summary: Useful for debugging algorithm interactions.

Differential Revision: D4523522

fbshipit-source-id: ae3652f935774570ad29ff894f42e1634f22c806
2017-02-07 14:48:09 -08:00
6a03641cde Add num_iters to RunNet()
Summary:
Running RunNet() in python in a loop can be a performance issue if the python code is doing a lot of other processing, such as data input, because python's Global Interpreter Lock (GIL) will prevent RunNet() from being called. This can easily be fixed by making RunNet() run multiple iterations in C++ land. (Another way to accomplish the same thing is to use Caffe2's "execution plans", but that requires more setup.)

+ fixed timing reporting in my OC workflow
+ improved one error log in data_workers.py

Sorry for piggybacking those small changes, but landing diffs is currently slow...

Reviewed By: rpenggithub

Differential Revision: D4523575

fbshipit-source-id: 039a647576efad5dd9afda74df478ac22b43c103
2017-02-07 14:16:14 -08:00
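A hedged before/after sketch of the pattern the diff above enables. The exact keyword in the python binding is assumed here (num_iter); treat it as illustrative rather than authoritative.

```
from caffe2.python import workspace

# Before: control bounces back to python between iterations, so heavy
# python-side processing (e.g. data input) delays each successive call.
for _ in range(1000):
    workspace.RunNet("train_net")

# After: the iteration loop runs inside C++, avoiding the per-iteration
# round-trip through the GIL-holding python code.
workspace.RunNet("train_net", num_iter=1000)
```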
274ac2b590 Add cmake guard for python, build for tegra X1
Summary:
In short: cmake is lovely.
Closes https://github.com/caffe2/caffe2/pull/131

Differential Revision: D4517234

Pulled By: Yangqing

fbshipit-source-id: 1117878393f8fe7d6bebbc4a06a3c37b734f3222
2017-02-07 13:17:50 -08:00
5303634ebf Use MDB_NOLOCK.
Summary:
- Do not lock LMDB.
- This avoids failure when multiple readers try to read the same LMDB.
- This can also cause a race if a process tries to write into an LMDB that is being read by another process, because this commit removes the locking mechanism.
- Note that we already use MDB_RDONLY when reading LMDB.
- It seems that LMDB does not provide any method of locking the database to avoid writes while allowing reads.
Closes https://github.com/caffe2/caffe2/pull/130

Differential Revision: D4512220

Pulled By: Yangqing

fbshipit-source-id: 45df849efa339601291aea6d0ed5ac74e097273b
2017-02-07 13:17:50 -08:00
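For reference, the same flag combination expressed through the py-lmdb bindings (a hedged illustration, not the Caffe2 C++ reader itself): readonly corresponds to MDB_RDONLY and lock=False to MDB_NOLOCK, letting multiple readers open one LMDB at the cost of reader/writer safety.

```
import lmdb

# NOTE: the path is a placeholder; lock=False disables LMDB's own locking.
env = lmdb.open("/path/to/db", readonly=True, lock=False)
with env.begin() as txn:
    for key, value in txn.cursor():
        pass   # consume records; concurrent writers are NOT safe in this mode
```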
535e0e486b Add model graph to dper_example
Summary:
Just the first version; it displays the forward part of the training net. I want to refactor the local/distributed code to share graph initialization and then visualize all nets individually.

Graphs don't look pretty because of a lot of DotProducts; we need to refactor that.

Reviewed By: xianjiec

Differential Revision: D4514479

fbshipit-source-id: 156bb07c62118b15022c87f197b5e378a7ef3b9f
2017-02-07 13:03:54 -08:00
1f1aafaebe Implement shape inference function for AccumulateOp
Summary: Implemented the shape inference function for AccumulateOp. The output shape and type should be the same as the input's.

Differential Revision: D4518812

fbshipit-source-id: 11fc7ec4fad1fe3049c5a35d13c371627f9e3d11
2017-02-07 12:01:12 -08:00
c115646d71 Use fbcollective
Summary:
Update data parallel model to default to using fbcollective.

Update broadcast op to correctly handle Tensor<long>.

Differential Revision: D4508029

fbshipit-source-id: 7b8d17223e25b3e1098ee3f2a08af61af140729e
2017-02-07 10:48:33 -08:00
7926324385 Corrected parameter typo in Adam docstring (#697) 2017-02-07 19:00:10 +01:00
1527b37c26 Fixed typo and rendering of some equations (#693)
* Fixed typo and rendering of some equations

* Few more fixes to MSELoss docs

* Cleaning up whitespace to make pep8 happy
2017-02-07 18:59:27 +01:00
de4659659b The RNNCell's example can not run correctly 2017-02-07 18:58:19 +01:00
b2532d2794 More logging if unable to find address to bind to
Summary:
This should help in debugging test failures on continuous
integration hosts.

Part of this change is to make the address family configurable,
so the user can force the library to use either IPv4 or IPv6, instead
of picking whatever we see first.

Differential Revision: D4515802

fbshipit-source-id: 8834cece2ff819c8acad81fa2d76c3ed94f06158
2017-02-07 00:42:45 -08:00
50213705d4 Allow specifying max buffer size. Smaller initial size.
Summary:
I recently encountered out-of-memory errors on my OC workflow. This was because the internal queue for buffering image patches was too large. Total memory use was:
  image size = 227 x 227 x 3 x 4
  total mem = image size x queuesize (500) x num gpus x everstore-worker batch (128) > 300 gigs.

Reducing the batch size to 100 should fix this. It can also now be specified as a parameter.

Reviewed By: rpenggithub

Differential Revision: D4519956

fbshipit-source-id: 781697e620431ce7053534e683047bb6e7257b22
2017-02-06 22:01:56 -08:00
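The back-of-the-envelope arithmetic from the summary above, spelled out. The GPU count is not stated in the commit; 8 is assumed here purely to show how the total clears 300 GB.

```
image_bytes = 227 * 227 * 3 * 4       # H x W x C x sizeof(float) = 618,348 bytes
queue_size  = 500                     # old internal queue length
batch       = 128                     # everstore-worker batch size
num_gpus    = 8                       # assumption: not stated in the commit

total = image_bytes * queue_size * batch * num_gpus
print("%.1f GB" % (total / 1e9))      # ~316.6 GB, i.e. > 300 gigs
```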
3c90356499 Add check for num_shards when using distributed training
Summary:
If num_shards = 1 and distributed training is on, then ring reduce fails when it looks for the left pair to exchange information.
I also used the opportunity to do a small fix in my data loader benchmark.

Differential Revision: D4513545

fbshipit-source-id: 7d3115b871a39b8ce7b55553394b607d16e08b74
2017-02-06 20:19:19 -08:00
a386fe8b6a LogOP implementation
Summary: Element-wise log operation for a Tensor

Reviewed By: dzhulgakov

Differential Revision: D4519090

fbshipit-source-id: 68b73efa0ef268426b5aece77c8137000a73d165
2017-02-06 20:19:19 -08:00
a96a8c8336 Static build support + Query CUDA driver, runtime versions (#695) 2017-02-07 08:34:20 +05:30
078a8d10de Move png image to net_drawer and Flow example
Summary:
Making drawing a bit easier

Also adds a Flow example to check that PNG images are nicely rendered in lists.

Reviewed By: kennyhorror

Differential Revision: D4514470

fbshipit-source-id: 35189c4543c31a351c1dbfe804ce25ae14a3a98b
2017-02-06 18:46:48 -08:00
17151ca14f Debug/Analysis tools for Jobs/ExecutionSteps
Summary:
Introduces 2 utilities:
- ##print_obj##: Prints the whole Job in a nice way -- each op call takes one single line and nets are inlined for much better readability. Loops and parallel steps are easy to read.
- ##analyse_obj##: Goes through a Job and checks 2 things:
    - that there will be no undefined blob errors at execution.
    - no blob of the same name will be created by parallel execution steps

Reviewed By: dzhulgakov

Differential Revision: D4142381

fbshipit-source-id: 61bf3398c22e9947493e99145ce2bfc2646830a6
2017-02-06 17:31:20 -08:00
280718b40c Allow non-batched initial recurrent states for RecurrentNetworkOp
Summary: title

Reviewed By: salexspb

Differential Revision: D4493728

fbshipit-source-id: a9ba25bd325b413ed15c35754afb9ed562b1a60c
2017-02-06 15:01:36 -08:00
691aa19b88 Add code for 'view' to THC 2017-02-06 14:04:04 -08:00
947e5feb4d Trainer support for mobile ranking
Summary:
We want to train models with user sequence data for mobile-side ranking.

The operators are for preprocessing the sequence-based data. They read in a sequence with a batch and convert the examples with different methods.

I also add a new loader for connecting the operators to existing trainers.

Differential Revision: D4485411

fbshipit-source-id: 0cf17206704995f2ce079e1594607bea70b1ed0c
2017-02-06 14:03:44 -08:00
6b07dc9e22 Add code for 'view' to TH 2017-02-06 14:00:48 -08:00
75e62924e3 schema.Struct.__add__
Summary: makes life a bit easier

Reviewed By: xianjiec

Differential Revision: D4514640

fbshipit-source-id: b39f9cb05d31d2e5fa957bc072cf18eda13cff89
2017-02-06 13:47:58 -08:00
3049bc1fed Fix data parallel model code doc
Summary: Thanks rpenggithub

Reviewed By: rpenggithub

Differential Revision: D4510933

fbshipit-source-id: 25e33ac0ba5a5143fc5bbe1abb615d7512c7ef41
2017-02-06 12:33:33 -08:00
3bb8755067 Use multi_reader directly
Summary: This makes sure dper_example is compatible with the new way of defining checkpoint epochs. See D4499320.

Reviewed By: xianjiec

Differential Revision: D4511618

fbshipit-source-id: f5188010cdefe3739f87f6049d1ea6aee765c514
2017-02-06 09:59:20 -08:00
c4afd618c4 Add USDT for operator execution
Summary: Import relevant headers from folly.

Reviewed By: azzolini

Differential Revision: D4342793

fbshipit-source-id: 77471e1afd70e399805e4c46e5320ccc3e39d69c
2017-02-06 08:44:42 -08:00
8aa259b52b review comments from gchanan 2017-02-06 11:08:23 +00:00
06591ad414 Fixed task 15844370: [Caffe2/Bootcamp] Make top-1 accuracy faster
Summary:
Per the task's request, added a top_k == 1 branch to specially handle the top-1 accuracy case.
In addition, I made a slight code refinement: moving the declaration of the vector Xdata_pairs out of the for loop to avoid the cost of the vector's constructor.

Differential Revision: D4505983

fbshipit-source-id: 5671eaca4aac3900c69dfb54d664c2d617960b4b
2017-02-04 07:59:23 -08:00
ac9312e9f8 Bugfix/rowconv (#1126) 2017-02-04 20:37:45 +05:30
33c0e5619b Add Task.REPORT_NET attribute
Summary: This allows having a task-local report net before the Task is created. To be used in the global counter (diff soon).

Reviewed By: dzhulgakov

Differential Revision: D4497771

fbshipit-source-id: 24ec7c8e95466abbd83fbea79b58717d81201857
2017-02-03 18:44:50 -08:00
91a17b702b half<->float conversion cleanup (#901)
* half<->float conversion cleanup
2017-02-04 07:30:13 +05:30
c54597e0b2 std::move fixes 2017-02-03 21:31:03 +01:00
e63003d5a0 Fix race in FileStoreHandler
Summary:
It was possible for a set and a get to race such that the get
would return an empty string, if the file for the key was created but
not yet written to. This change updates the FileStoreHandler to first
write to a temporary file and then atomically rename(2) the file to
its final path. This removes the described race condition.

This change also replaces the poor filename generation routine with
the 128-bit MurmurHash of the key.

Differential Revision: D4502154

fbshipit-source-id: f2abc78b8bad68c06ad2f18a078935826e431f7a
2017-02-03 09:59:45 -08:00
1c7886701e lr_scale to loss_scale
Summary:
As per discussion in https://www.prod.facebook.com/groups/184236721951559/permalink/354591931582703/, KaimingHe pointed out that scaling the LR is not the same as scaling the loss, since LR scaling will affect the weight decay (which is implemented by modifying the gradient, which thus is not yet correctly 'averaged'). Actually prigoyal tried to convince me earlier that loss scaling is the way to go, but I was not convinced then :/.

So this diff removes the LR scaling parameter passed by data_parallel_model and instead passes a loss_scale parameter to the model creation function. Unfortunately, this will break all existing code that uses the data parallel model. But that is not only a bad thing, since it will bring awareness to this change. I will post about this in the FB groups.

In this diff I modified all my models to work correctly.

Reviewed By: Yangqing

Differential Revision: D4507002

fbshipit-source-id: 16c7221663282f71a1b754b34de0c8ccd5c2ca90
2017-02-03 07:44:40 -08:00
b2472eab3a Improve dagnet chain computation by pruning redundant dependencies
Summary:
We have noticed that the number of chains computed is usually much larger than necessary when there is a backward pass. For example, a network of 5 FCs with gradient operators (but no parameter updates) should yield only one chain, but instead over 20 were created. After adding parameter updates, the forward pass should still remain one chain, while the backward pass will be splintered.

Analysis showed that the problem was the dependencies from forward ops to the gradient computation. But these are redundant, since the gradient op already depends on the op via the full path over ops. Example:

  fc1     -> fc2   --->   fc3  --> loss
    |          |             |          |
  fc1grad <- fc2grad    <- fc3grad <-

Here fc1 and fc1grad have a direct dependency, but the indirect dependency via fc2->fc3->[...]->fc1grad already covers it.

To fix this, I added a pruning step prior to the chain computation. The chain computation is done on the pruned tree, but I do not modify the runtime chains for safety.

Pruning is based on the following logic:
  - if one of my direct parents is an ancestor via another traversal, I can remove the direct dependency

Pruning is extremely fast, linear in the number of dependencies.

Reviewed By: dzhulgakov

Differential Revision: D4500293

fbshipit-source-id: 0994ae6775c53378ea1e0074365cef041764a1b4
2017-02-03 07:44:40 -08:00
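A compact sketch of the pruning rule stated above (illustrative names; the real implementation works on the operator graph and is linear-time, while this naive version re-traverses per edge): drop a direct parent edge whenever the parent can still reach the node through some other path.

```
def prune_redundant_edges(parents):
    """parents: dict node -> set of direct parents. Returns a pruned copy."""
    children = {}
    for node, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(node)

    def reachable(src, dst, skip_edge):
        stack, seen = [src], set()
        while stack:
            n = stack.pop()
            if n == dst:
                return True
            if n in seen:
                continue
            seen.add(n)
            stack.extend(c for c in children.get(n, ())
                         if (n, c) != skip_edge)
        return False

    pruned = {n: set(ps) for n, ps in parents.items()}
    for node, ps in parents.items():
        for p in ps:
            if reachable(p, node, skip_edge=(p, node)):
                pruned[node].discard(p)   # an indirect path already covers it
    return pruned

# The fc example from the summary: the direct fc1 -> fc1grad edge is redundant.
g = {"fc2": {"fc1"}, "fc3": {"fc2"}, "fc3grad": {"fc3"},
     "fc2grad": {"fc3grad", "fc2"}, "fc1grad": {"fc2grad", "fc1"}}
print(prune_redundant_edges(g)["fc1grad"])   # {'fc2grad'}
```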
dcefc74a0c Shape and Type Inference Part1
Summary:
This is a bit of a large diff, sorry about that. It includes basic shape and type inference functionality, based on YQ's Schema scaffolding. I added some helper functions to make it easier to write simple translations.

A bigger refactoring was needed for ConvPoolBase so that we could use the shape inference already there in the schema.

I annotated enough operators to be able to infer forward-pass shapes for a basic convnet, and added a test for that. I intend to bootcamp some annotations and annotate enough to handle ResNets fully. Need to think about gradients, and whether they could be annotated in an easier way.

Only shapes are exposed to Python for now; types will follow later. Also, the inference is not yet called anywhere but the unit test.

Also, I am not sure if everything is in the best location in the code, but it shouldn't be hard to move stuff around.

Reviewed By: dzhulgakov

Differential Revision: D4436818

fbshipit-source-id: eebee5937ccc9ac09c245465302388a1fae6933c
2017-02-02 22:29:22 -08:00
a9785bba44 cuda implementation of Gated Linear Unit, fixed issues with genericization 2017-02-02 21:38:25 -08:00
5837b21691 support subtask in mtml
Summary:
Required by feed ranking: https://fb.quip.com/N4IuAIgda8Pe
Each task might have multi-subtasks. Each subtask has dedicated mlp layers.

Reviewed By: xianjiec

Differential Revision: D4451609

fbshipit-source-id: 3dad48e6a7cce1bb103d93ec205ff6d2333659ea
2017-02-02 17:59:30 -08:00
8e1c513fb5 Make build_host_protoc more robust to weird system settings
Summary: If the PATH doesn't include cmake (such as when android studio wipes all the environment variables), this will still work.

Reviewed By: Yangqing

Differential Revision: D4504653

fbshipit-source-id: 56a8854e3daf6ee1f5b1cbeb83ca175a007dad12
2017-02-02 15:44:32 -08:00
2ce3cfefe1 Char-RNN Tutorial
Summary:
This learns Shakespeare and then generates samples one character at a time. We want this to be an example of using our LSTM and RNNs in general.

Now it takes 4ms to run the training net with the current parameters (batch size = 1). I don't have data on how much each operator takes yet. But the overall python loop doesn't seem to matter much - with 1000 fake iterations in run_net it took 4s per iteration, as expected.

Future work:

* fixing convergence for batching
* profiling on operator level
* trying it out with GPUs
* benchmarking against  existing char-rnn implementations
* stacking lstms (one lstm is different from two, one needs to take care of scoping)

Reviewed By: urikz

Differential Revision: D4430612

fbshipit-source-id: b36644fed9844683f670717d57f8527c25ad285c
2017-02-02 15:44:32 -08:00
d7e85bf38e Fix ops.stop_if() from inside processors
Summary: stop_if() was not being honored in ProcessingReader.

Reviewed By: dzhulgakov

Differential Revision: D4497784

fbshipit-source-id: 1c967c6252f832149800796e2c26aadf10b74850
2017-02-02 15:14:27 -08:00
000c53a7b1 AtomicCounter to return previous value on Reset.
Summary: This allows saving the previous value of the counter and sending it upstream without losing counts.
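
In Python terms, the intended semantics are roughly (a hypothetical class, for illustration only):

```
import threading

# Reset returns the previous value atomically, so counts taken between
# the last read and the reset are never lost.
class AtomicCounter:
    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self, n=1):
        with self._lock:
            self._value += n
            return self._value

    def reset(self):
        with self._lock:
            prev, self._value = self._value, 0
            return prev  # previous value can be sent upstream
```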

Reviewed By: kennyhorror

Differential Revision: D4497854

fbshipit-source-id: 28a7ad0ff1020bde26f78b1f59614b094d1e1881
2017-02-02 14:59:30 -08:00
d93b9eeae2 Fix NetBuilder's task_init
Summary: The net was being added to the task body by mistake. This also adds local_init and local_exit functionality.

Reviewed By: dzhulgakov

Differential Revision: D4497794

fbshipit-source-id: 4d9dfb48a277ccfa204f1e74886abba5d44c61f8
2017-02-02 14:59:30 -08:00
d8dff5853e Add numSample field for preComputing
Summary: For customers like Ads, Feeds, and MarketPlace, the training data size is very large. It is unnecessary and costly to go over all the data to compute meta information. In this diff, a numSample option is added to preCompute, so users have control over how many samples they want to use when computing meta information.

Differential Revision: D4492399

fbshipit-source-id: 7199381d226ee6300a959fc5e116d39984d199fc
2017-02-02 13:59:30 -08:00
115b5e0c5c Configurable hostname to bind to for tcp transport
Summary:
The unit tests using the tcp transport should bind to
localhost instead of hostname(2).

Differential Revision: D4501851

fbshipit-source-id: 43db860c9b96d5d64801d1c6af2bf25e6759b4af
2017-02-02 11:59:32 -08:00
d6d19d6dca Assert on low side as well
Summary: One model was passing -1s in the label blob, causing an illegal memory access when computing the label cross-entropy. Improving the assertion makes it fail properly.

Reviewed By: prigoyal

Differential Revision: D4491848

fbshipit-source-id: 5c48e43b0a8928cac70e939d69d23c94c07511b9
2017-02-02 09:59:27 -08:00
f401bf8928 dynamic creation of streams and cublas_handles, support multiple streams per thread per gpu
Summary:
Currently CUDAContext only supports one cuda stream per GPU per thread. But as per my investigation, it is much better to use one CPU thread to control all streams for one GPU. To make this possible, this groundwork is necessary: this diff defines a stream id for the cuda context that is used to index into the streams for that GPU for that thread (the streams are handled by a thread-local class).

This diff also changes the initialization: before, we created cuda streams for all gpus and for all threads, even if they would never be used. Now streams are created only when needed.

This adds a small overhead to context.cuda_stream(), but I doubt it has any significance. Instead, this diff will slightly reduce memory usage on the GPU side.
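
The creation pattern is roughly the following (a Python sketch of the idea only; the real code is C++ managing CUDA stream handles):

```
import threading

# Streams are keyed by (gpu_id, stream_id) inside thread-local storage
# and created lazily on first use, instead of eagerly for every
# thread/GPU combination.
class StreamPool:
    def __init__(self, create_stream):
        self._local = threading.local()
        self._create = create_stream  # e.g. wraps cudaStreamCreate

    def get(self, gpu_id, stream_id):
        streams = getattr(self._local, "streams", None)
        if streams is None:
            streams = self._local.streams = {}
        key = (gpu_id, stream_id)
        if key not in streams:
            streams[key] = self._create(gpu_id, stream_id)
        return streams[key]
```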

Reviewed By: Yangqing

Differential Revision: D4492380

fbshipit-source-id: 498555e58d75217d43891e1bcad6d86051d376ce
2017-02-02 09:44:30 -08:00
833b8cbc7a Remove unused code from module 2017-02-02 17:20:11 +01:00
75aeb16e05 Merge commit '72089c9c36c6b880c695baf732cd04329d72c098' 2017-02-01 22:00:42 -08:00
4dd297d261 Add nnpack 2017-02-01 21:45:18 -08:00
3df81ff18c Add ibverbs transport for fbcollective
Summary: TSIA

Reviewed By: Yangqing

Differential Revision: D4498867

fbshipit-source-id: 6384ae4b64e4d68e11d52e1ae8ab661216bce256
2017-02-01 21:29:56 -08:00
4ed85d1d00 modified behavior of input_op to convert to desired channel depth if source is different
Summary:
The color_ flag to image_input_op now indicates the desired number of output channels. If the source
DB has a different number of channels, then color-to-grayscale conversion (or vice versa) is performed.

Reviewed By: Yangqing

Differential Revision: D4498455

fbshipit-source-id: da8c39eccd06b9158f320a05663658e502905ae5
2017-02-01 21:29:56 -08:00
fc354a0d6e Revert "cuda implementation of Gated Linear Unit, fixed issues with genericization" 2017-02-02 10:50:47 +05:30
262611fcd3 Merge pull request #430 from huihuifan/newCudaGLU
cuda implementation of Gated Linear Unit, fixed issues with genericization
2017-02-02 08:16:35 +05:30
b8a34f3033 Small fixups:
1) Add return after THError for completeness.
2) Fix brace formatting
2017-02-01 15:46:19 -08:00
10bb6bb9b8 Fix function names in error messages 2017-02-01 15:21:57 -08:00
3c9ef69c37 Fix THCTensor::isSparse 2017-02-01 14:51:06 -08:00
dee987d6ee use pseudo-fp16 2017-02-01 23:48:09 +01:00
138f254ec1 Support sparse tensors in THPP (#667) 2017-02-01 17:34:50 -05:00
c7c8aaa7f0 Add ModuleList and ParameterList to nn 2017-02-01 23:26:31 +01:00
77fd7c2b6f Make translator work as command line tool
Summary: The initial implementation wasn't working quite right (no const fill of an empty external input)

Reviewed By: viswanathgs

Differential Revision: D4490569

fbshipit-source-id: 1b2a4f612efb3b2685edfe6c683571dd9d01aa4f
2017-02-01 13:14:26 -08:00
d0db624e02 Add W503 to PEP8 ignore list (#646) 2017-02-01 15:57:09 -05:00
e3e7b76310 Rename all normal and log_normal args to std 2017-02-01 21:48:11 +01:00
dad02bceb9 Remove duplicated line in cwrap 2017-02-01 21:48:11 +01:00
b195285879 Improve CUDA detection in THPP 2017-02-01 21:48:11 +01:00
8f3da5b51d set_index -> _set_index 2017-02-01 21:48:11 +01:00
825e919eb8 Add torch.unbind 2017-02-01 21:48:11 +01:00
acb0ce8885 Add LongTensor indexing support 2017-02-01 21:48:11 +01:00
72089c9c36 Update THHalf.c 2017-02-01 11:53:29 -08:00
cf2f158fec Remove erroneous proprietary license header
This change was approved by NVIDIA Legal, and I am authorized to make the change on behalf of the company.
2017-02-01 11:43:44 -08:00
2397b6a6f2 Add CUDA support for Safe{Enqueue,Dequeue}BlobsOps
Summary: Add support for "safe" versions of enqueue and dequeue. I'm not sure if using `math::Set<bool, Context>` is the best context-independent approach for setting the status.

Differential Revision: D4398633

fbshipit-source-id: 7c88c8e11acfe36fd3d94f17dbf68ce558eb6df1
2017-02-01 09:44:37 -08:00
41ddc2a786 VolumetricFractionalMaxPooling like Spatial... 2017-02-01 12:01:09 +00:00
e4886f6589 VolumetricFractionalMaxPooling like spatial 2017-02-01 11:52:49 +00:00
0a3a3de574 Utility op to join tensor matrices into a row strings
Summary:
Takes a 2D tensor of floats and converts each row into a comma-delimited
string. vigneshr ran into a limitation where logging features to Hive wasn't
possible without this, since our APIs only allow logging strings.
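
A minimal numpy sketch of the behavior (illustrative only, not the actual op implementation):

```
import numpy as np

def rows_to_strings(matrix):
    """Convert each row of a 2D float tensor to a comma-delimited string."""
    matrix = np.asarray(matrix, dtype=np.float32)
    assert matrix.ndim == 2, "expects a 2D tensor"
    return [",".join(str(v) for v in row) for row in matrix]

# rows_to_strings([[1.0, 2.5], [3.0, 4.0]]) -> ['1.0,2.5', '3.0,4.0']
```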

Differential Revision: D4486151

fbshipit-source-id: 2d229290819e2e7ca3dc6f93846433da8b02a41d
2017-01-31 22:44:22 -08:00
6470b5bd21 Add test for Embedding with sparse=True (#663) 2017-02-01 09:54:42 +05:30
44196955e2 ByteTensor should be unsigned (#664)
ByteTensor should be unsigned
2017-01-31 21:43:39 -05:00
79c04d32dc add an option to use a resnet network instead of alexnet
Summary: Add an option to use a ResNet network instead of AlexNet. Modified the resnet.create_resnet50 function slightly to allow specifying different kernel/stride parameters so we can adapt ResNet to our image size.

Differential Revision: D4472535

fbshipit-source-id: ed06acf52f6425a1e04d047548eb3c70388d74aa
2017-01-31 16:59:30 -08:00
8ef37ff8fd Add fbcollective
Summary:
Library for collective communication.

Includes Caffe2 ops that wrap reduction/broadcast algorithms.

Reviewed By: Yangqing

Differential Revision: D4439018

fbshipit-source-id: 4bc3652d07953b0bcf4c4c08574a85f1098f683f
2017-01-31 14:29:27 -08:00
b7fa6b2a8b remove recurrent_inputs in a favor of recurrent_input_ids
Summary:
I had forgotten to remove this one. The rest of the move to indexing
instead of string names is coming after D4446813 lands, as scratches
aren't inputs or outputs and thus can't be indexed.

Reviewed By: urikz

Differential Revision: D4465748

fbshipit-source-id: 2ccbedfb35541ef4a2231d1480eef59025bd5290
2017-01-31 13:14:33 -08:00
f08ec1394d Fix bug with inplace TH(CU)NN
Also, remove unnecessary zero_() calls
2017-01-31 21:00:49 +01:00
f8fb25e0a2 Add generic bindings to THNN and THCUNN (#645)
Adds bindings using thpp::Tensor to THNN and THCUNN. This allows calling
into those APIs without knowing the concrete types of the tensor
arguments.
2017-01-31 13:23:02 -05:00
519a23e767 Use chrono library instead of sys/time.h to get the the time from epoc
Summary: Remove the dependency on sys/time.h, and use c++11 feature chrono library, which is more portable.

Reviewed By: Yangqing

Differential Revision: D4486569

fbshipit-source-id: 86be58c6e9bc410e726a4799bc4d2be86fdd1dd4
2017-01-31 09:14:23 -08:00
6a0c66752f Fix documentation and argument name for Tensor.normal_(mean, stddev) (#652) 2017-01-31 11:55:39 -05:00
562a4c2dbf fixed input op for grayscale images
Summary:
grayscale images were not being handled correctly by the image input op in the CPU path. There was
a coercion of the grayscale image to color which strided through the grayscale image 3 pixels at a time

Reviewed By: Yangqing

Differential Revision: D4486356

fbshipit-source-id: 482fbfe211ecdc107e55692a4cf0329e174c8e4a
2017-01-31 01:14:22 -08:00
a1bd4efb08 readme: add guidance on disabling CUDA (#655) 2017-01-31 14:05:51 +05:30
d019ec793c improve fluky test
Summary: On some inputs TestWarden was failing

Reviewed By: Yangqing

Differential Revision: D4487293

fbshipit-source-id: 3da4b310a619c2b57f033b2dd7727f71403bfd68
2017-01-30 22:14:27 -08:00
debd256177 Fix for gradient propagation for initial recurrent state for RecurrentNetwork
Summary: It looks like we don't do a good job with initial recurrent input gradients yet. Here is a partial fix, but the gradient check doesn't pass yet. The shape is correct now, though.

Reviewed By: salexspb

Differential Revision: D4475447

fbshipit-source-id: 280f1f59f19e487fd0dce0d440609c50ddce294a
2017-01-30 18:59:32 -08:00
b43ce05268 Refactor parts of utils.h (#648)
Moves THPObjectPtr into a separate header, so that it can be included
independently. Currently, utils.h requires all of THP.h. Also adds RAII
structs for acquiring and releasing the GIL.
2017-01-30 21:16:28 -05:00
80e56cfda9 Merge commit 'dc9a5b7d2fbcf21268b524b9da5ae38a74214a59' 2017-01-30 17:58:05 -08:00
24701fc5a7 Merge commit '03dcf8a83bb009ecfdd8f27c4d9a6db40829b690' 2017-01-30 17:57:20 -08:00
f78a266d99 Merge commit '368cbe615d0a7bdaadddcb3bd390abcd4cc17b91' 2017-01-30 17:56:37 -08:00
f096fb6859 adding cudnn V6 support (#515) 2017-01-31 02:01:37 +01:00
a3e11d606b Fix linter errors 2017-01-31 01:58:09 +01:00
79232c24e2 Fixes after rebase 2017-01-31 01:58:09 +01:00
15d9d499ab Remove ZMQ dependency from compilation files 2017-01-31 01:58:09 +01:00
962084c8e8 Add Data Channel receive from any source (#52) 2017-01-31 01:58:09 +01:00
7518b1eefb Introduce Scalar for easier send/receive types through DataChannel 2017-01-31 01:58:09 +01:00
8215d7a4ba Implement TH_API functions from the set 2 (#49) 2017-01-31 01:58:09 +01:00
5aaa220d84 Thd functions v3 (#46) 2017-01-31 01:58:09 +01:00
12c16ab9bc Remaining storage functions implemented 2017-01-31 01:58:09 +01:00
76520512e7 DataChannel tests rewrite (#42); DataChannel isend and irecv implementation (#44) 2017-01-31 01:58:09 +01:00
66de965882 Replace ZeroMQ (#41) 2017-01-31 01:58:09 +01:00
10d32fb0b7 Fix DataChannel tests failure (#43)
Tests failed due to accessing reference which could be invalid.
2017-01-31 01:58:09 +01:00
e72c9b6e4a Storage constructors implemented (#40) 2017-01-31 01:58:09 +01:00
ac1f68127a Add barrier, scatter, gather and allGather implementations + groups (#34) 2017-01-31 01:58:09 +01:00
60d1852c7b Major improvements to master-worker mode
* Fixed all undefined symbol errors
* Implemented storage interface and THStorage class
* RPC improvements
* Code refactor
2017-01-31 01:58:09 +01:00
d53eb521fc Add missing headers. 2017-01-31 01:58:09 +01:00
9808932f10 Refactor RPC and change TensorType to Type 2017-01-31 01:58:09 +01:00
ea876eb6d5 Add initial bindings for master-worker mode 2017-01-31 01:58:09 +01:00
0a45864866 Add THDStorage and improve master-worker mode implementation 2017-01-31 01:58:09 +01:00
2560b39796 Merge TensorTypeTraits.hpp with TensorTraits.hpp 2017-01-31 01:58:09 +01:00
21afa4c88b Worker handling for constructors + destructor 2017-01-31 01:58:09 +01:00
9fc3c5e4d2 THDTensor constructors implemented + some minor fixes 2017-01-31 01:58:09 +01:00
3e3501c98d Integration tests of the THD Python interface (#28) 2017-01-31 01:58:09 +01:00
5e6fcd02b5 Implement data channel groups (#25) 2017-01-31 01:58:09 +01:00
d46ebcfadf Fix broadcast and reduce implementations
Due to bad rank mapping, broadcast and reduce were connecting the
wrong processes, which resulted in errors or tensors not being received/sent.

 * Introduced a new mapping method to solve this problem.
 * Added and improved tests for these cases.
2017-01-31 01:58:09 +01:00
41480c8cf2 Data channel maintenance 2017-01-31 01:58:09 +01:00
236890d902 Fix transitive library dependencies in CMake 2017-01-31 01:58:09 +01:00
55632d81d2 Add Python wrappers for process group mode 2017-01-31 01:58:09 +01:00
0b276d622e Add reduce and allReduce implementations (#15) 2017-01-31 01:58:09 +01:00
c81491b37d Preserve directory structure when installing headers 2017-01-31 01:58:09 +01:00
42e189425f Detect ZMQ libs and headers in CMake 2017-01-31 01:58:09 +01:00
3cfa0d7199 Expose C API for process group mode 2017-01-31 01:58:09 +01:00
7c9e088661 Reorganize THD directory structure 2017-01-31 01:58:09 +01:00
e78aa4bb84 Implement CommandChannel with ZMQ. 2017-01-31 01:58:09 +01:00
f8e94d0d8b Implement DataChannel (MPI and TCP) (#8) 2017-01-31 01:58:09 +01:00
ebe6f40fce RPC message packing and unpacking implemented 2017-01-31 01:58:09 +01:00
5fb37efb46 Use #pragma once instead of defines 2017-01-31 01:58:09 +01:00
4f47855873 Style improvements 2017-01-31 01:58:09 +01:00
52ae6f682f Add initial version of tensor wrappers 2017-01-31 01:58:09 +01:00
c35f58f97b Template for THD implementation 2017-01-31 01:58:09 +01:00
659b2f3154 Add more autograd functions 2017-01-31 00:39:34 +01:00
5ea05cfb96 Return indices from Variable sort and topk 2017-01-31 00:39:34 +01:00
0700e05e68 Disallow duplicate field names in Struct
Summary: title.

Differential Revision: D4482958

fbshipit-source-id: a732f6b5d862b440a4856251ad68ecd98f60e8d1
2017-01-30 14:44:28 -08:00
dc9a5b7d2f Fix memory leak in SpatialMaxUnpooling 2017-01-30 23:23:07 +01:00
1d3834eeb2 Nodes to support resource requirements and outputs
Summary: See distributed.py for example of usage

Reviewed By: xianjiec

Differential Revision: D4467723

fbshipit-source-id: c74f71bebaa1751098379838d3da55945aac62bd
2017-01-30 11:29:25 -08:00
8553bd3f68 Ensure we are not using Eigen LGPL code, and build on raspbian.
Summary:
Turns out that building on raspbian is easy as a cake for caffe2 - cmake is awesome.
Closes https://github.com/caffe2/caffe2/pull/112

Differential Revision: D4480985

Pulled By: Yangqing

fbshipit-source-id: 5dbe5e1e71d8680dea7a5ec8a9ce7fbe6aa5270a
2017-01-30 09:44:27 -08:00
f7ab5a128a Delete extra bracket in RNNCellBase.__repr__. (#637)
This extra bracket causes a ValueError when trying to print a Module that uses RNNCellBase or any of its subclasses.
2017-01-29 23:21:24 -05:00
368cbe615d Add Ubuntu 16.04 lib paths in CMake 2017-01-30 01:16:02 +01:00
d4c9a3782b billinear -> bilinear, docs for upsampling, improved docs for Unpooling, pep8 tests fix (#617)
* billinear -> bilinear, docs for upsampling, improved docs for Unpooling, pep8 tests fix
2017-01-30 05:08:48 +05:30
172dca5e8b Fix bug in cat (non-contiguous first input) 2017-01-29 21:25:53 +01:00
818bf0c408 Compile with asserts by default 2017-01-29 21:21:59 +01:00
03dcf8a83b Compile with asserts on by default 2017-01-29 21:18:54 +01:00
604f607fd1 Add asserts in index* functions 2017-01-29 21:18:43 +01:00
956d946c25 Default initial hidden states for recurrent layers (#605)
Fixes #434
2017-01-29 12:38:56 +01:00
2c6391be0a remove unused includes in fbcode (skipping #if, new default mode)
Summary:
This solves most include warnings as seen in Phabricator (no header files, no "packing" system headers, new default mode where more user headers are removed).

We cowardly skip files containing #if for now.

Generated by
```
rm -f /tmp/ffmr-diff/* &&
cd fbcode &&
(foundation/scripts/ls-cpp-dirs | grep -v '^\(\.\.\|external/\|.*/external\|folly/|watchman/\)' |
xargs ffmr -o /tmp/ffmr-diff codegraph/scripts/ffmr/analyze_includes_no_headers_no_packing_skipping_if.sh) &&
(cat /tmp/ffmr-diff/*.diff | patch -p2) &&
hg commit -m foo &&
cd .. &&
arc amend --yes --revision D4414676 && arc diff --nolint --nounit --excuse refactoring --prepare --big-diff -m 'something'
```

folly and watchman are in separate diffs.

Reviewed By: meyering

Differential Revision: D4414676

fbshipit-source-id: 75e2e11f4fac8a5f8071a1bafcc4ddc355fd6f4e
2017-01-29 01:34:03 -08:00
970caaa621 Exclude sphinx_rtd_theme from pep8 2017-01-28 23:37:39 -05:00
00a5980cdf Improve RNN doc formatting 2017-01-28 23:37:39 -05:00
e24eee04f0 Link THC to THPP 2017-01-28 23:37:39 -05:00
f1b3af4ee2 Add more bernoulli options in cwrap 2017-01-28 23:37:39 -05:00
fb2d28f477 remove circular references in NestedIOFunction 2017-01-28 23:30:06 +01:00
f64bc7d2a7 update to eigen 3.3.2 2017-01-28 13:37:09 -08:00
3a82b33f84 Use protobuf's own cmake scripts and add travis for ios
Summary: Closes https://github.com/caffe2/caffe2/pull/110

Differential Revision: D4475170

Pulled By: Yangqing

fbshipit-source-id: 5964db04186619ac563f516cb202c5e2ba543403
2017-01-28 13:29:32 -08:00
3a704ff725 Fix legacy load_lua for SpatialConvolution (#608)
* fix legacy load_lua for conv2d

* fix pep8
2017-01-28 20:19:18 +01:00
0180e638e5 Remove unnecessary zero_() calls in cuDNN RNN 2017-01-28 14:36:57 +01:00
95c6ae04fb Fix non-contiguous grad handling in cuDNN RNN 2017-01-28 14:36:57 +01:00
14a5b35805 Snapshot -> Checkpoint
Summary: As per kennyhorror request.

Reviewed By: kennyhorror

Differential Revision: D4473177

fbshipit-source-id: 6cab6ccf247b09aab8f6f056c807bd3ed27ee6a5
2017-01-27 22:29:32 -08:00
86fb25cefa Rely on embedding size in split
Summary: As desc.

Differential Revision: D4471823

fbshipit-source-id: 2685c64c22556da1749b3e3e6b21a684a7231e7b
2017-01-27 19:44:31 -08:00
27c4c6e0af Merge commit '6ee77b4edd1552d3a9a2e5389ffc351e513a8089' 2017-01-27 17:29:07 -08:00
da17414b3f Merge commit '343d65db91c2419843d36aed5467c2d1374108bc' 2017-01-27 17:16:08 -08:00
be2b27a747 Merge commit '4461ae809043390d5223905cb82b17035c7f9f31' 2017-01-27 17:15:21 -08:00
aec2c8f752 Merge commit 'c45ff2efe64d0face3889194ba6f885fe9cc4d48' 2017-01-27 17:12:13 -08:00
13e34b4679 Fix multiprocessing tests 2017-01-28 01:18:42 +01:00
57373c7c29 Fix docs 2017-01-28 01:16:04 +01:00
79f5bf84e5 [pep8] Potentially breaking docstring changes 2017-01-28 01:15:51 +01:00
3ed720079e [pep8] Fix most remaining lint manually 2017-01-28 01:15:51 +01:00
e7c1e6a8e3 [pep8] Fix most lint automatically with autopep8
Here's the command I used to invoke autopep8 (in parallel!):

    git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i

Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.

Also configures flake8 to match pep8's behavior.

Also configures TravisCI to check the whole project for lint.
2017-01-28 01:15:51 +01:00
f1d0d73ed7 Fix flaky Sqrt test 2017-01-28 00:45:49 +01:00
9c411513bf Patch distutils crash when linking with ccache 2017-01-28 00:28:33 +01:00
ce78bc898b Fix travis builds and add ccache 2017-01-28 00:28:33 +01:00
887002e932 Add bindings to CUDA tensors and storages in THPP (#615) 2017-01-27 18:15:56 -05:00
eba5299576 Port ROIPool to caffe2 trunk, add CPU implementation
Summary:
Xray is being converted to c2, and ROIPool (needed for detection models) is
missing in c2 trunk. Ported rbgirshick's implementation from experimental with a few
changes.

Also added code for translation in caffe_translate.py

Differential Revision: D4453331

fbshipit-source-id: 7a05a88edec1bd6e806e52dc1e6c55bc75c3149f
2017-01-27 12:59:20 -08:00
22e1bdd6d1 Use stack workspaces in RecurrentNetwork
Summary: This diff uses stack workspaces in RecurrentNetwork, which allows us to simplify the implementation and get rid of scratches.

Reviewed By: salexspb

Differential Revision: D4446813

fbshipit-source-id: 514eec7e4300bdf492a9cb192b40cf4f89acf656
2017-01-27 11:44:26 -08:00
31dea5ff23 Small typo in README (#613) 2017-01-27 20:18:36 +01:00
ec4602a973 Fix bad code alignment (#612)
forward *is* a method of the Linear class
2017-01-27 20:16:49 +01:00
ed04a20289 distributed reader for evaluation
Summary:
Using multiple readers for model evaluation. Since it is built on the new framework, only NativeLoader is supported.

With 5 readers, the evaluation speed is 124k. The speed for a single evaluator is 32k. There is still room for improvement since the evaluator machine is under-utilized.
(Hive is the bottleneck. Adding more loading threads helps improve the speed to 240k. More readers can improve it further.)

Reviewed By: azzolini

Differential Revision: D4469393

fbshipit-source-id: b55af5f798faca4c150b2c0663fe5db0f154cb70
2017-01-27 10:44:24 -08:00
319945df15 Test for FC operator + fix for docs
Summary: Test for FC operator + fix for docs

Differential Revision: D4473293

fbshipit-source-id: 6e6ebad007ee08b05184fda288ab74982c6b2219
2017-01-27 10:44:24 -08:00
a38749d15f Fix cuda notes
Target GPU *is* consistent with source GPU
2017-01-27 19:30:49 +01:00
6ee77b4edd Added cunn support for TemporalRowConvolutionMM (#415)
* Added cunn TemporalRowConvolutionMM support
2017-01-27 13:30:25 -05:00
cc65cc64c8 Create function ParseProtobufFromLargeString to parse strings larger than 64MB
Summary: Replace ParseFromString with ParseProtobufFromLargeString to get around the 64MB limit.

Reviewed By: Yangqing

Differential Revision: D4466226

fbshipit-source-id: b68a6efc76955db294ddb0d23bbaf03b69e4952a
2017-01-27 10:29:22 -08:00
343d65db91 Rowconv repull (#1120)
* Added TemporalRowConvolutionMM layer, tests, and documentation
2017-01-27 13:29:05 -05:00
4c614f2e67 Add ios-cmake 2017-01-27 00:08:57 -08:00
6328981fcf cuda implementation of Gated Linear Unit, fixed issues with genericization 2017-01-26 22:56:33 -08:00
ca1ff1ee9b Add Flatten layer, bugfix in InnerProduct
Summary: Uncovered these while converting xray detection model.

Differential Revision: D4461051

fbshipit-source-id: 1654c0d7ed101c8c211a93aed6bb542db1e20e0a
2017-01-26 21:44:35 -08:00
da01542399 fix third_party symlink 2017-01-26 21:31:18 -08:00
9dd1d9428e Made translator work as command line tool
Summary: Might be useful to have a command line version of this. Thoughts?

Reviewed By: Yangqing

Differential Revision: D4456221

fbshipit-source-id: 42dd464c5734c0cfbd4c2b1cb348aef9b269b4c2
2017-01-26 20:29:35 -08:00
01e860505b Cmake for android
Summary:
Added cmake for android script under scripts, and set up the travis contbuild target.
Closes https://github.com/caffe2/caffe2/pull/109

Reviewed By: bwasti

Differential Revision: D4468767

Pulled By: Yangqing

fbshipit-source-id: 709f3eb6be24727b0a989d0901dbf377871b122a
2017-01-26 18:14:30 -08:00
59d263280e fix directory reference in cmake for inclusion as library
Summary: This fixes builds that include caffe2 and change the value of CMAKE_BINARY_DIR to their own binary dir. In particular, it allows the generation of protobuf headers/files.

Reviewed By: Yangqing

Differential Revision: D4466126

fbshipit-source-id: eba264094dd2bff07a7f050b95fd2d5525462b09
2017-01-26 14:44:37 -08:00
a90913105c add make-contiguous in batchnorm backward (#602) 2017-01-26 16:17:39 -05:00
9368596059 legacy.nn Attributes: Add '_gradOutput' to SpatialConvolution. (#600) 2017-01-26 15:00:41 -05:00
864f561525 Make BlobDeserialization throw exceptions instead of returning bool
Summary: Makes it much nicer to spot errors, especially in an IPython notebook.

Reviewed By: kennyhorror

Differential Revision: D4465726

fbshipit-source-id: c0adaf5168248a70987ff9d5dfce54a622ff2219
2017-01-26 09:44:19 -08:00
80ed795ff1 Minor ffi utils fix 2017-01-26 11:55:49 +01:00
8bff8014b3 print out inputs in lstm test to catch when it is fluky
Summary:
We get fluky lstm tests on a numerical gradient check. I
would like to improve the accuracy of the latter, but first I need an
example. After landing this, TestWarden would find a bad input for me.

Reviewed By: urikz

Differential Revision: D4467223

fbshipit-source-id: 68d4bf22af11190f39fa28332c6d99efbb192132
2017-01-25 20:59:21 -08:00
b6e330641a fix Android studio compilation error
Summary: Android Studio automatically applies -Werror in debug mode and throws an error on non-string literals in the 3rd argument of android_log_print

Reviewed By: Yangqing

Differential Revision: D4465263

fbshipit-source-id: af6dc436b7c98a29aa89bb241c452e6da5c8ad1f
2017-01-25 20:29:23 -08:00
de8cd46416 Caffe2 graph to json for visualization in flow
Summary:
- Writing a Caffe2 computation graph to json for visualization in Flow
- Example use in the Text models workflow: it replaces the existing draw function which produces PNG file
- Visualization: https://our.intern.facebook.com/intern/fblearner/c2graphvis/13215753/
- The visualization uses FBLearnerDAG. Plan to add many visualization-related features.

Reviewed By: Mortimerp9

Differential Revision: D4415299

fbshipit-source-id: 2d641d60177566ed2837fb3750394420690f28de
2017-01-25 19:44:20 -08:00
0f870d4f40 Add error checking for too-small input in ConvPoolOpBase
Summary: Fixes segfaults that occur in Eigen and im2col/sgemm backends.

Reviewed By: Yangqing

Differential Revision: D4451772

fbshipit-source-id: 3cf21e5afb2fe300db4228933a82063db5f7091f
2017-01-25 17:44:22 -08:00
9775ffc6ae Fixes to topological sort, canonical blob naming, sharing final blob
Summary: Three small changes:

Reviewed By: ajtulloch

Differential Revision: D4437131

fbshipit-source-id: c849e36e1c4d1dce947076349df863fafe62c66d
2017-01-25 15:14:26 -08:00
a4ba0cceb2 Run memonger to optimize net if needed
Summary: This runs memory optimization on the net.

Differential Revision: D4433788

fbshipit-source-id: 80c3f0568795c2d7a5beb3cdb89a92af91162fef
2017-01-25 15:14:26 -08:00
fa1516d319 Install THCUNN.h and generic/THCUNN.h
The THCApply.cuh is moved to the .cu files so that THCUNN.h can be
compiled by a standard C compiler.
2017-01-25 14:13:17 -08:00
5e26f49db4 Install THNN.h and generic/THNN.h 2017-01-25 14:09:09 -08:00
7694f65120 Revert "Using accreal instead of real in the API" 2017-01-25 16:26:42 -05:00
b5ebf68df1 Revert "Convert real to accreal in libTHCUNN" 2017-01-25 16:13:20 -05:00
40ce50e0bd Speed-up training, fast data-augmentation, sync data_parallel_model changes + other small fixes
Summary:
1. Use opencv for data augmentation after benchmarking various image libraries in python
2. Use cuda no bias conv
3. Use cuda fastest conv (exhaustive search)
4. data_parallel_model had a few changes. Syncing them
5. propagate the errors in threads to make debugging easy

Reviewed By: rbgirshick

Differential Revision: D4341422

fbshipit-source-id: aa4471a2f49dd6d7ca13879999b3c7ceaf818c1e
2017-01-25 11:44:22 -08:00
aed53dd7cf Pass cmd flags of GlobalInit down to workers in Flow
Summary:
It's a similar trick to dyndeps. The idea is that global state is better simply replicated to gang workers, as otherwise it causes a lot of confusion.

In particular it's useful if one wants to enable detailed logging (--v)

For other operators the user still needs to call GlobalInit explicitly. We should consider doing it for all Flow operators, but I'll leave that for future consideration.

Reviewed By: kennyhorror

Differential Revision: D4460686

fbshipit-source-id: 5836737dd3195f9ad12589fd899a3ff63f173e05
2017-01-25 11:14:51 -08:00
630d3a5984 Fix blob serialization in KVStore ops
Summary:
Fixes the problem surfaced by D4446583.

Our serialization interface is designed for chunking, but recipients in distributed training didn't expect that.

For now I just fixed the naming of the tensor and since our blobs are small it should work.

I believe it's still wrong for big tensors, however, as we just concatenate the serialized proto strings of chunks here: https://fburl.com/6wayxglz and here: https://fburl.com/7k4nhjja . The deserialization path, though, just tries to deserialize it as a single proto.

I'll make Blob::Serialize(name) version use non-chunking version in a separate diff. Just sending it to unblock for now.

Side note - oujin - why do we have two versions of the operator setting the blob? :) Was one of them added by Pieter? Maybe we should unify them a bit.

Reviewed By: kennyhorror

Differential Revision: D4460974

fbshipit-source-id: 485b4de7c8af8cd9eac44c06a1246deaf0b4d502
2017-01-25 11:14:51 -08:00
65f7c915fd Fix non-chunked Blob::Serialize method
Summary: The previous implementation was just concatenating strings, which I believe is wrong. Instead, let's turn off chunking when we don't ask for it.

Reviewed By: kennyhorror

Differential Revision: D4461311

fbshipit-source-id: 8b9a3325a40a1cd0a8ffeeb20a17bf9f57b7b0a9
2017-01-25 11:14:51 -08:00
2cad802b68 Revert "cuda implementation of Gated Linear Unit" 2017-01-25 13:15:22 -05:00
ddbf90afa3 improve dper dh
Summary:
It's broken because it relies on add sparse bias.
It's not easy to add_sparse_bias after the switch to loader_param.

DPA would like to try it out :)

Differential Revision: D4447275

fbshipit-source-id: 631cb4995f35383070e44387dc86692ba64b91eb
2017-01-25 02:59:22 -08:00
0e3146e1e8 Remove recurrent_sizes from RecurrentNetwork
Summary: Remove usage of recurrent_sizes, so recurrent states' sizes can depend on the input (in the case of an attention matrix for the beam decoder). I removed recurrent_sizes from the forward and backward steps.

Reviewed By: salexspb

Differential Revision: D4427688

fbshipit-source-id: 580420a294d309c86ec5cb4e677058623b7228e1
2017-01-24 23:14:25 -08:00
5e5486491d Replace Gather + RowMul by SparseLengthsWeightedSum
Summary:
Improving performance by using SparseLengthsWeightedSum. Results for my run:
Before:

  8.98474 RowMul
  6.89952 Gather
  0.80991 LengthsSum
  2.02056 SparseLengthsWeightedSum
  Total: 18.71

After:

  1.075 Gather
  6.54999 SparseLengthsWeightedSum
  Total: 7.62

Log of run: P56992396
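
For reference, a numpy sketch (assumed shapes, not the actual kernel) of what the fused op computes, compared with the separate Gather / RowMul / LengthsSum path:

```
import numpy as np

def sparse_lengths_weighted_sum(data, weights, indices, lengths):
    """data: (rows, dim); weights/indices: flat, sum(lengths) long."""
    data = np.asarray(data)
    weights = np.asarray(weights)
    indices = np.asarray(indices)
    out, pos = [], 0
    for n in lengths:
        rows = data[indices[pos:pos + n]]             # Gather
        weighted = rows * weights[pos:pos + n, None]  # RowMul
        out.append(weighted.sum(axis=0))              # LengthsSum
        pos += n
    return np.stack(out)
```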

With skip_backward. Command:

  CLASSPATH=/mnt/vol/gfsetlprocstore-oregon/users/cxj/hivereader-wrapper-1.0-SNAPSHOT-standalone.jar OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 MKL_DYNAMIC=FALSE ./buck-out/gen/caffe2/caffe2/fb/dper/tools/speed_benchmark.par -loader_param /mnt/vol/gfsfblearner-altoona/flow/data/2017-01-22/d832bb7b-5598-422e-9fee-b3299a9c8c1f -negDownsampleRate 0.1 -hidden 'unary(dot{"num_dense": 6, "pooling_method": "PositionWeighted"}(128, 64)128-128, 1)' -model_type mlp_sparse -warmup_runs 10 -main_runs 1000 -run_individual -skip_backward 2>&1 | tee /tmp/log.txt

Before: P56993234$7509
After: P56992503$7344

Command:

  ./fblearner/nn/ads/canary all

https://our.intern.facebook.com/intern/fblearner/details/13320564/?notif_channel=cli

Cloned "caffe2 ads sparse nn canary" run: https://our.intern.facebook.com/intern/fblearner/details/13322337/

Reviewed By: xianjiec

Differential Revision: D4451073

fbshipit-source-id: 0a4e9693d7b8b0372b2efefa61154e987a493210
2017-01-24 20:44:21 -08:00
f0996309d9 Fix Caffe2 gcc 4.8 regex issue
Summary:
It seems that a simple string("") conversion instead of "" is enough.
Closes https://github.com/caffe2/caffe2/pull/105

Differential Revision: D4458626

Pulled By: Yangqing

fbshipit-source-id: 5072499516332ad1067779526523a3f10aade6ef
2017-01-24 19:29:21 -08:00
962b16a814 speedup of softmaxwithlossop
Summary: Speeds up inference in the FCIS model from 2900ms/iter for the SoftmaxWithLoss layer to 230ms/iter

Differential Revision: D4456494

fbshipit-source-id: dd520d91fbe950511d198de45f34ac4cd4a676b0
2017-01-24 18:44:28 -08:00
b1472a173a don't hardcode outputs order to work only for lstm + don't pass blob names for parameters
Summary:
In this diff I stop passing parameters by name and also remove the hardcoded output ids which were there specifically to make LSTM work. It also makes it possible to avoid using recurrent_sizes in the backward pass (for forward this is done in D4427688).

Using a similar technique, it should be simple enough to eliminate blob-name passing entirely. Then we can fix scoping. These can be done in a next diff.

Reviewed By: urikz

Differential Revision: D4444614

fbshipit-source-id: 3580a76365502b9f2f09e3d8b7e78084ca739f00
2017-01-24 16:29:23 -08:00
f09da676d7 CNNModelHelper.LSTM test
Summary:
Let's have a test for this so we don't break existing use cases
while iterating on RecurrentOp's code

Reviewed By: urikz

Differential Revision: D4456404

fbshipit-source-id: 79f2b88c1eed16106adf5b793b4c74441c7146c6
2017-01-24 15:59:24 -08:00
b7a2a41ceb TensorPrinter helper c++ class
Summary:
It is annoying to print tensors from C++ (while it is easy
from Python when you have a net), so I took the logic out of PrintOp
into a separate class.

Reviewed By: urikz

Differential Revision: D4452793

fbshipit-source-id: d512559fe07bc468423c9ce38da0c44eaad4fdec
2017-01-24 15:59:23 -08:00
e64b404d45 logging: Join() method for printing vectors
Summary: I can't live without it and we don't have folly here.

Reviewed By: urikz

Differential Revision: D4444511

fbshipit-source-id: 3a85f1a13bd3032be89b3150d40a701dce192004
2017-01-24 11:14:21 -08:00
b39de2cbbe Merge pull request #416 from pavanky/half-fixes
Convert real to accreal in libTHCUNN
2017-01-24 12:17:49 -05:00
49a555e0f5 Merge pull request #1109 from pavanky/api
Using accreal instead of real in the API
2017-01-24 12:17:17 -05:00
c45ff2efe6 Merge pull request #915 from pavanky/convert
Macros to convert between real and accreal
2017-01-24 09:14:33 -05:00
99b520cc5d Merge pull request #421 from huihuifan/cudaGLU
cuda implementation of Gated Linear Unit
2017-01-24 09:13:34 -05:00
200ae58c35 modified save_op for multi-gpu training
Summary: added functions to "de-scope" the saved model files

Reviewed By: Yangqing

Differential Revision: D4444966

fbshipit-source-id: f447c15754f8e0648459148fcc7fba410dc06f68
2017-01-23 19:44:20 -08:00
96fc095ccb Add piecewise linear transformation operator
Summary:
A new operator is added for model calibration. Given a piecewise linear function and a raw prediction as input, it generates the mapping as output.
Details can be found in the operator doc.
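
For intuition, the mapping is essentially linear interpolation over calibration keypoints; a hedged numpy sketch (argument names are illustrative, not the operator's actual schema):

```
import numpy as np

def piecewise_linear_transform(raw_predictions, bounds, values):
    """bounds: sorted x keypoints; values: calibrated y at each keypoint.
    Predictions outside [bounds[0], bounds[-1]] are clamped by np.interp."""
    return np.interp(raw_predictions, bounds, values)

# Example: a 3-segment calibration curve.
# piecewise_linear_transform([0.1, 0.5, 0.9],
#                            bounds=[0.0, 0.3, 0.7, 1.0],
#                            values=[0.0, 0.2, 0.8, 1.0])
```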

Differential Revision: D4418640

fbshipit-source-id: f8ff3ea786b0fe233a4ddcb709e5dbf0861ca484
2017-01-23 17:44:26 -08:00
eb6455d2d9 Remove enforce to have tensor data_ when sharing tensors
Summary: We don't need this enforce since we already allow raw_mutable_data to return nullptr; we should be able to share meta for tensors even without data

Reviewed By: Yangqing, kennyhorror

Differential Revision: D4439138

fbshipit-source-id: 0e81bef3054fe2f9720efd5002418eac7a2b6c08
2017-01-23 14:44:21 -08:00
b5424c9646 Enable top-k accuracy option in caffe_translator
Summary: Caffe2 has a topk accuracy op now

Differential Revision: D4450387

fbshipit-source-id: 2d516cc44fb4e814ca901e73746b0364a0584217
2017-01-23 14:29:24 -08:00
7acdece3b2 Comment out NHWC Alexnet test for now
Summary:
Relies on NHWC implementation of group conv which doesn't exist right
now
Closes https://github.com/caffe2/caffe2/pull/103

Differential Revision: D4451635

Pulled By: Yangqing

fbshipit-source-id: 31d99b37abf7563a26389f47affcc759ce6bc5e1
2017-01-23 13:59:29 -08:00
ceb0c765b9 Avoid duplicate keys when doing chunking in serialization
Summary: Some DBs don't support duplicate keys. Nvidia had problems with LMDB, where we could potentially set up duplicate keys, but this won't be possible in some other cases. So instead, let's just store different chunks under different keys in the DB; when reading back, we will remove the special suffix.
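
The key scheme is roughly this (the suffix format below is an assumption for illustration, not the actual one):

```
# Store each chunk under its own key; strip the suffix when reading back.
CHUNK_SUFFIX = "#chunk_"

def chunk_key(base_key, chunk_id):
    return "%s%s%d" % (base_key, CHUNK_SUFFIX, chunk_id)

def base_key_of(stored_key):
    return stored_key.split(CHUNK_SUFFIX, 1)[0]

# chunk_key("my_tensor", 3)          -> "my_tensor#chunk_3"
# base_key_of("my_tensor#chunk_3")   -> "my_tensor"
```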

Reviewed By: dzhulgakov

Differential Revision: D4446583

fbshipit-source-id: 6b345e342840c5fd476029166db131d343467d48
2017-01-23 10:14:18 -08:00
e3ea3e8c12 MKL convolution operator
Summary: Closes https://github.com/caffe2/caffe2/pull/102

Differential Revision: D4448886

Pulled By: Yangqing

fbshipit-source-id: 914d11cd79107895a9755154df3526fcf71a31ea
2017-01-23 09:59:30 -08:00
e0c90de6e6 Speedup get_op_ids_in_path
Summary:
Perf bug report: https://www.facebook.com/groups/1405155842844877/permalink/1617904561570003/

Diagnosis:

I've done some digging into this and here's what I've found:
(1) In this use case, the call is disallowed_op_ids = get_op_ids_in_path(ssa, blob_versions, [], inputs), where inputs = ['res4_22_sum'] is the last blob produced by the res4 stage of a ResNet101 model.
(2) get_op_ids_in_path has exponential running time in the number of blocks in the res4 stage of ResNet. This is based on empirical running times; this call would take about 4.5 days to complete on my devgpu.
(3) I haven't familiarized myself enough with the IR and SSA code in core.py to understand the algorithmic fix yet, but surely there's a more efficient algorithm to compute the same thing.
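
For intuition, the efficient version is just a backward graph traversal that never expands the same op twice; a generic sketch (not the actual core.py fix):

```
def op_ids_in_path(producers_of, start_blobs):
    """producers_of: blob -> list of (op_id, op_input_blobs).
    Returns all op ids reachable backwards from start_blobs, visiting
    each op once (linear time) instead of re-exploring shared
    subgraphs exponentially."""
    seen_ops = set()
    seen_blobs = set(start_blobs)
    stack = list(start_blobs)
    while stack:
        blob = stack.pop()
        for op_id, op_inputs in producers_of.get(blob, ()):
            if op_id in seen_ops:
                continue  # memoized: never expand an op twice
            seen_ops.add(op_id)
            for b in op_inputs:
                if b not in seen_blobs:
                    seen_blobs.add(b)
                    stack.append(b)
    return seen_ops
```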

Reviewed By: Yangqing

Differential Revision: D4446278

fbshipit-source-id: 8bd147f92d62b865dc355d5802a53e92d64b6e21
2017-01-23 09:44:26 -08:00
c4b640aeb2 @debug decorator to make it easier to use dropin debugger
Summary:
Now it takes two lines to get a drop-in debugger: import it and
then decorate your function. Also got rid of the enable/disable logic as
it doesn't seem useful.

We can also try to enable this by default for our tests when running
locally as a next step.
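
A minimal sketch of such a drop-in decorator (illustrative, not the exact helper that landed): on an uncaught exception, it opens pdb at the failure point instead of exiting.

```
import functools
import pdb
import sys
import traceback

def debug(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            traceback.print_exc()
            pdb.post_mortem(sys.exc_info()[2])  # drop into the debugger
    return wrapper

@debug
def main():
    ...
```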

Reviewed By: bwasti

Differential Revision: D4444299

fbshipit-source-id: 6e2006945d8ad640685b1017ca1bd63054728908
2017-01-23 09:44:26 -08:00
ec51f887bf Create only one instance of SigridTransform in DPerExample.
Summary:
The DPer example has been creating multiple copies of the transform config in the net
definition up to this point, which meant I hit the ProtoBuf limit (64MB) for
certain Task requests (especially visible because of the ValidationPipeline I was adding).

After this diff we're going to store SigridTransforms in one instance per
machine for training (or 1 instance per reading).

The difference in plan sizes for a simple SparseNN model is ~30 MB (even accounting for the fact that the second model has a validation plan as well).

TODO: Do similar logic for NNPreProc as well (it's also pretty large).

Reviewed By: dzhulgakov

Differential Revision: D4441441

fbshipit-source-id: 4452dd86a4dc49b2c7f5b7642f443aed5720b047
2017-01-22 19:29:16 -08:00
be1224c0a7 cmake: allow execution of python files without make install
Summary:
This will help issues like #99
Closes https://github.com/caffe2/caffe2/pull/101

Differential Revision: D4448397

Pulled By: Yangqing

fbshipit-source-id: ede3fafc1b1314886583e8ea38948bb31e69347b
2017-01-22 13:29:37 -08:00
c1ba0fbab3 Refactor CuDNNReluOp for multi-precision
Summary:
One way of simplifying the fp16 / multi-precision operators -- remove the explicit OpName / OpNameFP16 divide and dispatch the correct calls at runtime based on the contents of the input tensor(s).
Closes https://github.com/caffe2/caffe2/pull/93

Differential Revision: D4444417

Pulled By: Yangqing

fbshipit-source-id: 296dcff1e1e24ba534caca9b82f16e6634da2287
2017-01-22 13:29:37 -08:00
70af31b6c3 Fix ARM_NEON codepath for non-shared scale case.
Summary:
From a new model trained by Zhen. We never exercised this codepath before, since we've never had models with this choice.

I'm auditing all our ARM_NEON codepaths to see if there are other cases like this.

Reviewed By: Yangqing

Differential Revision: D4444694

fbshipit-source-id: e0436db4e8b655551fedb21df160b7cae7e79737
2017-01-20 17:14:33 -08:00
06398e9bfb softmax-with-loss, handle gracefully cases when total weight is 0
Summary:
Spatial Softmax allows specifying locations that are not counted toward the loss. If none of the locations are counted, this resulted in NaNs and headaches. This diff fixes that by explicitly handling these cases.

+ assertion for label blob dimension(0)

Created a new test as well.
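
The fix amounts to guarding the weighted reduction (a numpy sketch under assumed shapes, not the actual kernel):

```
import numpy as np

def weighted_softmax_loss(per_location_loss, weights):
    """per_location_loss, weights: same-shape float arrays."""
    total_weight = weights.sum()
    if total_weight == 0:
        # No counted locations: previously this divided 0 by 0 -> NaN.
        return 0.0
    return float((per_location_loss * weights).sum() / total_weight)
```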

Differential Revision: D4442939

fbshipit-source-id: 8641bfad2a994e517ca3eda39345380a6ca1ba50
2017-01-20 15:29:21 -08:00
e18643f90b More fixes
Summary:
When testing the code, a couple of issues arose:
 - we need a different name for the last layer than in the preprocessed model, otherwise a shape assertion is triggered
 - preprocess_noaugmentation still needs to crop images larger than 227x227, otherwise things fail.

Reviewed By: viswanathgs

Differential Revision: D4442700

fbshipit-source-id: 05f54e7f17c266280f5ba5bb57af1721fe30df12
2017-01-20 13:44:24 -08:00
6a7dd236fa instance norm
Summary: Added gradient and GPU implementations to the caffe2 InstanceNorm op

Reviewed By: Yangqing

Differential Revision: D4304808

fbshipit-source-id: 6feecaed589ea9f825260a49b39b4260da6e5426
2017-01-20 12:29:28 -08:00
a727742644 Prevent concurrent memory and NCCL ops
Summary:
Use a mutex to prevent simultaneous memory alloc / free and NCCL ops
Closes https://github.com/caffe2/caffe2/pull/95

Reviewed By: bwasti

Differential Revision: D4438796

Pulled By: Yangqing

fbshipit-source-id: e5119b4cffcc54f4a2da066d167e93934b302234
2017-01-20 10:59:35 -08:00
3f66f66da9 DebugMode helper for Caffe2
Summary:
It helps to develop scripts locally (when working outside of Flow). One doesn't have to rerun the script in order to catch an exception in the debugger or add a print statement. (Flow does this kind of thing automatically.)

Usage example:

```
if __name__ == '__main__':
  workspace.GlobalInit(['caffe2', '--caffe2_log_level=2'])
  from caffe2.python.utils import DebugMode
  DebugMode.enable()
  DebugMode.run(main)
```

Reviewed By: Yangqing

Differential Revision: D4424096

fbshipit-source-id: 73f418c80f581820e70139df7e166981e4d8c55f
2017-01-20 09:29:31 -08:00
7179002bfb cuda implementation of Gated Linear Unit 2017-01-19 23:01:30 -08:00
43b5be1d78 added c implementation of GatedLinearUnit 2017-01-19 22:18:08 -08:00
afe822ebd7 Small tweaks
Summary:
Some tweaks, hopefully getting us to 0.98 MAP
- no cropping for test dataset (as per patrick)
- spatialBN momentum 0.1 (default is 0.9)

Also added some additional logging and reduced frequency of running of test net and logging.

Reviewed By: viswanathgs

Differential Revision: D4439790

fbshipit-source-id: 700705b811a5fc8c7139a265de96db646605ca5a
2017-01-19 18:44:26 -08:00
411059d649 Generate huffman tree
Summary:
In this diff:
[1] Change the output from generating all paths from root to labels to TreeProto.
TreeProto itself is required by inference, and we can use hsm_util to get the
paths from TreeProto.

[2] Fix hsm_util index assignment.

Differential Revision: D4416731

fbshipit-source-id: 657d8b9b4df6fa30c9f92d391cf7e07b5c5db1f8
2017-01-19 16:14:23 -08:00
9c2067cc49 tweaked CUDNN_BN_MIN_EPSILON comparison to eliminate runtime warning
Summary:
CudnnSpatialBNOp was generating a runtime warning when testing (epsilon_ < CUDNN_BN_MIN_EPSILON), even though epsilon is set equal to CUDNN_BN_MIN_EPSILON by default.
Tweaked the comparison here to allow for a small tolerance. I implemented the softer comparison by introducing FLT_EPSILON from <float.h> - let me know if there is a
preferable set of constants to use here.

Reviewed By: Yangqing

Differential Revision: D4431766

fbshipit-source-id: 5e67690a5ed258d460d95e9582b6fdf2050b42f9
2017-01-19 15:44:25 -08:00
70459202da Install tests
Summary:
Install tests if they're built
Closes https://github.com/caffe2/caffe2/pull/94

Reviewed By: bwasti

Differential Revision: D4437950

Pulled By: Yangqing

fbshipit-source-id: 21204bd25a93a47e3e66378251e55fcfae6af7cf
2017-01-19 15:14:53 -08:00
dd51336611 Fix label start index for HuffmanTreeHierarchyOp
Summary: Change label indices to be in the range [0, num_classes)

Differential Revision: D4416685

fbshipit-source-id: b16ca8539fd538ad62bf1298dbad3f1553956241
2017-01-19 15:14:53 -08:00
9f0a7935f6 Replace one more place from _net.external_input to _external_input_map
Summary: #accept2ship

Reviewed By: dzhulgakov

Differential Revision: D4435301

fbshipit-source-id: 6b62492c190325e82bc14d5397852106d07d5235
2017-01-19 12:29:30 -08:00
8ed9a91d77 Avoid PrefetchOp destructor assertion when not necessary
Summary:
Countless hours were spent debugging why ImageInputOp failed with a cryptic exception P56967302. It turns out the assertion happened in the PrefetchOp destructor, which was triggered when an assertion failed in the ImageInputOp constructor. Because of this, the underlying problem was shadowed. I fixed this by not asserting on finalize_ if there is no prefetch thread running, and now the error is clean:

[enforce fail at image_input_op.h:105] scale_ > 0. -1 vs 0. Must provide the scaling factor.

Reviewed By: Yangqing

Differential Revision: D4435105

fbshipit-source-id: 52f85a9fd30eea396c9faca54b6d946fa847b7ff
2017-01-19 08:29:22 -08:00
16d2f3b44c reshape explicitly in-place
Summary: TSIA

Reviewed By: salexspb

Differential Revision: D4434558

fbshipit-source-id: 025d70db82d4710f66008086833cf833c2344401
2017-01-19 01:59:25 -08:00
91ebfa3c7c Unit test for big batch size avg pooling
Summary: basically copied test_pooling and hard-coded the values

Reviewed By: prigoyal

Differential Revision: D4428162

fbshipit-source-id: 6c0444ac8c21f08824df7ff53999a94967607dc4
2017-01-18 19:29:20 -08:00
be97f491e6 Unbreak caffe_translator for Conv op
Summary:
Minor bug in D4426513 - bias was always added
as an input blob. Running it on xray throws "RuntimeError: [enforce fail at operator.cc:25] blob
!= nullptr. op Conv: Encountered a non-existing input blob:
caffe.SpatialConvolution_0_b"

Reviewed By: Yangqing

Differential Revision: D4429231

fbshipit-source-id: 0d3905ea6e87128ec1aa9d0f0a2f43126b1069b1
2017-01-18 14:00:04 -08:00
e67425647a Support bias for Scale layer in caffe_translate
Summary:
Turns out xray models have some independent Scale layers (with bias) besides
the Conv-Scale pairs. We could still fuse them with previous layers with some
work, but for simplicity, we include an Add op followed by a Mul for the bias if needed.
We could revisit layer-fusion optimizations in the future once we have
something working for xray.
something working for xray.

Reviewed By: Yangqing

Differential Revision: D4427266

fbshipit-source-id: ef7d8677ccd7d10dbd20759eeed378d9bc4522d1
2017-01-18 09:59:21 -08:00
bfca2b86c3 Removed the old group convolution code
Summary: Now that we directly support group convolution, this will no longer be needed. I also took the chance to add dilated convolution and optional bias.

Reviewed By: prigoyal

Differential Revision: D4426513

fbshipit-source-id: eb2bb0aa619f8ff5f732512570f736bc59cd57dd
2017-01-18 00:44:31 -08:00
e23ddf06e9 UnsafeCoalesceOp for nn.Module.flattenParameters style coalescing
Summary:
This is a handy tool for amortizing expensive operators (e.g.
distributed communication, some heavier kernel launches, etc) over a
lot of small blobs (e.g. all the biases in a network). We can just
coalesce these small blobs in-place into a single blob, act on them in
operators, etc as if they are non-coalsed (passing them as inputs to
operators, etc), and then finally for heavier operators, just work on
the coalesced blob that contains each of these units.

I named it UnsafeCoalesce since it introduces blob aliasing, which
needs care for work like memory management, graph rewriting as in
memonger, etc.

Reviewed By: Yangqing

Differential Revision: D3557149

fbshipit-source-id: 09cff4459b84270fe9e1da3b4a168fd66d01f795
2017-01-17 17:14:35 -08:00
b5f6fdb814 Using accreal instead of real in the API
This is done to be consistent with the changes made to cunn
2017-01-17 16:58:19 -08:00
a69d819901 Converting all instances of real to accreal in libTHCUNN
This is because the current version of luaffifb fails to pass
custom structs (i.e. half) as arguments or accept them as return
values.

The accreal parameters are immediately converted to real internally.
This is done to ensure none of the internal code needs to be changed.

This change also removes transform_reals_to_half which is no longer
necessary.

Change-Id: I978151d001de5492576fb0eddfa0608cd4e99149
2017-01-17 16:06:42 -08:00
fef2b1526d Adding macros to convert between real and accreal 2017-01-17 15:14:45 -08:00
3719994c96 Remove redundant code in THGenerateAllTypes.h 2017-01-17 15:12:43 -08:00
204867a884 in lite mode, return the non-readable string, better than nothing.
Summary: TSIA

Reviewed By: bwasti

Differential Revision: D4379950

fbshipit-source-id: 8a5d0b5454c2f1b874526f4393c4b575966bc889
2017-01-17 11:59:30 -08:00
d63f58013b Throw error in caffe_translator on Scale layer with bias
Summary: Failing fast instead of swallowing the bias term.

Differential Revision: D4419130

fbshipit-source-id: 98ce0af9a20adecfb027ffe8293ff69910873abc
2017-01-17 09:59:20 -08:00
7d6742f2f5 Tool to convert caffe models to c2 + fixes for xray v10
Summary:
Simple tool similar to caffe_translator_test.py for conversion from caffe to
caffe2. The differences are:

There are a couple of issues that need to be fixed as mentioned in
https://our.intern.facebook.com/intern/tasks?t=15424761, especially related to
the 'legacy_pad' field in conv op.

Differential Revision: D4407146

fbshipit-source-id: ec641f6d7e0cf6cdf2eca21f058b4451635d4a56
2017-01-17 08:59:58 -08:00
4461ae8090 include cstddef for msvc 2017-01-15 23:45:48 +08:00
2b948c42cd Add SpatialAdaptiveAveragePooling. 2017-01-14 19:44:07 -06:00
b2ae054410 Add SpatialAdaptiveAveragePooling. 2017-01-14 15:27:52 -06:00
b96c2ed6ab fix validation to consider cpu-only ops
Summary: The data parallel model has a sanity check that ensures that operators' inputs/outputs do not cross device boundaries. This failed when the operator was a CPU-only operator (such as the new AccuracyOp version). This fixes that.

Reviewed By: prigoyal

Differential Revision: D4417841

fbshipit-source-id: 9bc4e7a2074a544ca4db69ecf24183bbd41f84ca
2017-01-13 18:59:32 -08:00
8683737410 Caffe translator: match torch pooling
Summary: See code comments: legacy is a legend.

Reviewed By: viswanathgs

Differential Revision: D4414447

fbshipit-source-id: 7cd96778bbc00aff053100871f273b2e1b43c973
2017-01-13 10:59:20 -08:00
9ad10959ee Enable large PlanDef protobuf message.
Summary:
Enable cases where the PlanDef message is bigger than protobuf's string-decoding
limits.

Differential Revision: D4412736

fbshipit-source-id: 91ee02d7a8ab85b1c8169683a6c1dccd4c79be40
2017-01-13 09:29:29 -08:00
d9c9404885 refactor to allow for parallel gpu execution
Summary:
First step in doing multi-GPU training - modification of the training code to use ImageInputOp. A few changes to accomplish that:

+ modified the script that generates our lmdb to store byte image data instead of floats
+ we have a float 'label' for our regression problem, so added support for float labels in ImageInputOp
+ updated train_network.py to use ImageInputOp, but it is still single-GPU

Reviewed By: seansnyder

Differential Revision: D4407728

fbshipit-source-id: a59a1b91b69a9d5f0486383d4fb0a993478393c9
2017-01-12 15:14:50 -08:00
0d5f3654b2 Adding back untracked files from manual github pull
Summary: Github import didn't work and the manual import lost some files.

Reviewed By: Yangqing

Differential Revision: D4408509

fbshipit-source-id: ec8edb8c02876410f0ef212bde6847a7ba327fe4
2017-01-12 08:59:19 -08:00
048be4533d Fix autogenerated docs.
Summary:
It looks like markdown is not happy with lines starting with =. This diff
simply fixes 2 such cases.

Reviewed By: dzhulgakov

Differential Revision: D4409033

fbshipit-source-id: f2ba3ce5e3936a1e0d57984c12234209993550be
2017-01-12 03:29:18 -08:00
3a514fe28d gpu transform fix
Summary: Removed the no longer needed line.

Differential Revision: D4403219

fbshipit-source-id: 5a4b9cbb6c9ab5afa3b973baae9505e170b83da3
2017-01-11 22:44:20 -08:00
f0c893dcb8 ShareExternalPointer with meta
Summary: TSIA - for background, see D3557149

Reviewed By: ajtulloch

Differential Revision: D4405095

fbshipit-source-id: ea74749e3deacee74ac89e38bf6c47e340be3c92
2017-01-11 22:29:25 -08:00
1cd166d330 CMake completions work
Summary: Closes https://github.com/caffe2/caffe2/pull/88

Differential Revision: D4404292

Pulled By: bwasti

fbshipit-source-id: 8a4351c2dee5136aaa12b90f1a61fd7afee51994
2017-01-11 16:59:22 -08:00
d8314bf278 Fix ordering of TransformOnGPU arguments
Summary:
Swap std, mean to match actual interface
Closes https://github.com/caffe2/caffe2/pull/86

Reviewed By: Yangqing

Differential Revision: D4387679

Pulled By: bwasti

fbshipit-source-id: 54020af2398240f79ee6bb0c1f6b01ab58287353
2017-01-11 11:44:38 -08:00
4de888e167 Add optional gradient on weights for (Sparse)LengthsWeightedSum
Summary:
It ended up much messier than originally expected. Maybe we should have just hardcode it, but I've tried to be "generic" so far at expense of code readability.

The main issue is that for weights computation we need access to original embedding matrix and in sparse case we need to relookup the embeddings to do the dot product with output grads.

Thus I'm making weight grad computation optional, controlled by a flag and it triggers invocation of a different backward op that produces both grads at the same time.

So far it's implemented only for 'Lengths' version. It'd be straightforward to implement (Un)SortedSegment versions but I haven't done that yet.

Reviewed By: kennyhorror

Differential Revision: D4388215

fbshipit-source-id: 23132ab7daa1f5eec49233f802af1fe75b469c2b
2017-01-11 11:44:38 -08:00
4ae5235ec9 Tiny clean up of reducer_functors
Summary: Just to make life a bit easier to further work.

Reviewed By: kennyhorror

Differential Revision: D4388071

fbshipit-source-id: 71b99ef1c2dc680afe4e9ef2f7a370e43116ce99
2017-01-11 11:44:38 -08:00
0e6ebdf50a Speed up travis slightly and fix documentation mistake
Summary: Closes https://github.com/caffe2/caffe2/pull/90

Differential Revision: D4404418

Pulled By: bwasti

fbshipit-source-id: a45af5624eff12abbb103f1e55d2906d35e0dee5
2017-01-11 10:44:27 -08:00
92ebb58a06 Top-k accuracy operator on host
Summary:
Automatically copy from device -> host if necessary.

Thanks to pooyadavoodi for the host top-k code.
Closes https://github.com/caffe2/caffe2/pull/51

Reviewed By: Yangqing

Differential Revision: D4348953

Pulled By: bwasti

fbshipit-source-id: be650855cdd6c2c7bed838155f30e9fa92759dfe
2017-01-10 18:44:30 -08:00
8047b8dc83 Fix random issues with some of the layers going missing from the registry.
Summary:
It looks like for types that are created directly through a type(...)
function call, we don't store strong references anywhere. As a result,
a GC call in Python might or might not clean up these classes depending on the
phase of the moon and other random things. This means that in some
cases simple layers such as Relu might disappear.

cat_shame
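
The gist of the fix, sketched with hypothetical names: keep a module-level container holding strong references to classes created via type(...), so GC can never collect them.

```
_LAYER_REGISTRY = {}

def make_layer_class(name, bases, namespace):
    cls = type(name, bases, namespace)
    _LAYER_REGISTRY[name] = cls  # strong reference outlives local scope
    return cls
```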

Reviewed By: xianjiec

Differential Revision: D4396289

fbshipit-source-id: ba4e9b7ef54ee43349853b0acc3d3f40c74e4d73
2017-01-10 15:14:31 -08:00
bb928f3cc0 Latest fixes to Xray Flow workflows for Caffe2
Summary:
(Ignore the convolution-op-related changes; they will be patched separately later)

This diff covers work from the last few weeks:
- some refactoring of the flow ops
- no_bias setting
- MAP computation (instead of accuracy) for OC
- adaptive learning rate for Xray concepts
- various small bug fixes

Reviewed By: viswanathgs

Differential Revision: D4329500

fbshipit-source-id: 000d4fd22ec408af5290480c788eb86546bff52e
2017-01-10 12:59:23 -08:00
4f1db36cff add CUDA gradient for Div
Summary: DivOp was missing a gradient for CUDA, so I implemented it. Also added an operator test.

Differential Revision: D4396638

fbshipit-source-id: 9949e47aa3735bb418a0db003e2b2f4896056a71
2017-01-09 21:59:23 -08:00
95b3309a87 Gradient Input memory sharing using memonger blob sharing
Summary:
This diff brings us roughly to par with Torch on ResNet memory usage. At batch size 32, ResNet-50 took 7497 MiB before; after this, 5010 MiB. This will thus allow us to handle 64 images / GPU, or 256 images / 4 GPUs.

In addition, I added a special argument to DagNet that causes it to run only one thread for the first iteration. This is needed since there are allocations on the first iteration's backward pass due to gradient sharing, and these would cause NCCL to deadlock.

Sharing gradient buffers requires inferring which gradients can share memory (i.e., that they are not used concurrently). The previous memonger code uses a topological sort, but rbgirshick showed that it does not work with tree-like models. Thus, I wrote a new optimization algorithm based on DFS. It takes about 0.25 secs / GPU on ResNet-50, so it is clearly fast enough.

Module data_parallel_model supports this feature natively.
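
For intuition, the sharing constraint can be modeled as greedy assignment over live ranges (an illustrative sketch, not the DFS-based memonger algorithm itself):

```
def assign_shared_buffers(live_ranges):
    """live_ranges: blob -> (first_use, last_use) op indices.
    Two blobs may share a buffer iff their live ranges don't overlap."""
    buffers = []      # per buffer: [busy_until, member_blobs]
    assignment = {}   # blob -> buffer index
    for blob, (start, end) in sorted(live_ranges.items(),
                                     key=lambda kv: kv[1][0]):
        for i, entry in enumerate(buffers):
            if start > entry[0]:        # buffer is free again: reuse it
                entry[0] = end
                entry[1].append(blob)
                assignment[blob] = i
                break
        else:                           # no free buffer: allocate one
            assignment[blob] = len(buffers)
            buffers.append([end, [blob]])
    return assignment
```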

Reviewed By: prigoyal

Differential Revision: D4363209

fbshipit-source-id: 73b11e7610438098bb11bff0af8075ab0cf2c0f1
2017-01-09 19:44:23 -08:00
3732a0044c Move mpi_python.cc to the python folder to be more consistent about source file locations.
Summary: TSIA

Differential Revision: D4386553

fbshipit-source-id: 2c7196171be7d0af90b46b75f68c949ee3980c2e
2017-01-09 10:59:39 -08:00
b99ea43c9a Set default build type to release
Summary: Closes https://github.com/caffe2/caffe2/pull/89

Reviewed By: bwasti

Differential Revision: D4392954

Pulled By: Yangqing

fbshipit-source-id: 00ec72838e5e7dd9ff96449a8589273c68d0cef5
2017-01-09 10:59:39 -08:00
73fe3d5f59 Update travis to test more versions of GCC and fix README build status link
Summary: Closes https://github.com/caffe2/caffe2/pull/87

Reviewed By: Yangqing

Differential Revision: D4387686

Pulled By: bwasti

fbshipit-source-id: 068ab542bbbd793cbabd06cd77c95ce13ebaf012
2017-01-08 21:29:35 -08:00
737000b166 Linter fix up to sync fbsource and github 2017-01-06 15:36:17 -08:00
3833dad5f6 manual sync of old never sync'd files 2017-01-06 15:28:45 -08:00
46c6e621cb Fix warning in ScaleOp grad
Summary: #accept2ship

Reviewed By: Yangqing

Differential Revision: D4386362

fbshipit-source-id: 634410e73034ac31b7f2bec39f41c52ea9935e3a
2017-01-06 00:44:33 -08:00
76c9382fb3 Delete caffe.cloc 2017-01-05 13:35:45 -08:00
603784c8cb fix typo 2017-01-05 10:48:26 -08:00
dd133edf84 Update README.md 2017-01-05 10:15:27 -08:00
02436b0982 Merge pull request #80 from caffe2/cmake
Merge cmake branch into master
2017-01-05 10:10:12 -08:00
74d6004f1d Merge branch 'master' into cmake 2017-01-05 10:10:04 -08:00
b1a003627f Merge pull request #81 from bwasti/master
adding license back
2017-01-05 10:08:51 -08:00
c1e6aa58a0 adding license back 2017-01-05 10:08:17 -08:00
32ec21c0d1 Merge pull request #79 from bwasti/master
Small fixes + docs
2017-01-05 10:00:50 -08:00
a358ed4297 Update docs to reflect current build status 2017-01-05 09:59:31 -08:00
69ce8cafde Don't add levelDB dependency unless Snappy is also present 2017-01-05 09:55:09 -08:00
c0a48638ee Merge pull request #12 from caffe2/cmake
Cmake
2017-01-05 09:52:57 -08:00
10bad040c2 Merge branch 'master' into cmake 2017-01-05 09:52:45 -08:00
ac03e65929 Move c++11 check to cmake 2.8
Previous check required cmake >= 3.1
2017-01-05 12:15:54 -05:00
e126f6e960 travis cache apt 2017-01-04 22:30:19 -08:00
78fb184cef mac travis: use Eigen instead of openblas 2017-01-04 22:17:18 -08:00
1a26aab1cf Seems that on mac, the inclusion order matters. 2017-01-04 21:52:59 -08:00
83b2f282de Need to set c++11 before check_cxx_source_compiles 2017-01-04 21:38:24 -08:00
375c0816b3 goodbye old brewery 2017-01-04 20:58:35 -08:00
46a403250f Make build for Android a bit easier 2017-01-04 20:50:06 -08:00
7734235a6a Add misc check for the long type, and temporarily disabled
core_overhead_benchmark to remove the benchmark dependency for all
binaries
2017-01-04 20:46:19 -08:00
1e8659fd89 build files bugfix 2017-01-04 20:36:11 -08:00
1be71804c8 For the caffe and caffe2 protobufs, compile them to static instead of shared. 2017-01-04 17:36:03 -08:00
a9e2693fa8 add back third_party/protobuf, but it won't be used in normal builds.
Pinned protobuf to v3.1.0

Removed the USE_SYSTEM_PROTOBUF option in cmake. It is no longer used.
2017-01-04 17:27:18 -08:00
9d42eca92e delete no longer used cmake lists under third party 2017-01-04 17:03:54 -08:00
b31708fb6e Added summary to end of CMake configuration 2017-01-04 16:44:55 -08:00
347e17600f Added option BUILD_PYTHON 2017-01-04 16:44:06 -08:00
3d1bda1f3a cmake: make python dependencies separate from the C++ dependencies 2017-01-04 16:34:56 -08:00
610df2059e Rephrase warning for missing dependency 2017-01-04 15:48:19 -08:00
249e1857e2 Reset and warn when any options are not satisfied 2017-01-04 15:46:43 -08:00
41e03c9c38 cmake file fixes 2017-01-04 14:52:15 -08:00
711b457681 Merge pull request #11 from caffe2/cmake
Cmake
2017-01-04 14:46:35 -08:00
5bfd6c4cd1 semicolon 2017-01-04 14:36:16 -08:00
311ae2ba33 build file fix and avx2 on mac fix 2017-01-04 14:35:15 -08:00
ec289099b7 Merge pull request #10 from caffe2/cmake
Cmake
2017-01-04 14:22:36 -08:00
1be46aeb21 more gitignore from caffe 2017-01-04 14:21:39 -08:00
3534a0ef76 Merge pull request #77 from bwasti/master
moved exclude to append in binary sources
2017-01-04 14:21:14 -08:00
cd617c3a76 moved exclude to append in binary sources 2017-01-04 14:20:22 -08:00
9f351d581e Add build/ to .gitignore since that's common practice for cmake 2017-01-04 14:12:55 -08:00
1259172420 Merge pull request #76 from bwasti/master
Moved binaries/python CMake files to reflect paradigm of the rest of the codebase
2017-01-04 14:11:44 -08:00
62265cd1eb Remove unnecessary cmake lines 2017-01-04 14:10:54 -08:00
3e4b24447b Add a missing if opencv found check 2017-01-04 14:09:44 -08:00
580294cdd4 remove accidentally included old version of installation instructions 2017-01-04 14:04:17 -08:00
2f3b5d7943 Moved binaries/python CMake files to reflect paradigm of the rest of the codebase 2017-01-04 14:02:52 -08:00
e80f4430c4 clean no longer needed cmake lines 2017-01-04 13:44:27 -08:00
37b5af990a Changes to make MKL operators build. 2017-01-04 13:37:34 -08:00
945cc8dd13 Merge pull request #75 from bwasti/master
Fix accidental inclusion of cudnn tests in CPU tests
2017-01-04 13:36:28 -08:00
82070ebd7a Fix accidental inclusion of cudnn tests in CPU tests 2017-01-04 13:33:04 -08:00
ccdeede31b mkl: GLOB_RECURSE instead of GLOB 2017-01-04 13:21:35 -08:00
dc274f9d74 Merge branch 'cmake' of https://github.com/caffe2/caffe2 into cmake 2017-01-04 15:08:33 -05:00
a69ed110f8 Merge pull request #9 from caffe2/cmake
Cmake
2017-01-04 12:06:53 -08:00
c52e744cba Merge branch 'master' into cmake 2017-01-04 12:06:39 -08:00
ae62e15f87 Added MPI operators to cmake 2017-01-04 15:06:20 -05:00
7ea9f9e0ee Updated naming convention of Caffe2_LINK* 2017-01-04 12:03:27 -08:00
05fa16a7aa Add contrib/nccl to cmake 2017-01-04 14:36:02 -05:00
425ce989e2 Update README.md 2017-01-04 11:25:17 -08:00
5070321915 Add cuda_rtc to new cmake layout 2017-01-04 14:19:48 -05:00
ff9a35ce96 Merge pull request #74 from bwasti/master
Migrate brewtool stuff into brewtool/ and update makefile to use cmake
2017-01-04 11:15:57 -08:00
07a1c58cad Remove branch specification in travis 2017-01-04 11:14:47 -08:00
3dbddf6104 Create .travis.yml 2017-01-04 11:14:18 -08:00
1d03be77d0 mkl cmake file, not tested 2017-01-04 11:13:12 -08:00
5142640b2b Added all folders to the add_subfolders section, with the ones not ready being commented right now. 2017-01-04 10:58:10 -08:00
3f432a8d43 Migrate brewtool stuff into brewtool/ and update makefile to use cmake 2017-01-04 10:56:15 -08:00
358d72aa29 Merge pull request #72 from bwasti/master
Merging updates to the build system into main repo
2017-01-04 10:51:36 -08:00
ae17168939 Ensure glob always happens 2017-01-04 10:46:37 -08:00
ec87d49f56 Merge branch 'master' of github.com:bwasti/caffe2 2017-01-04 10:46:13 -08:00
d88d706446 Removed protobuf from third_party 2017-01-04 10:46:00 -08:00
8396207684 CMakeLists for db, queue, sgd 2017-01-04 10:45:20 -08:00
9fdc844620 halfway into going towards individual-folder cmake lists 2017-01-04 10:29:57 -08:00
1395c1701e Revert relabeled 'build' directory for protobuf compilation 2017-01-04 10:02:43 -08:00
8be4bfb424 Merge pull request #8 from caffe2/master
bounds check in Gather operation
2017-01-04 10:00:39 -08:00
cc8b6bf715 USE_OPENMP option added
Factor out omp pragma parallel for into a central macro
2017-01-04 12:44:47 -05:00
fb43912616 Guard new cmake feature with version detection for compatibility 2017-01-04 09:43:25 -08:00
1161b34529 Merge branch 'master' of https://github.com/bwasti/caffe2 2017-01-04 09:41:06 -08:00
76cbf1d4d1 Reducing minimum version of cmake required 2017-01-04 09:40:56 -08:00
9748c92b75 Factor out DB source collection
Should handle all combos of present / missing DBs
2017-01-04 11:05:13 -05:00
65641b6bfb bounds check in Gather operation
Summary: Currently Gather doesn't check whether the provided indices are in the valid range. Adding a check makes issues easier to debug (a sketch of the check follows this entry).

Reviewed By: dzhulgakov

Differential Revision: D4277170

fbshipit-source-id: dc744b6a229aaf72af8336a417f0f79c97dbdc77
2017-01-04 01:14:25 -08:00
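A NumPy stand-in for the kind of check described above (the real check lives in the C++ Gather op; this is only a hedged sketch):

```python
import numpy as np

def gather_checked(data, indices):
    n = data.shape[0]
    bad = (indices < 0) | (indices >= n)
    if bad.any():
        raise IndexError("Gather index %d out of range [0, %d)"
                         % (indices[bad][0], n))
    return data[indices]

data = np.arange(12).reshape(4, 3)
print(gather_checked(data, np.array([0, 3])))   # fine
# gather_checked(data, np.array([4]))           # raises IndexError
```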
c226211b87 Merge pull request #7 from caffe2/cmake
Cmake
2017-01-04 00:53:08 -08:00
4d53c632e0 Remove unnecessary cuda flags.
-Xcompiler -std=c++11 is not needed, otherwise gcc produces warnings.
2017-01-04 00:17:35 -08:00
69c09e1c48 BLAS option: Atlas->ATLAS, and added an else() message guard. 2017-01-03 23:37:07 -08:00
6c124c6f49 Allow glog and gflags to be optionally used.
If USE_GLOG and USE_GFLAGS are set to off, or if the system does not
have glog and gflags installed, caffe2 will fall back to a non-glog
and non-gflags installation. This would be helpful for e.g. mobile
builds.
2017-01-03 23:16:50 -08:00
324ef09e01 fix typo 2017-01-03 23:16:36 -08:00
c63500fe68 remove explicit glog and gflags link libraries, since the caffe2 dependencies would have already had them. 2017-01-03 23:15:02 -08:00
6bf2e156d4 cmake cuda: add libcuda.so find paths, and produce error if it is not found. 2017-01-03 23:14:07 -08:00
628a6b17d3 Merge remote-tracking branch 'upstream/master' into cmake 2017-01-03 21:11:29 -08:00
52784a3a21 Add LOG_IF and VLOG_IF to the non glog option.
Summary: TSIA - this is needed when users choose to build without glog.

Reviewed By: bwasti

Differential Revision: D4380186

fbshipit-source-id: 1803d451e296f3af5258e0d67d4afdec5f5e5623
2017-01-03 20:59:19 -08:00
b1a31942fc Merge remote-tracking branch 'upstream/master' into cmake 2017-01-03 18:10:58 -08:00
4c51f96b9d DEFINE -> CAFFE2_DEFINE
Summary: This is needed to properly compile when gflags is not present.

Reviewed By: bwasti

Differential Revision: D4379796

fbshipit-source-id: 3344fa304d85feabbdba81449f663405ed731797
2017-01-03 17:59:35 -08:00
7c3f1521a7 Gpu transform
Summary:
Adds a thread pool for image decode, and optional GPU-based data conversion, mean subtraction and std division
Closes https://github.com/caffe2/caffe2/pull/56

Reviewed By: Yangqing

Differential Revision: D4341326

Pulled By: bwasti

fbshipit-source-id: 6485616ea7d212c7701274a40fae912db30dff4a
2017-01-03 17:59:34 -08:00
6618d7462d Improvements+fixes for NetBuilder
Summary: Title.

Reviewed By: dzhulgakov

Differential Revision: D4358227

fbshipit-source-id: 21afe5107bed27eec2027f16f2c77db62c70c6e8
2017-01-03 16:59:24 -08:00
ed5b349cd9 Merge branch 'master' of github.com:bwasti/caffe2
2017-01-03 15:27:01 -08:00
eab5cef032 Removed redundant fpic invocation 2017-01-03 15:26:52 -08:00
4ab0351647 Merge pull request #6 from caffe2/cmake
Cmake
2017-01-03 15:19:58 -08:00
16aacbdf83 Fix MSRAFill op
Summary:
While debugging resnets on imagenet, Ross pointed out that MSRAFill is not implemented correctly. Fixing that:
1. use fan_out, not fan_in
2. normal distribution rather than uniform

Reviewed By: Yangqing

Differential Revision: D4372380

fbshipit-source-id: 8f03bd75f543caa60c20e841edbdbb918d1c8775
2017-01-03 14:44:27 -08:00
737f507786 Fix all instances of 'build' folder being used to prevent errors on make 2017-01-03 14:30:13 -08:00
86a81b3df2 Merge branch 'master' of github.com:bwasti/caffe2 2017-01-03 14:27:21 -08:00
4698062fcc use different name for build folder (conflicts with build file) 2017-01-03 14:27:08 -08:00
0ce23319c2 Change default blas to Eigen 2017-01-03 13:57:26 -08:00
90f601e4cf Checked out older (and still working) version of pybind11 2017-01-03 11:57:48 -08:00
67a74f3ada no fancy auto in lambda functions.
Summary:
This is needed so that we stick with C++11 instead of C++14, which is not well
supported on a few platforms.

Reviewed By: bwasti

Differential Revision: D4377534

fbshipit-source-id: d65d7caaa935a8f16e3b44c838104a576c8f78e4
2017-01-03 10:59:27 -08:00
a84fa6fb98 Checked out older (and still working) version of pybind11 2017-01-03 10:52:41 -08:00
6303dab3ab Update README.md 2017-01-03 10:48:57 -08:00
55856c720c no sudo on pip for ubuntu 2017-01-03 09:56:02 -08:00
52b0741f78 Merge pull request #71 from bwasti/master
Fix GPU compilation/False positive Clang warnings
2017-01-03 08:49:03 -08:00
b8df7ce149 fbcode: remove unused includes from .cpp files without #if (but possibly #define)
Summary: Same as D4312617 but this time not excluding source files with `#define`.

Reviewed By: soumith

Differential Revision: D4344811

fbshipit-source-id: 5a314960c319f029c6737c8c8ac8224ec2f20218
2017-01-02 05:29:17 -08:00
3f270f60ce display spans for a certain time interval
Summary:
This diff adds a couple of options to `htrace_to_chrome.py` so that users can specify start and end timestamps for displaying spans.
For example, the arguments `--start_time x --end_time y` indicate that spans that finish before `x` or start after `y` will not be included in the final chrome tracing json file (a sketch of this rule follows the entry).

This also adds timestamp information to the spans, which can serve as a hint for choosing the command-line argument values.

Differential Revision: D4372220

fbshipit-source-id: a2b0af3be6861448874d804b30426df1b67a676e
2016-12-31 10:29:24 -08:00
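A hedged sketch of the filtering rule, using hypothetical span dicts with `begin`/`end` timestamps (the real script operates on HTrace span logs):

```python
def in_window(span, start_time, end_time):
    # Drop spans that finish before the window opens or start after it closes.
    return span["end"] >= start_time and span["begin"] <= end_time

spans = [{"begin": 5, "end": 9}, {"begin": 12, "end": 18}, {"begin": 30, "end": 31}]
kept = [s for s in spans if in_window(s, start_time=10, end_time=25)]
print(kept)  # only the middle span overlaps the [10, 25] window
```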
76a2c9cbf7 Attempt to get numpy working with travis 2016-12-29 17:29:41 -05:00
415f4959ce Attempt to get numpy working with travis 2016-12-29 17:28:36 -05:00
965228e559 specify python version in travis 2016-12-29 17:10:41 -05:00
187ba9d969 Added alternative numpy installation option 2016-12-29 17:00:50 -05:00
e5793efb09 clean up enviroment variables 2016-12-29 16:44:51 -05:00
edac248dad no prompt for addition of extra repository 2016-12-29 16:31:29 -05:00
1aa473638d Added a search path to find OpenBLAS for convenience (homebrew install) 2016-12-29 16:15:25 -05:00
bd2346093f added a build script to specify openblas with OS X 2016-12-29 16:02:22 -05:00
06265daf1d Add test repositories to travis 2016-12-29 15:45:33 -05:00
70b7f6af23 Fix leveldb brew install typo 2016-12-29 15:35:12 -05:00
2e2522bf30 Moved addons back into the matrix specification 2016-12-29 15:34:26 -05:00
f89340502c set distro earlier on in the travis configuration file 2016-12-29 15:27:24 -05:00
fc750ae32d remove gtest installation attempt 2016-12-29 15:26:13 -05:00
056312a538 Specify trusty distro 2016-12-29 15:19:18 -05:00
5abc094ea1 specify location of openblas and added addons 2016-12-29 15:16:15 -05:00
5a77c59d81 added OS X to travis and split install script into separate file 2016-12-29 15:02:44 -05:00
9ce23cbb71 Fix false positive for non-clang compilers. 2016-12-29 11:39:50 -08:00
454d439cdd Add back Caffe2_GPU to Caffe2_LINK variable if it can be enabled 2016-12-29 11:37:12 -08:00
ea610da033 Merge pull request #70 from bwasti/master
Build passing on travis (ubuntu trusty)
2016-12-29 14:22:33 -05:00
3dbcae9ef0 Fix typo breaking NumPy includes 2016-12-29 13:51:26 -05:00
b097c993e0 Merge branch 'master' of github.com:bwasti/caffe2 2016-12-29 12:41:29 -05:00
3ebb52074f Fix duplicate definition bug (only present in GCC) 2016-12-29 12:38:38 -05:00
d515d8ffb8 Update README.md 2016-12-29 12:35:47 -05:00
b48f1ff810 OS X build 2016-12-29 12:25:53 -05:00
a2ae00519c add speed benchmark tool
Summary: provide a easy way to benchmark different dper models.

Differential Revision: D4367258

fbshipit-source-id: 4821645c58ad183becf0c82daae991375d5c6ef4
2016-12-28 14:14:25 -08:00
5251bb12c2 Merge pull request #66 from caffe2/eigen
Updated eigen submodule
2016-12-28 15:21:02 -05:00
ce02932517 added documentation 2016-12-28 14:48:51 -05:00
3adca70cec bugfix htrace_to_chrome wrong output file name
Summary:
This is a quick bugfix on `htrace_to_chrome.py`, which produces outputs with wrong file names if command line arguments are given in a specific way.

  fbcode $ python caffe2/caffe2/contrib/prof/htrace_to_chrome.py --display operator /tmp/htrace_alexnet_span_log_20161224_055901
  Writing chrome json file to --display.json
  Now import --display.json in chrome://tracing

Differential Revision: D4369445

fbshipit-source-id: 628f4dbd88fb86814a0d92cd4c8407ba12a401d0
2016-12-28 10:14:30 -08:00
4b3bd06a7f sparse nn converges better by deduping sparse gradient by mean
Summary:
this normalizes the sparse gradient, so that the "effective learning rate" of each sparse parameter will NOT be affected by the number of examples in a batch that "use" this sparse parameter.

Experiments show it helps convergence (about 0.1% better train NE): https://fburl.com/1230747813683956. It's not conclusive yet, and we still need to do more experiments. But this diff adds it as an option and does not change the default behavior, so we can get this in first.

Differential Revision: D4367283

fbshipit-source-id: 49ea80dfa9ea776ff4160e220cf6c86593521607
2016-12-27 22:59:29 -08:00
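A minimal NumPy sketch (shapes assumed, names ours) of what deduping a sparse gradient by mean looks like: rows hitting the same index are averaged, so a parameter's effective step size no longer grows with how often it appears in the batch:

```python
import numpy as np

def dedup_mean(indices, values):
    uniq, inverse, counts = np.unique(
        indices, return_inverse=True, return_counts=True)
    summed = np.zeros((len(uniq), values.shape[1]), dtype=values.dtype)
    np.add.at(summed, inverse, values)        # sum the duplicate rows
    return uniq, summed / counts[:, None]     # then divide by multiplicity

idx = np.array([3, 3, 7])
val = np.ones((3, 2))
u, v = dedup_mean(idx, val)  # index 3 gets the mean of its two rows
```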
8576eef831 added link_directories to hopefully fix travis issue 2016-12-27 13:58:16 -08:00
6d6d418f6c remove policy (doesn't work with older versions of cmake) 2016-12-27 12:34:55 -08:00
db745f33a5 whoops had the old command for cmake in there 2016-12-27 12:19:43 -08:00
cf64d91548 new policy for building shared libs 2016-12-27 12:12:04 -08:00
1307e6f1cf compatibility with older version of CMake 2016-12-27 12:10:16 -08:00
244f6aed28 force use of new cmake 2016-12-27 11:51:07 -08:00
9e75aa4d35 specify path to write htrace logs
Summary: This diff adds a gflag for specifying the path for htrace span log files. This flag is used by the net types `HTraceDAGNet` and `HTraceAsyncDAGNet`.

Differential Revision: D4366849

fbshipit-source-id: 56038d3d64a3fd5ab363feda86a19a6f2496971c
2016-12-27 11:44:31 -08:00
547d151728 fix for failing dir creation 2016-12-27 11:38:24 -08:00
de15c67844 download a newer cmake 2016-12-27 11:25:21 -08:00
8c6fe64e1d upgrade cmake version to use proper linking flag (full paths) 2016-12-27 11:11:13 -08:00
5dfc3f8681 Merge branch 'master' of https://github.com/bwasti/caffe2 2016-12-27 08:51:01 -08:00
c2c58480ab added google/benchmark and tidied up Cuda build 2016-12-27 08:49:41 -08:00
d267f1c320 remove apt-get of nvidia toolkit (which installs nvcc-7.5) 2016-12-26 12:45:05 -08:00
826abe8438 Merge pull request #5 from caffe2/master
merge caffe2:master into bwasti:master
2016-12-26 14:19:30 -05:00
118cc4174a Remove binaries build (it seems to be broken) 2016-12-24 07:20:45 -08:00
a4f3721e15 weightedsum on ps
Summary:
Rewrite D3993337 based on new stack.
Compared to the old one, we need more readers to achieve the same speed. But so far the speed is the same, and the new bottleneck is the write bandwidth of the trainer. Model quality is the same as the base.

Reviewed By: azzolini

Differential Revision: D4310803

fbshipit-source-id: 6d04ae8040c1ee7caa9aea5287f054e73fbe325a
2016-12-22 19:14:38 -08:00
e14dc54a28 Merge pull request #4 from caffe2/cmake
Cmake
2016-12-22 17:35:56 -08:00
4ad367a6fa Merge branch 'master' into cmake 2016-12-22 17:35:29 -08:00
55ae35c0ba remove -lcnmem 2016-12-22 17:09:50 -08:00
a7f8fe0423 introduce request net into prediction schema
Summary: As titled. We want to have a request_only net which runs on user_only sparse features. Submitting to get early feedback.

Reviewed By: dzhulgakov

Differential Revision: D4282783

fbshipit-source-id: 71241bf5444550075884c788c2da4783659bc1e0
2016-12-22 15:59:27 -08:00
6bb9ec1d78 Merge branch 'master' of github.com:bwasti/caffe2 2016-12-22 15:09:52 -08:00
074924eb19 remove broken test 2016-12-22 15:09:33 -08:00
e51e651255 Remove redundant and failing test of FeedBlob asserts
Summary: Recently a PR landed that removed the asserts on feeding float64 to FeedBlob for GPUs and changed them to a warning. Thus the test that checked those assertions started to fail. Removing it.

Reviewed By: Yangqing

Differential Revision: D4363780

fbshipit-source-id: d9e222c309302243138d4ff3c223c711a4d2052d
2016-12-22 14:59:28 -08:00
e7690070ca removed encrypted binary 2016-12-22 14:41:45 -08:00
3eb08feff5 Support no_bias in naive group conv implementation
Summary:
I was testing the perf difference between naive group conv and cudnn group conv. I am doing no_bias conv and added support for that in the naive implementation.
Although it's deprecated, I thought it would be nice to have working things in our code.

Differential Revision: D4363168

fbshipit-source-id: 29719013d79b449fd359884709c7a1195be51ae3
2016-12-22 14:14:26 -08:00
3d69cf1fa7 added cudnn 2016-12-22 14:14:03 -08:00
ed2994a385 Add c++11 support to nvcc 2016-12-22 13:43:23 -08:00
fc74eae082 fix build for older versions of CUDA 2016-12-22 13:16:41 -08:00
d4a783405f Merge branch 'master' of github.com:caffe2/caffe2 into cmake 2016-12-22 13:15:05 -08:00
85d7688811 add display level to htrace_to_chrome.py
Summary: This diff adds an option to the htrace_to_chrome.py format conversion script so that users can decide to display less traces by hiding kernel/operator/worker spans. For example, passing the arguments `--display worker` will make the script process spans up to worker spans and not go further (deeper).

Differential Revision: D4360404

fbshipit-source-id: aa5af7e499b94aeb3de06823bdeeedfbc3b1c02b
2016-12-22 13:14:27 -08:00
db5cc8f278 revert exhaustive_search setting to False
Summary: As per discussion in D4355529

Reviewed By: prigoyal

Differential Revision: D4362162

fbshipit-source-id: 795fcf1507235a7dc3c7a10b0453037936d057aa
2016-12-22 12:44:42 -08:00
72e957e611 update third party files 2016-12-22 11:54:40 -08:00
d570d1f405 fix USE_*DB option issues 2016-12-22 11:52:04 -08:00
2da1b44b7f pass std=c++11 directly to nvcc 2016-12-22 11:32:23 -08:00
71db174410 downgraded cuda to 7.5 2016-12-22 10:39:47 -08:00
4fda7467fb removed docs from cmake branch 2016-12-22 10:33:09 -08:00
45807c89ac matrix install and export CXX in the script with COMPILER variable 2016-12-22 10:00:46 -08:00
a14d2b5817 turn off cuda propogate host flags 2016-12-21 18:20:03 -08:00
291c971e36 CMAKE VERBOSE on 2016-12-21 17:47:04 -08:00
e2181a32ca Normalize rank loss gradient to avoid convergence issues when the number of pairs is really large
Summary:
Essentially, when the number of pairs is around 1000, the few positive samples in the list get a massive boost from all the negative examples. This diff normalizes the gradient and the loss by the number of pairs.

This diff also adds protection against NaNs and more logging to help debugging.

Reviewed By: kdub0

Differential Revision: D4359782

fbshipit-source-id: 7240344ddb1f2f670d1eec1b03e7f6e413f3dfcc
2016-12-21 17:29:24 -08:00
505fdc99b7 fix gcc path search 2016-12-21 16:52:26 -08:00
613a8f1040 update gcc to 5.4 2016-12-21 16:20:48 -08:00
2c6a579859 Make all convolution operators allow optional bias term
Summary:
It used to be that only the cudnn engine supported it; now it should be
fully supported by any conv engine.

To ignore bias, simply use a convolution op that has two inputs instead of
three (see the sketch after this entry). The gradient operator will
automatically figure out that it should not compute the bias gradient.

Reviewed By: prigoyal

Differential Revision: D4354183

fbshipit-source-id: cf71b6289a254d15a6a663a85df63fbbaec3702b
2016-12-21 15:14:24 -08:00
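As a hedged illustration of the two-input convention, using caffe2's `core.CreateOperator` (the blob names and kernel size are ours):

```python
from caffe2.python import core

# Three inputs: weights plus bias, so the bias gradient is computed.
conv_with_bias = core.CreateOperator("Conv", ["X", "W", "b"], ["Y"], kernel=3)

# Two inputs: the gradient operator infers there is no bias to differentiate.
conv_no_bias = core.CreateOperator("Conv", ["X", "W"], ["Y"], kernel=3)
```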
1c8185ce52 Merge branch 'osx-build' into cmake 2016-12-21 14:30:24 -08:00
d7836b2f5a Preserve metadata on schema.List.lengths
Summary:
Ievgen ran into this bug with his dper work - we didn't preserve metadata on lengths field.

Also, we didn't take keep_blobs into account for List's main field. Now fixed.

Also, reformat the file to be nice.

Differential Revision: D4357859

fbshipit-source-id: 1c26c533a10d38afab13b46ccbcb541f5fa9074a
2016-12-21 14:29:48 -08:00
4c22d3769b maybe fix cmake file 2016-12-21 14:23:15 -08:00
728c694f1d network install rather than local 2016-12-21 13:55:16 -08:00
bb3cec8046 fix broken link for ubuntu download 2016-12-21 13:42:42 -08:00
fed2cdf8cd change nvcc version 2016-12-21 13:27:29 -08:00
47bd606f63 Better visualization for gpu training plan
Summary:
The current GPU training plan has many sub-steps with the same name (e.g., "train/epoch"). This messes up the plan visualization. This diff fixes that.

before: https://our.intern.facebook.com/intern/graphviz?paste=56899036
after: https://our.intern.facebook.com/intern/graphviz?paste=56899704

Reviewed By: xianjiec

Differential Revision: D4343739

fbshipit-source-id: 8dbc01b4f3221999c78cb80a22ec8c11abf81172
2016-12-21 09:29:43 -08:00
5209a28c95 cudnn_exhaustive_search default True
Summary: As discussed, this improves performance a lot and is not a memory hog anymore. In any case, anyone can turn it off.

Differential Revision: D4338798

fbshipit-source-id: bf0fdb594427ebe90e1e94b2effdc63196096b3f
2016-12-21 09:29:43 -08:00
82f1a8e12d fix code doc for data_workers
Summary: Fix bug in doc as reported by rpenggithub

Reviewed By: rpenggithub

Differential Revision: D4356796

fbshipit-source-id: a35e54247d84ba29ef1b8e8cac0de8a3d30b489e
2016-12-21 09:29:43 -08:00
fdb2a5b77a separate num_task and num_label. unify label schema. remove is_mtml
Summary: As titled; part of the effort to unify loader configuration.

Differential Revision: D4342147

fbshipit-source-id: bb021112f61d4838b0ccc7a5a8bcaf272cb35cd8
2016-12-21 09:29:43 -08:00
c2d28fb874 RNNs API simplification
Summary:
This is a first step in improving our RNN story. It provides a wrapper around the current RecurrentNetworkOp implementation which infers most of the redundant parameters and makes the API much simpler.

Also, in order to support general step nets, I added an extra argument to the RecurrentNetworkOp.

Future work:

1. Infer step net outputs and internal blob (scratch) sizes and types
2. Avoid accessing blobs by name in the C++ part
3. Remove the requirement for 1:1 input/output correspondence in the step net
4. Make the Python API support networks with operators like Sum on the border of the cell net (currently there is an issue with such networks where gradient blobs on the side are not explicitly created).

Differential Revision: D4268503

fbshipit-source-id: f8a66491c2b55daa730caeed7e9f2b3921541b49
2016-12-21 09:29:43 -08:00
a2b31cf9e2 Install fixes
Fix paths, __init__.py initialization
Other assorted fixes
2016-12-21 09:14:04 -05:00
2d8d4ac423 Merge pull request #67 from caffe2/fbsync
Fbsync merge
2016-12-20 20:59:14 -05:00
7bf5c48e7e Updated eigen submodule 2016-12-20 15:54:17 -08:00
fe0d59d424 added -y flag to force addition of repository (timeouts on automated build system) 2016-12-20 15:48:57 -08:00
35fd17cae2 moved gcc installation into the script rather than addon 2016-12-20 15:33:34 -08:00
2b0054a642 fixed gcc update bug in travis.yml 2016-12-20 15:07:54 -08:00
baa058778c added newer version of G++ for potential fix of nvcc compilation 2016-12-20 15:01:34 -08:00
705d934481 added pip numpy install 2016-12-20 14:26:34 -08:00
6ea442629b added C as a supported language to the cmake file 2016-12-20 14:00:43 -08:00
6abf5c99dc Implement group convolution in the cudnn interface.
Summary:
This is an ongoing work - currently the forward pass is implemented, but backward
is yet to be done. We might want a CPU counterpart as well.

I will wait for D4341288 to land and then make bias optional.

Reviewed By: prigoyal

Differential Revision: D4342210

fbshipit-source-id: 51bb0e98d917970bdc040d076b535beb8e994d9a
2016-12-20 13:44:44 -08:00
a3e6f4cb7a add HTraceAsyncDAGNet
Summary:
This diff adds HTraceAsyncDAGNet, which is basically the async_dag version of HTraceDAGNet. Similar to HTraceDAGNet, we can use HTraceAsyncDAGNet by setting the net type to `htrace_async_dag`.

For now, we only track iteration spans and do not go deeper (operators, GPU kernels, etc.) because, due to the implementation of AsyncDAGNet, applying HTrace is much more intrusive than for HTraceDAGNet. Creating spans for operators in HTraceAsyncDAGNet is a future task.

This diff also adds a minor change in the TARGETS file so that `htrace_dag`, `htrace_async_dag`, and `prof_dag` are all accessible via one rule.

Differential Revision: D4351587

fbshipit-source-id: 1a4075a9a5efdfafb828a81b663cc731858f7307
2016-12-20 13:44:44 -08:00
93dd09dfd8 apt-get latest cmake 2016-12-20 13:37:20 -08:00
1d1528bc96 updated ubuntu to the xenial 2016-12-20 11:55:46 -08:00
a03692069e Adjust numerical precision of comparison to make test pass
Summary: see title

Differential Revision: D4351545

fbshipit-source-id: 1cca4552ea8f1051796a85724ba0c136ea38b5ec
2016-12-20 11:30:01 -08:00
0943c16324 added pthread download 2016-12-20 11:11:55 -08:00
6dccc8e4ab removed swap files 2016-12-20 10:59:02 -08:00
c77486da42 Merge pull request #2 from caffe2/documentation
added initial documentation template
2016-12-20 10:52:55 -08:00
64de84e069 updated ubuntu version 2016-12-20 10:47:40 -08:00
632a9fd23a removed old '.yaml' file 2016-12-20 10:39:20 -08:00
99fa9e7ae2 fixed yml extension to be recognized by travis 2016-12-20 10:38:43 -08:00
50e2a1515e Merge branch 'master' of github.com:bwasti/caffe2 2016-12-20 10:35:17 -08:00
0a85a977c6 caffe2/caffe2/utils/mkl/mkl_memory.h: avoid shadowing warnings
Summary:
Fix warnings exposed by gcc-4.9.x's -Wshadow-compatible-local
(and/or the stricter -Wshadow-local) options.  Note that these
are both less onerous than -Wshadow.
I plan to enable one of them for all of fbcode, soon.

Rename inner "convert" to "convert2".

Reviewed By: Yangqing

Differential Revision: D4347297

fbshipit-source-id: 7494aedbaeeb2e5356db0612f5f32077f7ffd30b
2016-12-20 03:59:28 -08:00
e3d38fa933 Add rank loss options to mlp
Summary: This diff adds an option to use rank loss instead of cross entropy loss during training. This assumes that the data is loaded in batches which corresponds to sessions, which is something that was implemented for RNN training

Differential Revision: D4261923

fbshipit-source-id: e92a60cc9f53acc1585ac35d1fdb430c2ebbfa33
2016-12-19 20:59:30 -08:00
d4bbcab558 Setup MPI before test start
Summary:
With __name__ == "__main__" defined, MPI4Py was no longer being set up as intended, leading to test failures from undefined names (_has_mpi, COMM, RANK, and SIZE were no longer defined in global scope). This is fixed via explicit use of global variables and by factoring out the MPI setup into a new method.
Closes https://github.com/caffe2/caffe2/pull/59

Reviewed By: Yangqing

Differential Revision: D4348956

Pulled By: bwasti

fbshipit-source-id: ee741a0fff1df00eade1b6d5e1c281afcb38da6a
2016-12-19 15:59:32 -08:00
12c4090ea5 Skip sparse tests if operators not available
Summary:
Only tests for SparseFunHash for now
Closes https://github.com/caffe2/caffe2/pull/60

Reviewed By: Yangqing

Differential Revision: D4348961

Pulled By: bwasti

fbshipit-source-id: cd05d73ccc711b42a7d33e7a6b65a9d1a9bfa7e6
2016-12-19 15:59:32 -08:00
84e7eff458 Waive some hypothesis tests on GPU
Summary:
operators don't exist on GPU
Closes https://github.com/caffe2/caffe2/pull/63

Reviewed By: Yangqing

Differential Revision: D4348968

Pulled By: bwasti

fbshipit-source-id: 1fb8693842d6827ffcf96de2a9a8ba2f9dff0293
2016-12-19 15:59:32 -08:00
fd5da05a05 yes it is right.
Summary: TSIA

Reviewed By: bwasti

Differential Revision: D4348965

fbshipit-source-id: 60b95328086cf0bcf9690cef1e8cddfcbe997c72
2016-12-19 15:29:27 -08:00
05233cd5b8 Make bias optional in cuDNN conv op
Summary:
Yangqing This seems to work for me, not sure if it's implemented in the right way for you to accept :)

Allows user to specify "no_bias" as an option for convolution layers (only cuDNN at this point), so that the bias associated with that operator is not allocated or computed. This is useful in particular for conv + BatchNorm combinations (such as ResNets), as the bias term can be handled by both conv and Batch Norm, wasting memory and computation.
Closes https://github.com/caffe2/caffe2/pull/50

Reviewed By: Yangqing

Differential Revision: D4341288

Pulled By: bwasti

fbshipit-source-id: e6138d0024c83ed876dff2f83ffbebe7de502fd8
2016-12-19 14:59:49 -08:00
fe38a0c2b1 remove logging.basicConfig() from workspace
Summary: As part of a PR from GitHub, "logging.basicConfig()" was added to workspace, causing havoc with existing logger configurations. It should not be there. Thanks rbgirshick for reporting.

Reviewed By: kdub0

Differential Revision: D4346077

fbshipit-source-id: 084ddcbfe6354bdaf5c97a42086c0bd36ec4629c
2016-12-19 11:59:26 -08:00
bd0b61fef1 Minor comment changes for caffe2.
Summary: Found some comment typos while working on T14849353.

Reviewed By: Yangqing

Differential Revision: D4334469

fbshipit-source-id: f880e2a3e9a4e1152b315c6d3c8b68ad298d6334
2016-12-19 11:29:37 -08:00
6c3cca9bc7 Build caffe2, NNPACK, FXdiv, pthreadpool for macOS
Summary: Builds caffe2 and dependencies for macOS. Not included in the MSQRD engine or elsewhere yet.

Reviewed By: Yangqing

Differential Revision: D4334013

fbshipit-source-id: 31cacf07e2b07f379e1894e51dde5103c56b8815
2016-12-19 11:29:37 -08:00
09187f4aea Moved core_overhead_benchmark to oss. Use google/benchmark
Summary: TSIA

Differential Revision: D4343478

fbshipit-source-id: 61ce1b8d72f689cd2ff46b73684ba298a05ed73a
2016-12-19 10:45:20 -08:00
d87edd39e7 math gemm interface fix
Summary:
I don't know how I introduced this embarrassing bug that swapped the order of
ldb and beta in the gemm interface. This fixes that.

Differential Revision: D4014493

fbshipit-source-id: 1aec950b6e9d57e947654d4044e50930f2db1344
2016-12-19 10:45:20 -08:00
938b78b677 Merge pull request #61 from caffe2/fbsync
migration to master
2016-12-19 13:44:02 -05:00
d37fffd257 use in-place ReLU to save a lot of memory
Summary: The Torch docs about ResNets, and soumith's comment, mention significant memory savings from in-place ReLU. prigoyal already had this in her code, but I did not. This saves a lot of memory: 9851 MiB -> 7497 MiB.

Reviewed By: prigoyal

Differential Revision: D4346100

fbshipit-source-id: e9c5d5e93787f47487fade668b65b9619bfc9741
2016-12-19 09:29:26 -08:00
70dcba376c using BlobReference for Sum gradients.
Summary:
We create a Sum operator to sum up the gradients. Currently we use strings for its input/output blobs,
so the code will fail if AddAllGradients() runs within a NameScope.
To avoid this, use BlobReference instead of string for blobs.

Reviewed By: xianjiec

Differential Revision: D4343701

fbshipit-source-id: 2d008916e192d75c6e20f97921331ac4c7b73363
2016-12-18 09:29:22 -08:00
17a5a6ae32 fbcode: remove unused includes from .cpp files with no #if and #define
Summary:
This is a first diff to remove the "easiest" unused includes in fbcode.

* For safety, we only touch .cpp files without #if and #define,
* We do not try to remove redundant systems headers (aka. "packing").

The diff was generated as follows:
```
foundation/scripts/ls-cpp-dirs | grep -v '^\(\.\.\|external/\|.*/external\)' | xargs ffmr -o /tmp/ffmr-diff-1 codegraph/scripts/ffmr/analyze_includes_no_headers_no_packing_skipping_ifdefs.sh

cat /tmp/ffmr-diff-1/*.diff | patch -p2
hg commit -m something
arc diff --prepare --nolint --nounit --less-context --excuse refactoring
```

Note: `grep -v` is just an optimization. The actual configuration is in these two files:
diffusion/FBS/browse/master/fbcode/codegraph/analysis/config.py
diffusion/FBS/browse/master/fbcode/codegraph/scripts/ffmr/analyze_includes_no_headers_no_packing_skipping_ifdefs.sh

See the task for more context, and the recent "safety" improvements on the tool.

depends on D4317825 for very few cases where `nolint` had to be manually added.

Reviewed By: igorsugak

Differential Revision: D4312617

fbshipit-source-id: ecc1f0addfd0651fa4770fcc43cd1314661a311a
2016-12-17 18:29:27 -08:00
9e498c7bba caffe2: removing message logging in conv_transpose_op
Summary: Avoid printing message repeatedly each time the conv_transpose_op (with cudnn) is called

Reviewed By: Yangqing

Differential Revision: D4337242

fbshipit-source-id: 27b048bad8c54604d91174acd4928a1496f2f5c7
2016-12-16 17:44:25 -08:00
78edb8295e No exception for float64 in FeedBlob. Warning instead.
Summary:
The exception in FeedBlob causes many tests to fail.
Instead of raising an exception, we log a warning message and move on.
Feeding a float64 blob should not cause any issue.
Closes https://github.com/caffe2/caffe2/pull/57

Reviewed By: bwasti

Differential Revision: D4343135

Pulled By: Yangqing

fbshipit-source-id: cd1144b94c9883fcbd8bdcd78f9f93a67debc0a6
2016-12-16 17:29:29 -08:00
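A tiny sketch of the warn-and-continue behavior (illustrative only, not the actual FeedBlob code):

```python
import warnings
import numpy as np

def check_feed(arr):
    if arr.dtype == np.float64:
        # Previously an exception; now we only warn and move on.
        warnings.warn("feeding a float64 blob; GPU ops generally expect float32")
    return arr

check_feed(np.zeros(3))  # float64 by default: warns, does not raise
```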
ea2deae9e3 remove unnecessary code since our compiler is fairly modern
Summary: TSIA

Reviewed By: bwasti

Differential Revision: D4343001

fbshipit-source-id: ff7496f720602e433170ab7ac52be4c18e916e43
2016-12-16 17:29:29 -08:00
99e97a4b7a Correction to paths to find cuDNN 2016-12-16 16:03:23 -05:00
a8ae63c3e0 HuffmanTreeHierarchy operator
Summary:
An operator that reads labels, computes their counts, and generates a Huffman
tree hierarchy. It generates all paths from the root node to the leaf labels
as a serialized HierarchyProto to be used as input to the HSoftmax operator.

The tree is constructed in a bottom-up greedy way (sketched after this entry),
keeping indices to parent nodes in order to generate the code and the path
from root to leaf in a bottom-up traversal.

Note:
HSoftmax handles computing a generic hierarchy, which means that for the
binary case we can save one matrix-vector operation per node by representing
every node as a logistic function, and also reduce the path proto size by
producing only one integer list for the path / indices and one bytes list for
the code per label.

Differential Revision: D4303294

fbshipit-source-id: c7f0d3c204536234c26bb2a4228cb3a1892db395
2016-12-16 10:59:48 -08:00
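A pure-Python sketch of the bottom-up greedy construction (our reconstruction; the operator emits a serialized HierarchyProto rather than these dicts):

```python
import heapq
from itertools import count

def huffman_tree(label_counts):
    tie = count()  # tie-breaker so the heap never compares node dicts
    heap = [(c, next(tie), {"label": l}) for l, c in label_counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)   # merge the two cheapest nodes,
        c2, _, right = heapq.heappop(heap)  # bottom-up, keeping child links
        heapq.heappush(heap, (c1 + c2, next(tie), {"left": left, "right": right}))
    return heap[0][2]  # root; a label's code is the left/right path to its leaf

root = huffman_tree({"cat": 5, "dog": 2, "fish": 1})
```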
29f903aaf2 Make computed params broadcast optional
Summary: This was introduced due to the rm and riv params in the SpatialBN layer and the like. We should be saving these params as well, but it is not required to broadcast them to all GPUs after every epoch.

Differential Revision: D4338749

fbshipit-source-id: d3bbc92cf0cd7d220a51d76aea8bffcfd6e520b7
2016-12-16 07:59:25 -08:00
dac78727fb Add missing file 2016-12-16 08:00:47 -05:00
e8dc09064e exhaustive_search=True
Summary: For some reason I had been disabling the exhaustive search heuristic for cudnn in the xray/resnet trainers. Enabling it gives a 10% perf boost on BigBasin, and maybe 5% on BigSur.

Reviewed By: prigoyal

Differential Revision: D4338654

fbshipit-source-id: 3974dd612f5d4f4dc8b2febccb59664d3f276c3e
2016-12-15 22:59:27 -08:00
fc27f83282 restore control_input
Summary: In D4327024 I accidentally landed the control_input disable for NCCL. This empirically increases the likelihood of deadlocks, although it gives a nice perf boost. Better to revert that until NVIDIA fixes their stuff.

Reviewed By: Yangqing

Differential Revision: D4338537

fbshipit-source-id: d43efb45965a88bcfe38e5f1dc16c04463e2e038
2016-12-15 21:29:29 -08:00
35fa9e9c5f a couple small reliability improvements
Summary:
A couple more misc changes:
- allow starting the coordinator multiple times -- this makes data parallel programming easier
- make the fetcher id a global sequence; before, each GPU had the same ids for its workers
- my flow jobs got stuck when joining the fetcher threads. I think there is actually a memory fencing problem with the is_active boolean, but I am too tired to add proper condition variables there. Instead, just add a timeout to join(). It is needed anyway, since some I/O thread could get blocked.

Differential Revision: D4333381

fbshipit-source-id: 88226c8a9c9a5e05d771360a502a2ba21a6b9d76
2016-12-15 21:29:29 -08:00
c016e64914 Fix test cases: tensor of size 0 not supported by GPU ops yet.
Summary: TSIA

Reviewed By: bwasti

Differential Revision: D4334592

fbshipit-source-id: f101887ede2691aef8ca317e5286347c52779774
2016-12-15 19:59:24 -08:00
42bbdda8c4 MKLDevice and MKLOperator
Summary:
This adds Caffe2 support for MKL operators directly with MKLMemory. Included a
Relu layer that shows how to use it.

Reviewed By: salexspb

Differential Revision: D4322144

fbshipit-source-id: 8b3392c4fd024ab1a7ba7135c349ebd3e1976799
2016-12-15 19:59:24 -08:00
dbe7aeb883 move HTraceDAGNet and ProfDAGNet to contrib
Summary: This diff moves all tracing code under fb/htrace and fb/prof to contrib/prof.

Differential Revision: D4333032

fbshipit-source-id: 1d1ae14c3d376a89f9199561cada53b2ca62e81a
2016-12-15 14:59:56 -08:00
2bf18f2b1d add inception and dummy input
Summary:
As requested by Yangqing, added an Inception model (copied from convnet_benchmarks) and a dummy data feed option to the xray trainer, which we use for scalability benchmarking.

Plus a couple of mini-changes to the data input framework.

Reviewed By: Yangqing

Differential Revision: D4327024

fbshipit-source-id: 86911468456fc13a32d5f437a43347380ec66a68
2016-12-15 13:40:22 -08:00
30f323298f exclude every branch but master for build testing 2016-12-15 13:32:33 -08:00
dd74c5d3b8 Implement rank loss method using logit function and pairwise comparisons
Summary:
This is just a stub for now. I need to add a report metric as well before I can produce a complete flow.

Possible extensions:
Implement list-wise loss, allow for more than one session in a batch and create a framework for arbitrary loss functions to be applied

The data loader will be the same as for RNN

Reviewed By: xianjiec

Differential Revision: D4245176

fbshipit-source-id: 546683b6551654a37c410dc1606e556a7bf83a2a
2016-12-15 12:01:31 -08:00
e80423f341 bug fix to distinguish train/test data
Summary:
We often use the same net for training and testing, but we must distinguish their data. Yesterday's diff forgot to include that distinction (it was in the xray sampler before), and this diff adds it. Basically, one provides a name for the input source for data_workers, and all the queues and scratch spaces are suffixed with it to separate them.

Also, set the caffe2 queue's size to 4, which is empirically found to be sufficient. It was erroneously defined as a function of batch size, which does not make sense since each *element* in the queue is a batch, and led to out-of-memory issues on the xray trainer.

Differential Revision: D4329449

fbshipit-source-id: c994da1c8b0935b8eda2402c118d49b76caa7da8
2016-12-15 12:01:31 -08:00
cb918ac727 Implementation of ResNets on imagenet dataset
Summary:
Adds the imagenet dataset as well. Data augmentation and the model have been added; just need to add the DB read.

Differential Revision: D4289150

fbshipit-source-id: b531d3f09e3d0efac5cda5bb75d8146e1bb693e4
2016-12-15 12:01:31 -08:00
585b8f7c9d Templatize store handlers
Summary: Needed to create these ops in CUDA context.

Differential Revision: D4321727

fbshipit-source-id: 518fe5f994d33ea2dcf9b2cc955848b8bb7b06cd
2016-12-15 12:01:31 -08:00
dc16bcfa27 Remove float64 test
Summary:
float64 test breaks things on the cuda side. I am deleting it for now and if
we add it back, let's make sure we run the test on a GPU machine first :)

Reviewed By: azzolini

Differential Revision: D4324427

fbshipit-source-id: 0246fe9dd28a286422ca94c90f5b0fc33a162e74
2016-12-15 12:01:30 -08:00
0b52b3c79d Generalize threaded data input via queues + Everstore input
Summary:
The Xray sampler (originally by ajtulloch) and prigoyal's resnet trainer use variants of threaded data input where worker threads put items into a Python queue, which is drained by an enqueuer thread that dumps those batches into a Caffe2 queue, which in turn is drained by the net's DequeueBlobs operator (a minimal sketch of the pattern follows this entry).

There is a lot of boilerplate, which is also quite complicated.

This diff is an attempt to generalize that machinery under a new module "data_workers" (name could be improved). Basically, you pass it a function that returns chunks of data (usually data + labels).

I also created a module 'everstore_data_input' which generalizes everstore-origin data input with a preprocessing function (image augmentation, for example). See how I refactored sampler.py for the usage.

Next, we could create a fetcher function for Laser data.

Differential Revision: D4297667

fbshipit-source-id: 8d8a863b177784ae13940730a27dc76cd1dd3dac
2016-12-15 12:01:30 -08:00
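A minimal, runnable sketch of the worker/enqueuer pattern described above (single worker and a hypothetical fetch function; the real module manages many workers per GPU):

```python
import queue
import threading

py_queue = queue.Queue(maxsize=4)   # bounded, like the Caffe2 queue of size 4
DONE = object()

def worker(fetch_fn, n_batches):
    for _ in range(n_batches):
        py_queue.put(fetch_fn())    # each item is one (data, labels) batch
    py_queue.put(DONE)

def enqueuer(push_to_net):
    while True:
        item = py_queue.get()
        if item is DONE:
            break
        push_to_net(item)           # in Caffe2: enqueue into the blobs queue

t1 = threading.Thread(target=worker, args=(lambda: ("data", "labels"), 3))
t2 = threading.Thread(target=enqueuer, args=(print,))
t1.start(); t2.start(); t1.join(); t2.join()
```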
4858a6bc6f snapshot -> checkpoint
Summary:
This renames the "Snapshot" op name to "Checkpoint" as we discussed earlier.

The old Snapshot name is still available, but we should move to the new name and
eventually deprecate it.

The Python SnapshotManager should be also changed, cc azzolini

Reviewed By: dzhulgakov

Differential Revision: D4272021

fbshipit-source-id: 4b8e029354416530dfbf0d538bfc91a0f61e0296
2016-12-15 12:01:30 -08:00
ba58b80b16 Rename OperatorBase::OutputAt to OutputBlob and make the interface consistent with the rest
Summary:
TSIA

We also return a reference for Input and a pointer for Output, just to be
consistent with the rest of the framework.

Reviewed By: bwasti

Differential Revision: D4318148

fbshipit-source-id: 857fd72bf929dac04a890f8f787a6fad84bd4287
2016-12-15 12:01:30 -08:00
d38499f727 Optimize BlobIsDefined() + benchmark --> net construction 95 secs to 8.2 secs!
Summary:
I have noticed that constructing the Xray model takes quite a while. To measure this, I wrote a benchmark script that creates a resnet-50 model on 8 gpus. This takes about 95 secs -- which is kind of annoying when you want to quickly debug stuff.

Profiling with Python's cProfile, I could see that most of the time is spent in net.BlobIsDefined(), which does a linear search over external inputs and operator outputs; thus it gets slower and slower with large nets. This can be fully optimized by keeping a separate lookup table of operator inputs and outputs (and external inputs and outputs); a sketch follows this entry. It is a bit annoying to keep this separate data structure, but I set up the unit tests to ensure things are done correctly over Clones.

After the optimization, the net construction drops from 95 secs to 8.2 secs!

Reviewed By: azzolini

Differential Revision: D4288307

fbshipit-source-id: 0bb82c8bde9d86a2702b298f4aa706cba509346e
2016-12-15 12:01:30 -08:00
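A hedged sketch of the lookup-table idea (class and method names are ours): membership moves from a linear scan over all op inputs/outputs to a set kept in sync as operators are added:

```python
class NetSketch:
    def __init__(self):
        self._ops = []
        self._known_blobs = set()   # the extra lookup structure

    def add_op(self, inputs, outputs):
        self._ops.append((inputs, outputs))
        self._known_blobs.update(inputs)
        self._known_blobs.update(outputs)

    def blob_is_defined(self, blob):
        return blob in self._known_blobs   # O(1), no scan over self._ops

net = NetSketch()
net.add_op(["data", "w"], ["conv1"])
assert net.blob_is_defined("conv1")
```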
11a6f48fe7 Fix a few docstrings in operator.h that is not correct.
Summary: TSIA

Reviewed By: bwasti

Differential Revision: D4318071

fbshipit-source-id: f82c21dd44285818f61fe23096e7a93652c705c8
2016-12-15 12:01:30 -08:00
1a00ffea2a Implement fix recommended by @slayton58
Summary: This addresses integer division errors.

Reviewed By: bwasti

Differential Revision: D4315555

fbshipit-source-id: 13ef9496409b3452bc5fb66ce787b11af1382132
2016-12-15 12:01:30 -08:00
4cd263db74 Last N window collector
Summary: Allows collecting samples over multiple batches. The method uses a circular array, so there is no guarantee about the order of the samples. The goal is to get a view of the data across multiple batches (a sketch follows this entry).

Reviewed By: salexspb

Differential Revision: D4216181

fbshipit-source-id: bb9e1fa84ac7e04006dcddb53c9347a42ec83dc8
2016-12-15 12:01:30 -08:00
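A circular-array sketch of the collector (names ours): the last N samples are kept across batches, with no ordering guarantee once the buffer wraps:

```python
class LastNWindow:
    def __init__(self, n):
        self.buf, self.n, self.pos = [], n, 0

    def collect(self, batch):
        for x in batch:
            if len(self.buf) < self.n:
                self.buf.append(x)
            else:
                self.buf[self.pos] = x          # overwrite the oldest slot
            self.pos = (self.pos + 1) % self.n

w = LastNWindow(3)
w.collect([1, 2, 3, 4, 5])
print(w.buf)   # holds {3, 4, 5}, in wrap order [4, 5, 3]
```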
0b21581784 update torch to fecf29bb6ad7b8117eff9712d833972205de1201 cutorch to 64f974178c03c93666cfe3796b7e2d7b549476a2 nn to e8ec31cd0a531b7f7a3247dd7e777958a643d931 and cunn to 64224a65eff88d1bfe5bc47d26a901ed8c0b4705
Summary: updated stuff from upstream

Reviewed By: colesbury

Differential Revision: D4086676

fbshipit-source-id: a246fe0fe3a89699e88139e86850889193b3f360
2016-12-15 12:01:29 -08:00
6191de7ac9 gradients for CopyGPUToCPU and CopyCPUToGPU + unit test + schema
Summary: Added gradients for the Copy operators. They are simply the reverse operation. Also added a unit test to test things actually work and added the operator schema and registration to model_helper's known operators.

Differential Revision: D4306516

fbshipit-source-id: dd0633fa7f2ed01991990e56e63669794df037d9
2016-12-15 12:01:29 -08:00
390867d2d0 Fix RecurrentNetworkGradient with batch size > 1
Summary:
Fix RecurrentNetworkGradient with batch size > 1.
The main issue was that we always set the gradient output to (1, 1, recurrent_size), which mismatches the input (1, batch_size, recurrent_size).
Further gradient ops do Squeeze and split assuming that the output gradient blob is the same size as the input, so they fail.

The fix is simply resizing the output like the input, (1, batch_size, recurrent_size). I had to move the resize into RunOnDevice, since batch_size is computed from Input(0), which is not available until we actually run the op.

Differential Revision: D4301487

fbshipit-source-id: e5c7426d6e770d985ce72a3737381a2b4af333ba
2016-12-15 12:01:29 -08:00
0bc104a3d0 fix unit test
Summary: ...

Differential Revision: D4298663

fbshipit-source-id: 7831830a5201eb6603d846460c22b2f906e53858
2016-12-15 12:01:29 -08:00
1632f053e5 implement user-only metadata for input_record
Summary:
We want to implement request only net and to do this we decided to split the work into two parts. The first part will propagate required metadata and the second part will cut the nets properly.
This diff is to propagate request_only metadata across the layers.

A few notes about implementation:
- Each layer contains a field request_only, which can be set based on the input_record. If all the scalars from the input_record are marked request_only, we mark the layer as request_only;
- Sparse-To-Dense layer sets request_only metadata;
- SigridTransformation and SparseLookup layers propagate request_only status;
- For now, we join request_only and other sparse features together in input_record, but ideally we may want to separate them, because request_only should be served separately;

Reviewed By: xianjiec

Differential Revision: D4259505

fbshipit-source-id: db8a30ef92cba84f1a843981b9dde3a8b9633608
2016-12-15 12:01:29 -08:00
2c3eb3e592 fix sequence_ops doc (pad_width -> padding_width)
Summary: The doc for sequence ops says "pad_width" instead of "padding_width". This diff fixes it.

Differential Revision: D4277186

fbshipit-source-id: 63af6cce2fe0af0d395f78c6a6a1f41518039cf8
2016-12-15 12:01:29 -08:00
68cfc52452 MomentumSGDUpdate -- version of MomentumSGD with the update built in.
Summary:
It gives a significant perf boost to do the parameter update inside MomentumSGD, instead of with a separate WeightedSum op.
To ensure backwards compatibility, I made it a separate op.

Also added an unit test.

Reviewed By: prigoyal

Differential Revision: D4262446

fbshipit-source-id: 38e7ee6d7677b398658ac7fe9b7a59b569e033f4
2016-12-15 12:01:29 -08:00
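A NumPy sketch of the fused step, assuming the standard momentum rule (the exact kernel may differ): one op both forms the adjusted gradient and applies it to the parameter in place, where the apply step used to be a separate WeightedSum op:

```python
import numpy as np

def momentum_sgd_update(grad, moment, param, lr, momentum=0.9):
    moment[:] = momentum * moment + lr * grad   # adjusted gradient (in place)
    param -= moment                             # fused parameter update
    return param

param, moment = np.ones(4), np.zeros(4)
momentum_sgd_update(np.full(4, 0.5), moment, param, lr=0.1)
print(param)  # [0.95 0.95 0.95 0.95]
```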
3c47d41f86 add unit test for row mul
Summary: so that we are more confident.

Differential Revision: D4290132

fbshipit-source-id: 44e4687d977ab90cc022a14131bbf701bdf131d4
2016-12-15 12:01:29 -08:00
68fbc42830 fix empty tensor handling in some operations
Summary: Some operations don't handle the case where the output tensor is empty, and cause segfaults or unexpected behavior (an uninitialized output tensor). This diff ensures that BatchMatMul, filler operations, PackSegments/UnpackSegments, and ReadNextBatch don't fail and properly initialize their output with the correct type. Those seem like fairly straightforward changes; let me know if you'd rather break them up into separate diffs.

Reviewed By: Yangqing

Differential Revision: D4277149

fbshipit-source-id: c5a30b67bb3b451b117d6aa83827d40b71240c2b
2016-12-15 12:01:29 -08:00
2847c8f624 input_as_shape option for Filler ops
Summary: I couldn't find a way to fill a tensor with a shape provided at runtime, so I added an input_as_shape option to the filler ops. When input_as_shape is true, the input can be used to directly provide the shape of the output (this is different from the default behavior, where the output is reshaped like the input). For example if the input contains [2, 3], the output will have shape [2, 3]. Let me know if you see a simpler way :)

Reviewed By: Yangqing

Differential Revision: D4276872

fbshipit-source-id: 095e995d8bf302152765bd51c405185ef9952212
2016-12-15 12:01:29 -08:00
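A NumPy stand-in for the two behaviors (blob contents are illustrative): without the option the output copies the input's shape; with it, the input's contents are read as the output shape:

```python
import numpy as np

shape_blob = np.array([2, 3])

out_default = np.zeros(shape_blob.shape)     # shaped like the input: (2,)
out_as_shape = np.zeros(tuple(shape_blob))   # shape read from contents: (2, 3)
print(out_default.shape, out_as_shape.shape)
```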
cd780eb9ec avoid Exp overhead when handling underflow with MKL
Summary:
I've been noticing when running caffe2 experiments that calling Exp with many values close to 0 causes MKL's underflow error handler to be called repeatedly, causing significant overhead while the result is correct (e.g. exp(x) = 0). I suggest setting the error mode to VML_ERRMODE_IGNORE for those functions, unless there are good reasons not to.

with the current function (see mkl_vml_kernel_sError and vsexp_cout_rare):
{F65147147}

with VML_ERRMODE_IGNORE:
{F65147148}

Let me know if you see a better workaround

Reviewed By: Yangqing

Differential Revision: D4277240

fbshipit-source-id: d44168da32caee4a3f88227ffb70cdc3d5314722
2016-12-15 12:01:28 -08:00
eddf23ca0f Handle parameters that are computed but not optimized
Summary:
prigoyal sharply noticed a bug in the ResNet models: we have not been checkpointing, nor synchronizing between GPUs, the moving average and variance computed by the SpatialBN ops. The first problem in particular is serious, since models starting from a checkpoint would have started from a null state for SpatialBN. Not synchronizing in the data parallel model is less tragic, since each GPU should see very similar data.

Thus I propose keeping track of "computed params", i.e., params that are computed from data but not optimized. I don't know if there are other examples, but SpatialBN's moving avg and var definitely qualify.

- I modified the checkpointing for the xray model to store those blobs and also ensure their synchronization
- I modified data parallel model to broadcast those params from gpu0. I first tried averaging, but hit some NCCL deadlocks ... :(

Differential Revision: D4281265

fbshipit-source-id: 933311afeec4b7e9344a13cf2d38aa939c50ac31
2016-12-15 12:01:28 -08:00
8dbe435235 Ensure input type consistency in Concat operation
Summary: With the current code, Concat accepts inputs of different types and concatenates them as raw data. This causes bugs that can be hard to find: for example, when concatenating a tensor of int with a tensor of long, the long integers get split in two and the output tensor contains garbage. This adds the necessary checks to make sure the input types are all the same (a sketch follows this entry).

Reviewed By: Yangqing

Differential Revision: D4277109

fbshipit-source-id: c1568f74bb66f0d9146a54441c0ee664d5516b77
2016-12-15 12:01:28 -08:00
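A NumPy stand-in for the consistency check (the real one lives in the C++ Concat op):

```python
import numpy as np

def concat_checked(tensors, axis=0):
    dtypes = {t.dtype for t in tensors}
    if len(dtypes) > 1:
        raise TypeError("Concat inputs must share one dtype, got %s" % dtypes)
    return np.concatenate(tensors, axis=axis)

a = np.zeros(2, dtype=np.int32)
b = np.zeros(2, dtype=np.int64)
print(concat_checked([a, a.copy()]))   # fine
# concat_checked([a, b])               # raises instead of producing garbage
```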
206029bc5a fix caffe2 tensor index overflow in Extend/Reserve/Shrink
Summary: I ran into a bug when working with very big tensors in caffe2 (> 2GB). When extending beyond a certain size, the size computation used int32 instead of int64 and would overflow. This fixes the issue.

Differential Revision: D4276487

fbshipit-source-id: 1704a69c4363c7a5b2f7db748d7d570a9593f2b1
2016-12-15 12:01:28 -08:00
c70e8115a1 dper_example use RowMul for speed
Summary:
Faster ~65k vs 25k:

After: 11444089
Before: 11259149

Differential Revision: D4275671

fbshipit-source-id: 57de414676799980632c1d29142ee698965b1b68
2016-12-15 12:01:28 -08:00
48bd64b41b RowMul
Summary: Position weighted embedding is a bit slow due to the hacky implementation of Mul with broadcast. This diff speeds up the Mul with RowMul.

Reviewed By: xianjiec

Differential Revision: D4271193

fbshipit-source-id: e5c35e18920aeef3de3a7304a8f5727d0c980613
2016-12-15 12:01:28 -08:00
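A NumPy sketch of the RowMul semantics (names ours): each row of the embedding matrix is scaled by its own weight, without going through the general broadcast-Mul path:

```python
import numpy as np

def row_mul(mat, w):
    return mat * w[:, None]   # one scalar multiplier per row

emb = np.arange(6.0).reshape(3, 2)
pos_weights = np.array([1.0, 0.5, 0.25])
print(row_mul(emb, pos_weights))
```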
0154db83c0 Merge pull request #54 from slayton58/cmake
Initial CMake building with deps
2016-12-15 10:46:19 -08:00
033acae6b4 Update README.md 2016-12-15 09:56:55 -08:00
64448905ba added nvidia toolkit to installs 2016-12-15 09:55:45 -08:00
b348e9677c Removed deprecated installation instructions 2016-12-14 17:02:19 -08:00
d0219bf7bb Merge pull request #1 from bwasti/travis
updated Master w/ travis
2016-12-14 16:48:48 -08:00
fd04a3468c Added .travis.yaml 2016-12-14 16:04:58 -08:00
03c9d54fd0 Support openCV 2 2016-12-14 14:59:59 -05:00
a46f0fb3cb Merge branch 'cmake' of https://github.com/slayton58/caffe2 into cmake 2016-12-14 11:00:17 -05:00
788f715a6e third_party protobuf support
Fix python lib missed proto dep
2016-12-14 10:54:15 -05:00
df12f431e0 Removing extraneous cmake files
Leftover from Caffe cmake build system
2016-12-13 09:29:01 -05:00
d7eeebc269 Refactored CUDA detection a bit
Refactoring, minor fixes
2016-12-13 09:29:01 -05:00
d74bd7ee55 Add CUDA NVRTC cases 2016-12-13 09:29:01 -05:00
fbbb87cd46 Enhancements
Add BLAS chooser
Move cuDNN detection from Cuda -> FindCuDNN
Refactor main C2 libs, should enable no-GPU build (untested)
2016-12-13 09:29:01 -05:00
5e699ce6c2 CUDA fixes
Fix NCCL build
move CUDA dep into Dependencies file
2016-12-13 09:29:01 -05:00
b9599c7464 Compiling entire project
Can run CIFAR10 Python example!
2016-12-13 09:29:01 -05:00
e9f1222408 Compiling most of the project
Now compiles all CPU + GPU code, tests + binaries with deps
2016-12-13 09:29:01 -05:00
c05ff206b6 Build binaries 2016-12-13 09:29:01 -05:00
2610d62813 Build Python libs 2016-12-13 09:29:01 -05:00
52f09fe2c9 Initial building with deps 2016-12-13 09:29:01 -05:00
e9de70f296 Added basic build system 2016-12-13 09:29:01 -05:00
122e115937 Removing extraneous cmake files
Leftover from Caffe cmake build system
2016-12-12 12:50:08 -05:00
681267b66a Refactored CUDA detection a bit
Refactoring, minor fixes
2016-12-12 12:29:00 -05:00
9f35f47411 Add CUDA NVRTC cases 2016-12-09 11:01:27 -05:00
09de969e9f Enhancements
Add BLAS chooser
Move cuDNN detection from Cuda -> FindCuDNN
Refactor main C2 libs, should enable no-GPU build (untested)
2016-12-09 10:29:06 -05:00
cdb2fb6737 CUDA fixes
Fix NCCL build
move CUDA dep into Dependencies file
2016-12-09 09:02:26 -05:00
66a71c0232 added initial documentation template 2016-12-08 17:01:16 -08:00
f79bffc78d Compiling entire project
Can run CIFAR10 Python example!
2016-12-08 13:23:04 -05:00
2a974f5ca2 Fix 1.3.2 compilation 2016-12-08 09:11:43 -08:00
4255ee9944 Compiling most of the project
Now compiles all CPU + GPU code, tests + binaries with deps
2016-12-08 08:40:29 -05:00
497659ce0d Build binaries 2016-12-07 10:54:06 -05:00
f3c20620ed Build Python libs 2016-12-06 13:06:16 -05:00
3d719f4bff Initial building with deps 2016-12-06 11:39:15 -05:00
648e9fbb58 Adding missing file 2016-12-05 18:06:24 -08:00
dea27ca4ca use TIndex for set in math.h
Summary: as desc

Differential Revision: D4271900

fbshipit-source-id: 92f7cbbe33e0ce4fcc21a8af9ded4f436afb43e2
2016-12-05 11:53:27 -08:00
5f7d1f02f2 Use native reader for evaluation
Summary:
Since hashing is different.

This should be ready to commit now. Running ads nn canaries.

Differential Revision: D4264009

fbshipit-source-id: 3aa16b0c47c61f9a442b0375524c5f1580af5892
2016-12-05 11:53:27 -08:00
1aba4280d8 Make xray net_type configurable
Summary: Make xray net_type configurable via a command line argument

Differential Revision: D4262076

fbshipit-source-id: e2ecb9cd5bee5d6aaebe0ea8d2d4d9b378058cba
2016-12-05 11:53:27 -08:00
6c13dc3dd0 Fix CreateCommonWorld schema
Summary: TSIA

Reviewed By: dzhulgakov

Differential Revision: D4264328

fbshipit-source-id: 59eaf791a05b0202000f3b7266aba63e146229d4
2016-12-05 11:53:27 -08:00
ab3fea540d Add serialization interface for MKLMemory
Summary: This allows us to serialize things between MKLMemory and a TensorProto.

Reviewed By: dzhulgakov

Differential Revision: D4218044

fbshipit-source-id: 934181493b482cb259c17ff4b17008eac52fd885
2016-12-05 11:53:27 -08:00
e65eeff665 LMDB example
Summary:
This example writes an LMDB database of image data and (random) labels. Then it reads them back using Caffe2's TensorProtosDBInput and validates that the checksums match. This example shows how to coerce image data into TensorProtos and be happy.

Before, there was no clear example of how to create databases for Caffe2.

Differential Revision: D4263614

fbshipit-source-id: 21e08066899095b4efcc2d23dbc3ede81e75914a
2016-12-05 11:53:26 -08:00
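For reference, a rough sketch of the pattern the example demonstrates, assuming the `lmdb` Python package and Caffe2's `caffe2_pb2` protos (shapes, keys, and values here are illustrative):

```python
import lmdb
import numpy as np
from caffe2.proto import caffe2_pb2

env = lmdb.open("image_db", map_size=1 << 30)
with env.begin(write=True) as txn:
    for i in range(16):
        protos = caffe2_pb2.TensorProtos()
        img = protos.protos.add()                  # entry 0: image data
        img.dims.extend([3, 32, 32])
        img.data_type = caffe2_pb2.TensorProto.FLOAT
        img.float_data.extend(
            np.random.rand(3 * 32 * 32).astype(np.float32).tolist())
        label = protos.protos.add()                # entry 1: label
        label.data_type = caffe2_pb2.TensorProto.INT32
        label.int32_data.append(i % 10)
        txn.put(b"%08d" % i, protos.SerializeToString())
```

TensorProtosDBInput can then read the database back, yielding data and label blobs per batch.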
96a5e88d63 Fix consecutive checkpoint syncs
Summary: Switching to Pieter-MPI changed the way we set up the network between operators. For synchronizing parameters after a checkpoint load, we run a checkpoint_net that contained operators for creating the common world, plus broadcast operators. Unfortunately this fails when the checkpoint sync is done a second time, because we would have created a duplicate common world. The solution is to separate the common world op and broadcast op into an init net and the actual broadcasting net, and run the init net only once. This problem did not arise in the Flow version, since I did only one checkpoint load per operator (process).

Differential Revision: D4251754

fbshipit-source-id: ba030579e651e529e29bbf2d27920075078d8ff9
2016-12-05 11:53:26 -08:00
3125e6a821 Hacky fix for cloned model rewriting
Summary:
Disclaimer: this is really hacky

Continues a fix from D4218902. The root problem is that DPER builds the net incrementally and input_record doesn't support that properly. For now I just manipulate the input record directly. Alisson wants to fix it properly later by allowing set_input_record to accept a superset of the current record.

But it should unblock our experimentation.

I'm curious how it's going to look in dper_example world.

Reviewed By: azzolini

Differential Revision: D4255285

fbshipit-source-id: ff65b6f943d705a9b3399035597e2e8ded2e1ff3
2016-12-05 11:53:26 -08:00
ea9a0f24bf automatic aggregation of sparse gradients
Summary:
This adds support for automatic aggregation of sparse gradients. We simply concatenate indices and values (no attempt to deduplicate, since this is already done before feeding into the optimizer). This should support various cases (indices and/or values can be generated by one or more gradient ops, or gradient outputs can be directly passed from inputs).

I tried to minimize the code footprint, but I introduced SparseGradGenMeta because GradGenMeta didn't lend itself very well to be used with sparse gradients.

Reviewed By: dzhulgakov

Differential Revision: D4219788

fbshipit-source-id: 1d074664cffd82a8764e4b1473ada6bc46e6c51a
2016-12-05 11:53:26 -08:00
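A numpy sketch of the concatenation strategy described above (illustrative, not the Caffe2 implementation):

```python
import numpy as np

def aggregate_sparse_grads(grads):
    """Concatenate (indices, values) pairs from several gradient ops.

    No deduplication: repeated indices are left as-is, since duplicates
    are assumed to be handled before feeding into the optimizer."""
    indices = np.concatenate([g[0] for g in grads])
    values = np.concatenate([g[1] for g in grads])
    return indices, values

g1 = (np.array([0, 3]), np.array([[1.0, 1.0], [2.0, 2.0]]))
g2 = (np.array([3, 5]), np.array([[0.5, 0.5], [4.0, 4.0]]))
print(aggregate_sparse_grads([g1, g2]))  # index 3 appears twice, by design
```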
2045a5de9f add position based weighting
Summary: adding more methods to the layer representation. The corresponding implementation in DPER is: https://fburl.com/563869364

Differential Revision: D4256583

fbshipit-source-id: 91326b7bb9e960a5bc70b5a13812fce90054eceb
2016-12-05 11:53:26 -08:00
3410939459 pass learning rate scaling factor to parameter update builder function
Summary:
When refactoring data parallel model, the division of LR by the number of devices was dropped, so we ended up effectively multiplying gradients by the number of devices. Thus, we need to scale the LR by 1/numgpus.

Created a test to confirm that data_parallel_model produces exactly the same results on different numbers of gpus, given the same total batch size.

Reviewed By: prigoyal

Differential Revision: D4248907

fbshipit-source-id: af21ede113e6ac25f12c556de298cb18974548be
2016-12-05 11:53:26 -08:00
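The fix boils down to the following scaling (a sketch; the actual change lives in data_parallel_model's parameter update builder):

```python
def scaled_learning_rate(base_lr, num_devices):
    # Gradients are summed across devices, so the effective gradient is
    # num_devices times larger; dividing the LR by num_devices restores
    # the single-device update magnitude for the same total batch size.
    return base_lr / num_devices

assert scaled_learning_rate(0.1, 4) == 0.025
```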
a3942b2d64 Add store ops and tests
Summary: Basic ops to set/get/check/wait against a StoreHandler.

Differential Revision: D4248059

fbshipit-source-id: cc53061fcc13823d4b9eed6b7c1c346b9e8ec991
2016-12-05 11:53:26 -08:00
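The set/get/check/wait contract these ops target can be sketched with a toy in-memory store (illustrative; the real handlers back onto a filesystem or Redis, as in the diffs nearby):

```python
import time

class InMemoryStore:
    """Toy StoreHandler: set/get plus a blocking wait on a set of keys."""

    def __init__(self):
        self.kv = {}

    def set(self, key, value):
        self.kv[key] = value

    def get(self, key):
        return self.kv[key]

    def check(self, keys):
        return all(k in self.kv for k in keys)

    def wait(self, keys, timeout_s=30.0):
        deadline = time.time() + timeout_s
        while not self.check(keys):
            if time.time() > deadline:
                raise TimeoutError("timed out waiting for %s" % (keys,))
            time.sleep(0.01)
```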
f3403a1110 Add RedisStoreHandler
Summary:
Add store handler implementation backed by a Redis server.

This allows for easy rendezvous when participating machines have no
access to a shared filesystem.

Differential Revision: D4241715

fbshipit-source-id: 4ce881df3a96af24f7efbb02d1050b3b2b9bc3c0
2016-12-05 11:53:26 -08:00
119b687994 Allow PythonOp to access the workspace
Summary:
DPER has very strange python ops that play with Workspace - they are somewhat similar to LoadOp/SaveOp, so I guess the semantics are fine.

Thus it makes sense to allow python operators to receive a workspace pointer, similarly to regular Operators.

I didn't figure out a better way to implement the optional argument than just checking the number of args the function receives on the python side.

Reviewed By: ajtulloch

Differential Revision: D4242943

fbshipit-source-id: d97d4227815b741c8f884cfe254b06d2b56b5a41
2016-12-05 11:53:26 -08:00
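The arg-count trick reads roughly like this on the Python side (a sketch with hypothetical names, not the actual Caffe2 dispatcher):

```python
import inspect

def call_python_op(fn, inputs, outputs, workspace):
    # Pass the workspace only to functions that declare a third
    # parameter, so existing two-argument ops keep working unchanged.
    n_params = len(inspect.signature(fn).parameters)
    if n_params >= 3:
        fn(inputs, outputs, workspace)
    else:
        fn(inputs, outputs)
```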
2390dfefdb Kill few more CHECKs.
Summary:
One more small batch of CHECKs that were left in the C2 codebase. Most of the leftovers
should be in tests/GPU-only code.

Reviewed By: Yangqing

Differential Revision: D4243782

fbshipit-source-id: a4a03c116ea8ba16facd2efc135746d5921f19d5
2016-12-05 11:53:25 -08:00
af2a3076a2 add header for AsyncDAGNet
Summary: This diff adds a header file for net_gpu.cc so that the AsyncDAGNet class can be used to create other derived classes.

Reviewed By: ajtulloch

Differential Revision: D4230046

fbshipit-source-id: 379c3ff7ebb7aeeb4294f39e6f5d1ecad48b92f0
2016-12-05 11:53:25 -08:00
8f398d795e Added basic build system 2016-12-04 16:42:00 -08:00
34d27771c6 1.3.2 release
Broadcast tuning
Better checking of inputs
Copy/reduce code simplification
2016-12-01 15:17:50 -08:00
1093821c33 Replace min BW by average BW in tests 2016-12-01 15:16:35 -08:00
107966b059 add error message for asan
Summary:
This makes sure that we have a useful CUDA error message in asan mode. Also
made an fb-specific task pass by explicitly marking it not asan-able.

Reviewed By: dzhulgakov

Differential Revision: D4243471

fbshipit-source-id: 2ce303b97b3b4728c05575a8e7e21eb5960ecbc7
2016-11-29 15:18:39 -08:00
da72658fa8 sparsehash-based implementation of UniqueOp
Summary:
Faster implementation of UniqueOp using google::dense_hash_map, as suggested by dzhulgakov. I haven't benchmarked it precisely but early measurements with my workflow show a significant speed bump (this operation went from using 20% of overall CPU time down to 7%).

I gated the implementation using the "engine" feature, to avoid adding sparsehash as a dependency to caffe2.

Reviewed By: dzhulgakov

Differential Revision: D4219768

fbshipit-source-id: 2f142981e772105b42fffa24afb199ef816f8e0c
2016-11-29 15:18:39 -08:00
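Logically the op computes the unique ids plus a remapping; a plain-Python sketch of what the dense_hash_map engine accelerates:

```python
def unique_with_remapping(ids):
    """Return (unique_ids, remap) where remap[i] is the position of
    ids[i] in unique_ids. The C++ engine replaces this dict with
    google::dense_hash_map for speed."""
    first_seen = {}
    unique_ids, remap = [], []
    for x in ids:
        if x not in first_seen:
            first_seen[x] = len(unique_ids)
            unique_ids.append(x)
        remap.append(first_seen[x])
    return unique_ids, remap

print(unique_with_remapping([5, 3, 5, 9, 3]))  # ([5, 3, 9], [0, 1, 0, 2, 1])
```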
f16c2fe3da Create a reserve operation for tensors to avoid reallocating memory on Extend() and Resize() operations
Summary: I want to collect tensors over multiple batches, so this operation is helpful for allocating enough memory from the beginning.

Reviewed By: dzhulgakov

Differential Revision: D4216198

fbshipit-source-id: e6b67cc7d80d71455487878da9b6b7a225035085
2016-11-29 15:18:39 -08:00
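The usage pattern this enables, modeled with a toy growable buffer (illustrative; the real op reserves raw tensor storage in place):

```python
import numpy as np

class GrowableTensor:
    """Toy model of Reserve + Extend: reserve capacity once up front,
    then extend repeatedly without reallocating."""

    def __init__(self, row_shape, dtype=np.float32):
        self.data = np.empty((0,) + row_shape, dtype=dtype)
        self.size = 0

    def reserve(self, rows):
        if rows > self.data.shape[0]:  # reallocate only when needed
            grown = np.empty((rows,) + self.data.shape[1:], self.data.dtype)
            grown[:self.size] = self.data[:self.size]
            self.data = grown

    def extend(self, rows):
        self.reserve(self.size + rows)  # no-op if capacity suffices
        self.size += rows
        return self.data[:self.size]

buf = GrowableTensor(row_shape=(4,))
buf.reserve(1024)        # single allocation up front
for _ in range(8):
    buf.extend(128)      # no reallocations inside the loop
```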
1aafeb3565 clean up memory of c2/sigrid predictor
Summary: Trying to optimize C2 predictor memory usage, mainly by removing the unused dbreader and dper metadata.

Differential Revision: D4232595

fbshipit-source-id: dcd7aa7dd09587ec9811a9e5ec725e0c22757665
2016-11-29 15:18:39 -08:00
f41b2ca85c fix sliceop for empty batch
Summary: Used in the NNPreProc layers. It fails online training when there is an empty batch.

Reviewed By: dzhulgakov

Differential Revision: D4235498

fbshipit-source-id: bde00a011831762e44a3f9bf2190d4b241a06ccc
2016-11-29 15:18:39 -08:00
10d0aea88c gradient for FlattenToVec
Summary: FlattenToVec was missing a gradient. It can use the same gradient implementation as FlattenOp, i.e. ResizeLike.

Reviewed By: kdub0

Differential Revision: D4241207

fbshipit-source-id: 6b1a60681fdce3c6f3139d0cd43b17798de2cbc9
2016-11-29 15:18:38 -08:00
2a95bd5239 Incremental MeanReducer segment Ops
Summary: Adding {Sorted|Unsorted}SegmentMean{Gradient|} operators.

Reviewed By: dzhulgakov

Differential Revision: D4185094

fbshipit-source-id: d4431e2a5a10a59570a491d63962668f248c0606
2016-11-29 15:18:38 -08:00
be1f3ed1d7 Add a snapshot test for Simon Layton
Summary: This is mainly for the OSS side checking.

Reviewed By: dzhulgakov

Differential Revision: D4238349

fbshipit-source-id: 061da3f721341c4a1249e1cc6c8c842fc505860f
2016-11-29 15:18:38 -08:00
e8b7ec1393 disable local update for sparse features
Summary:
With a parameter server, sparse features are updated on the parameter server, so
local updates for sparse features are disabled. But that logic was removed in
D4144922. This diff adds it back in a slightly different way.

Previously, in trainer_example, I did this in a hacky way: simply avoid adding the
sparse weight to model.params. It would still generate a grad, but would not add
optimization operators. At the same time, it was always registered directly in
the sparse_mapping, so the parameter server was aware of this parameter.
But with the new change for ParameterInfo I cannot do it that way anymore,
because the param registry and params are bound together in ParameterInfo.

For dper, there is an option in the dper model helper to disable all of the sparse
parameter optimizers.

To combine the two cases, I directly changed ModelHelperBase in this
diff. It is not quite ideal; it would be better to do it in Layer. But to fix the old
one, this seems to be the more reasonable place to cover both cases.

With this diff, there is no spike anymore. So probably this was the root cause
of the convergence issue we saw in D4144922. It also explains why the
model could recover: adagrad decays the local learning rate, so
local updates cause less and less change.

Reviewed By: dzhulgakov

Differential Revision: D4229684

fbshipit-source-id: da1241d43d7c52cbf13560f9bb83e09897d8d56f
2016-11-29 15:18:38 -08:00
5d0167c8e7 Example workflow for running distributed (syncsgd) imagenet training in Flow
Summary:
This diff introduces a simplified Imagenet trainer that uses data_parallel_model to parallelize training over GPUs and nodes in a synchronous manner. Flow's gang scheduling is used to launch the nodes, and data_parallel_model handles the synchronization among the gang members.

This example also uses the operator-per-epoch model, where each epoch produces a checkpoint consumed by the follow-up epoch.

Reviewed By: salexspb

Differential Revision: D4223384

fbshipit-source-id: 8c2c73f4f6b2fdadb98511075ebbd8426c91eadb
2016-11-29 15:18:38 -08:00
6ebae91d24 multi-task learning: save model and evaluator
Summary:
This consists of a series of diffs for implementing Multi-task learning.
This diff is to:
1. save the model;
2. support MT learning in the evaluator;
3. add a unittest.

model after merging (saved model): https://our.intern.facebook.com/intern/graphviz/?paste=56793140

Reviewed By: xianjiec

Differential Revision: D4123316

fbshipit-source-id: 225bf8616962ec08f4f1ef85729c1e94ba7c373a
2016-11-29 15:18:38 -08:00
365ca8da1c add sanity check that ops do not cross gpus
Summary: Debugging nets can be tiresome, so it is good if we can do some sanity checks. This adds a sanity check that all non-NCCL and non-Copy operators do not reference blobs that have a different device scope than the operator. This check is only added to data_parallel_model, so it should be safe. This check would have caught a subtle bug in prigoyal's training pipeline.

Reviewed By: dzhulgakov

Differential Revision: D4230444

fbshipit-source-id: 3d4a843162134a7a504053d95ff97a552e6b8a6d
2016-11-29 15:18:38 -08:00
a7df0e6724 Clone model net to avoid hard-coded inputs
Summary:
Previously DPER was quite broken - we couldn't change loaders on the fly because the serialized model had blob names hard-coded, e.g. "nn_loader/dense". In fact, the tests worked only by accident, as both trainer and evaluator used the same loader type.

This diff does the following:
1) when writing out the model, remap input blobs to 'inputs/<field_name>'
2) when loading the eval model, remap them back to the current loader

This diff uses Net.input_schema() for convenience; in particular, the schema format is implicitly serialized in the input blob names. From our discussion with Andrey, this type of hardcoding is actually acceptable, since the schema of HiveReader on the python side is inferred via the same string-parsing procedure.

It also modifies model saving a bit so that we don't pollute the global namespace with the shape_provider net.

Overall, the code in mlp.py is pretty terrible, but I'd leave refactoring to xianjiec as part of the Layers migration.

Reviewed By: xianjiec

Differential Revision: D4218902

fbshipit-source-id: 6cd19f0343ec1be6ddaa3581512e61879957749e
2016-11-29 15:18:38 -08:00
a597c7b167 implement sparse nn using layers
Summary:
- It's a first prototype that includes a simple unary test.
- Will probably need to iterate on it to include more arches for which we see promising offline results.

Differential Revision: D4208336

fbshipit-source-id: 5b2d2a5a0274a9dcad0fb169e43e78aa9d9a704d
2016-11-29 15:18:38 -08:00
9ea7947423 dot_product works for empty batch
Summary: used for dper delivery

Reviewed By: dzhulgakov

Differential Revision: D4229855

fbshipit-source-id: 9722b0bbb6c3c586b1864c939e8cb0535f8c5846
2016-11-29 15:18:38 -08:00
301ab97e41 Fix few more operators to handle empty batches correctly.
Summary:
If we go to prod, some of the sparse features might be empty, or for some reason
the batch might be empty. It's a good idea to be sure that we can run empty
batches.

Reviewed By: dzhulgakov

Differential Revision: D4197297

fbshipit-source-id: 1a154ebf625d1a39fd15354a154cf100f525ae9a
2016-11-29 15:18:37 -08:00
f95757e66e Added internal logging to internal usage of caffe2
Summary: Added basic logger functionality.

Reviewed By: dzhulgakov

Differential Revision: D4150362

fbshipit-source-id: 2eb98ce72a5020fbfeec2ab8c5ff65b9a2128802
2016-11-29 15:18:37 -08:00
da7add3da8 Better threadpool sizing heuristics
Summary:
The old heuristic functioned badly on octa-core phones (e.g., the S6). Limiting the number of threads to 4 in the 8-core case seemed to give optimal performance. For 4 cores, 3 threads still seems to yield the best performance, as does 2 threads for 2 cores on the iOS phones, though those cores are very different from the typical ARM cores in Android phones.

I figure that at the limit we should restrict ourselves to half the cores available, especially since in a big.LITTLE configuration only half the cores are likely to be big.

I need to get my hands on a deca-core phone or tablet to try out this heuristic, but I certainly figure that this will function better than what we had before (which would be 9 threads on a 10-core device).

Reviewed By: ajtulloch

Differential Revision: D4220341

fbshipit-source-id: 06fa7677789fcdbec03d98bb85a565f1d22099e1
2016-11-29 15:18:37 -08:00
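Written out as a policy, the heuristic above looks roughly like this (a sketch; the real logic lives in the C++ threadpool):

```python
def threadpool_size(num_cores):
    # Measured sweet spots from the summary above; beyond those, use
    # half the cores, since in big.LITTLE only half are likely big.
    measured = {1: 1, 2: 2, 4: 3, 8: 4}
    return measured.get(num_cores, max(1, num_cores // 2))

assert threadpool_size(8) == 4
assert threadpool_size(10) == 5   # deca-core: half the cores, not 9
```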
6515772b1f Fix UBSAN issue for zero-sized memcpy
Summary: ...

Reviewed By: bwasti

Differential Revision: D4224805

fbshipit-source-id: 4fd07f9755b6b76978c05c9af0851019530a3c85
2016-11-29 15:18:37 -08:00
5eb836880d Add unittest.main() lines to test scripts under python/operator_test
Summary:
Needed by oss.

This is done by running the following line:

  find . -name "*_test.py" -exec sed -i '$ a \\nif __name__ == "__main__":\n    import unittest\n    unittest.main()' {} \;

Reviewed By: ajtulloch

Differential Revision: D4223848

fbshipit-source-id: ef4696e9701d45962134841165c53e76a2e19233
2016-11-29 15:18:37 -08:00
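For reference, the stanza that the find/sed line appends to each *_test.py:

```python
if __name__ == "__main__":
    import unittest
    unittest.main()
```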
1e4b4fb4c4 Fix db_test under tsan
Summary:
It looks like there's some locking going on here, and so if
the Cursor outlives the DB (or vice-versa), we'll either deadlock or
unlock an unlocked mutex.

Reviewed By: dzhulgakov

Differential Revision: D4224727

fbshipit-source-id: 886401a9f2824f3168fb0b2fd4df6046369e5590
2016-11-29 15:18:37 -08:00
c1c92479bd check that numpy arrays are float32 when CUDA is used
Summary:
A recurrent developer issue is that people pass numpy arrays with FeedBlob but forget that a python float is actually a double. CUDA ops in caffe2 don't allow doubles.
Thus, I think we should reject incorrect types already at FeedBlob() when the device option is CUDA.

Added test.

Is this too strong?

Reviewed By: ajtulloch

Differential Revision: D4208153

fbshipit-source-id: 364b057a2a37b5d4b95de4e59faebdab724bb0ed
2016-11-29 15:18:37 -08:00
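A sketch of the kind of guard being added (illustrative Python; the actual check lives in the FeedBlob path):

```python
import numpy as np

def check_feed_dtype(arr, device_is_cuda):
    # Python floats become float64 arrays by default, and CUDA ops in
    # caffe2 don't support doubles, so reject them at feed time.
    if device_is_cuda and arr.dtype == np.float64:
        raise TypeError(
            "float64 blob fed to a CUDA device; use "
            "np.array(..., dtype=np.float32) instead")

check_feed_dtype(np.array([1.0], dtype=np.float32), device_is_cuda=True)  # ok
```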
b9f1555b6a remove unused function from resnet50_trainer
Summary: Just noticed that I had duplicate code in the example imagenet trainer. Removed the function.

Differential Revision: D4223070

fbshipit-source-id: 443a9401bf7e425f7a3a13a44c9d0f7e21e72303
2016-11-29 15:18:37 -08:00
b77aa551a4 add missed comma
Summary: D4205610 missed a comma, causing unnecessary log spill with the WeightedSum op

Reviewed By: Yangqing

Differential Revision: D4222806

fbshipit-source-id: ff17c20eae7a7168475f39cc227d3e8ab347288f
2016-11-29 15:18:37 -08:00
42279a610c use Pieter-MPI and fb.distributed
Summary:
Remove MPI and use fb.distributed rendezvous and Pieter's new ops.

One can now pass a 'rendezvous' struct to data_parallel_model to initiate distributed SyncSGD. The provided rendezvous implementation uses the kv-store handler of fb.distributed to disseminate information about the other hosts. We can easily add other rendezvous, such as file-based, but that is the topic of another diff.

Removing MPI also allowed simplifying the Xray startup scripts, which are included in this diff.

When accepted, I will work on simple example code so others can use this as well. The Flow implementation will be the topic of next week.

Differential Revision: D4180012

fbshipit-source-id: 9e74f1fb43eaf7d4bb3e5ac6718d76bef2dfd731
2016-11-29 15:18:36 -08:00
122a89e3c5 Add FileStoreHandler
Summary:
The FileStoreHandler subclasses the abstract StoreHandler
class.

Operators expecting to work with a StoreHandler can now use the
filesystem as their backing store.

Reviewed By: Yangqing

Differential Revision: D4217711

fbshipit-source-id: fce60c99c4c505201dfee33ca0a4e8a35db00338
2016-11-29 15:18:36 -08:00
2790043421 Add the MKLDNN type to the tensor type strings and added proper docs.
Summary: TSIA

Reviewed By: dzhulgakov

Differential Revision: D4217541

fbshipit-source-id: f68d1aba9c20af0fb0aed2cc1b2099961f6fa7a4
2016-11-29 15:18:36 -08:00
0a42681f0c print more logs in qps metrics
Summary: since the LogScoreEstimator prints the # of examples after considering negative downsampling.

Reviewed By: kdub0

Differential Revision: D4218040

fbshipit-source-id: 30f54353042dcd85c945c2c911ba0b6d9c0b1540
2016-11-29 15:18:36 -08:00
6b437708ad caffe2/caffe2/operators/softmax_with_loss_op.cc: avoid shadowing warnings
Summary:
Fix warnings exposed by gcc-4.9.x's -Wshadow-compatible-local
(and/or the stricter -Wshadow-local) options.  Note that these
are both less onerous than -Wshadow.
I plan to enable one of them for all of fbcode, soon.

Rename inner "idx" to "k".

Differential Revision: D4216556

fbshipit-source-id: 5ee48751efd07838db24f56390730718ea031772
2016-11-29 15:18:36 -08:00
2fbf774e99 make ReshapeOp work with CUDA
Summary:
It was not enough to just register ReshapeOp for CUDA, since it does memory copies to/from tensors. This happened in two places: when assigning the shape from a shape blob and when outputting a shape tensor.

Also changed the resizeoptest to use CUDA when available (this test predates hypothesis tests, so I had to do this manually).

Differential Revision: D4217342

fbshipit-source-id: 61761bac015f3731cf480ccef2563e9c80e0f4aa
2016-11-29 15:18:36 -08:00
c3606dcb9a Fix integer/floating point conversion error when computing how much data to allocate
Summary: see title

Reviewed By: dzhulgakov

Differential Revision: D4215607

fbshipit-source-id: 172b8d743e5abe533998e884809aafb4c4bf1b1b
2016-11-29 15:18:36 -08:00
c48551409c Proper error message if passing NoneType value for kwargs
Summary:
I got a weird error about NoneType not being iterable, which made me think
it was some error in the C2 core, whereas it was actually an error in my code.

Reviewed By: Yangqing

Differential Revision: D4192799

fbshipit-source-id: 0122f13e205c1c6a0766545f0ad6296228d3a3d9
2016-11-29 15:18:36 -08:00
949ce294ff fix race condition in text_file_reader.py
Summary:
This fixes a race condition in text_file_reader.py.

For example in `fbcode/caffe2/caffe2/fb/text/stats.py`, in `compute_meta`, we build an execution step `read` such as:

```
.
└── step_read
    ├── net_reader
    │   ├── op_TextFileReaderRead
    │   └── op_IsEmpty
    └── net_consume:n
        └── op_Tokenize
```

Note that in `workspace.cc`, we check should_stop between each _step_ and each _net_, not between _ops_

Let's say we have 2 workers, here is a faulty interleaving of threads:
- 1 executes TextFileReaderRead
- 2 executes TextFileReaderRead
- 1 executes IsEmpty and sets should_stop to False
- 2 executes IsEmpty and sets should_stop to True
- 1 checks should_stop before running net_consume:n
- 1 stops
- 2 checks should_stop before running net_consume:n
- 2 stops

That's an issue, because 1 did read data from the file but did not run the processing step (consume:n) for this data.

Reviewed By: dzhulgakov

Differential Revision: D4203729

fbshipit-source-id: eabd94ea995527ec52fa137a8b63c277f7e4dd96
2016-11-29 15:18:36 -08:00
0e298ec399 Expose MKLMemory to the Python Feed and Fetch interface, and misc changes
Summary:
This is #2 of a series of changes. It does the following:

(1) a few refactorings of the MKL memory interface
(2) an initial MKLContext to deal with MKL-specific computations
(3) provide MKLMemory access in Python with the blob feeder/fetcher registration

Reviewed By: dzhulgakov

Differential Revision: D4210123

fbshipit-source-id: adea1f1ffbd0b9ffdd55092676468c16bec08992
2016-11-29 15:18:36 -08:00
9fa26fcc32 position weighted embedding
Summary: Each sparse feature is an ID list, and usually the position of the id in the list is meaningful: the earlier the id appears, the more important it is. In this diff, we multiply each embedding by a weight, where the weight corresponds to the position. With this change, the same ID appearing at different positions will have a different norm/length/importance after aggregation. The firstX transformation in sigrid is a special case of this model where the weights are 1 before n and 0 after n, where n is the argument of firstX.

Reviewed By: xianjiec

Differential Revision: D4181251

fbshipit-source-id: 2a6f8b7240af445b6bd2052fd24c2d99f39ee7ff
2016-11-29 15:18:35 -08:00
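In numpy terms, the pooling described above looks roughly like this (a sketch; names illustrative):

```python
import numpy as np

def position_weighted_pool(embeddings, pos_weights):
    """embeddings: (L, D) vectors for one id-list; pos_weights[i] scales
    the embedding at position i before sum-pooling. firstX is the special
    case pos_weights = [1]*n + [0]*(L-n)."""
    L = embeddings.shape[0]
    return (embeddings * pos_weights[:L, None]).sum(axis=0)

emb = np.random.rand(5, 8).astype(np.float32)
w = np.array([1, 1, 1, 0, 0], dtype=np.float32)  # firstX with n=3
print(position_weighted_pool(emb, w))
```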
b5613d7a3d report offending blob name when blob in wrong device scope
Summary:
Another recurrent problem is that some blob is in CPU scope while the operator expects CUDA scope (or the other way round).
The exception is only partially helpful, as it tells you the operator but not the offending blob name. This diff adds the blob name
to the exception message, which helps debugging.

Reviewed By: prigoyal

Differential Revision: D4208584

fbshipit-source-id: 5aeac5c3efeed8d6c995bea166ed534855007945
2016-11-29 15:18:35 -08:00
c41f0d27c4 adding more things to the list of known operators in model_helper
Summary: This is so they don't generate spurious warning messages in the logs

Reviewed By: dzhulgakov

Differential Revision: D4205610

fbshipit-source-id: f764b51565430f4057898ab929372bc7943e0495
2016-11-29 15:18:35 -08:00
ddddfba1c0 Merge pull request #54 from peterhj/peterhj-staticlib
Add a static library target "staticlib" to the Makefile.
2016-11-28 09:15:39 -08:00
5765d608cc Add a static library target "staticlib" to the Makefile.
Rename the static library "libnccl_static.a" to disambiguate from the
dynamic libraries.
2016-11-24 11:31:03 -08:00
c2c515516b Remove irrelevant output from ncclReduce Fortran tests 2016-11-21 10:18:04 -08:00
9c18468fe2 Add Copyright header to Fortran bindings source files 2016-11-21 10:17:58 -08:00
9a02908b78 enable reshape op
Summary: TSIA

Differential Revision: D4208409

fbshipit-source-id: b5927af1d329639840f232e620cb0241cd88b03d
2016-11-18 17:03:21 -08:00
61c0bcf91d removed deprecated readme info
Summary: Removed deprecated information.

Reviewed By: Yangqing

Differential Revision: D4208320

fbshipit-source-id: 6906e9c56b9f0abf0582c2ba1bb8e6a5a9f89a84
2016-11-18 17:03:20 -08:00
589398950f fbsync at f5a877 2016-11-18 15:41:06 -08:00
5f2b32e45b Add Fortran bindings 2016-11-17 15:33:34 -08:00
238ceab825 fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00
d90206b3fd Merge pull request #46 from slayton58/bn_test_fix2
Fix BN in test phase
2016-10-26 10:58:09 -07:00
fe1e644052 Merge pull request #45 from slayton58/cudnn_find_ex
Move algo finding to cudnnFind*Ex
2016-10-26 10:26:01 -07:00
c88b1e0d25 Merge pull request #48 from slayton58/nccl_priority_streams
Add priority streams for NCCL ops
2016-10-26 10:25:13 -07:00
8def54e82b Fix BN in test phase 2016-10-19 08:20:11 -04:00
58a0ec4b4f Move algo finding to cudnnFind*Ex 2016-10-19 08:10:17 -04:00
2909bfaca0 Add priority streams for NCCL ops 2016-10-18 12:28:35 -04:00
534b9a1697 Bump to 1.3.1 2016-10-13 10:33:05 -07:00
b2781d0501 Fix primitives function prototype 2016-10-13 10:32:42 -07:00
bf7d1514f7 NVML (libwrap) : import the needed definitions 2016-10-13 10:28:59 -07:00
44509f9f91 fbsync: mostly lint changes, added mkl files 2016-10-11 22:45:06 -07:00
9201cdd029 nervana build files 2016-10-07 18:37:26 -07:00
f019672e0b Merge branch 'master' into fbsync 2016-10-07 16:42:13 -07:00
2535720654 fbsync 2016-10-07 15:47:52 -07:00
d1e9215184 fbsync 2016-10-07 13:08:53 -07:00
8bb06c94be Improved allreduce segmentation for small sizes 2016-10-07 12:42:23 -07:00
8dec763659 Merge pull request #44 from slayton58/bn_test_fix_only
Fix BN for test phase
2016-10-07 11:13:37 -07:00
00c493864e Fix BN for test phase 2016-10-07 12:11:36 -04:00
b861d9a264 Merge branch 'fbsync' 2016-09-16 17:00:13 -07:00
ff8a5bb532 move third_party/gtest to use git submodules. As a result the folder name is now googletest 2016-09-16 16:56:34 -07:00
b91a6795ed move third_party/eigen3 to use git submodules. As a result the folder name is now eigen. 2016-09-16 16:25:22 -07:00
04f628446e move third_party/google/protobuf to use git submodules. 2016-09-16 16:13:27 -07:00
3d54e7b40e fbsync: changes to implement operator schema 2016-09-08 18:07:01 -07:00
0a09d09431 fbsync 2016-09-08 17:56:14 -07:00
221a82bdad more build fixing 2016-09-07 23:30:35 -07:00
50274cf49e fixing build - to be verified on a GPU machine 2016-09-06 18:09:51 -07:00
b23e51d467 chunky sync 2016-09-06 15:55:19 -07:00
862e35af25 template magic 2016-08-10 14:17:22 -07:00
1fc9830b59 make android build again 2016-08-10 14:00:11 -07:00
9c459be63a minor fix to make things build 2016-08-10 11:33:04 -07:00
05512d1e10 sync 2016-08-10 11:02:15 -07:00
91d5d740c3 Merge branch 'fbsync' of github.com:caffe2/caffe2 2016-08-09 16:22:45 -07:00
f9b7416efe context_gpu.h bugfix 2016-08-09 16:19:59 -07:00
c4eea1f2f6 Remove the no-shrink test as the criterion is not guaranteed to be satisfied. Minor fixes for others. 2016-08-04 23:34:37 -07:00
bfe76b2be4 cuda memory pool implementation cleaning: both cub and cnmem 2016-08-04 22:59:15 -07:00
2362571b23 add submodule cub 2016-08-04 15:17:15 -07:00
1ede7a7ff0 more build updates:
(1) nccl submodule, cnmem submodule
(2) mpi ops fallback test
(3) a bit more blob interface
(4) fixed tests
(5) caffe2.python.io -> caffe2.python.dataio to avoid name conflicts
(6) In the build system autogen __init__.py instead of having manual
rules just to copy over an empty __init__.py.
2016-08-02 23:28:23 -07:00
b2c2d0b70c Merge branch 'fbsync' of github.com:caffe2/caffe2 into fbsync 2016-08-01 20:59:52 -07:00
c15e45c9bb chunky sync again 2016-08-01 20:58:46 -07:00
a7af924919 Add libz to dependency list (needed by rocksdb) 2016-08-01 20:56:47 -07:00
4fffb7dc66 Add libz to dependency list (needed by rocksdb) 2016-08-01 16:03:28 -07:00
59256bac6d Add libz to dependency list (needed by rocksdb) 2016-08-01 15:37:10 -07:00
ad4641a2dc Merge pull request #38 from caffe2/fbsync
Fbsync
2016-08-01 14:32:04 -07:00
6fae5a043a Merge pull request #36 from songhan/fbsync
protected legacy_pad_, replace DeleteDropout with is_test=True
2016-07-29 12:56:35 -07:00
cc46464cf6 protected legacy_pad_, replace DeleteDropout with is_test=True 2016-07-29 11:44:55 -07:00
3c989347d8 caffe translator with added back legacy pooling support 2016-07-28 23:37:02 -07:00
5ab7676d20 code fix for oss 2016-07-28 16:14:44 -07:00
bcea409c82 sync 2016-07-28 15:06:43 -07:00
f01f2063dd bring up caffe.proto to master 2016-07-28 09:55:49 -07:00
b729f05c35 Android build improvements 2016-07-26 12:48:53 -07:00
d981e79e7d Add a simple script to help build android. 2016-07-26 10:40:37 -07:00
f09d2b2b35 changes to make c2 build. 2016-07-21 16:39:08 -07:00
09bed67e4f add untracked files 2016-07-21 11:26:41 -07:00
6463eebc7b chunky sync - build scripts to be written 2016-07-21 10:16:42 -07:00
292211baa4 Merge pull request #34 from lukeyeager/fix-docker-zmq
Fix zeromq build in Dockerfile
2016-06-08 14:38:12 -07:00
e2d3145d72 Fix zeromq build in Dockerfile
The tarball download returns a 404
2016-06-03 17:32:24 -07:00
92100e4b66 Update README.md 2016-05-16 21:42:12 -07:00
a9aac859dd glog as default. 2016-05-15 23:25:21 -07:00
79c5275d75 A set of changes to make the newest sync build.
(1) build file changes.
(2) removed data/ subfolder - anything involving datasets should probably
be tested separately.
(3) Some new functionalities.

TODOs:

(1) build files for contrib/
(2) cudnn 5.05 compatibility (currently supporting 5.04)
2016-05-15 23:04:32 -07:00
559053d3a8 chunky sync 2016-05-13 14:43:48 -07:00
8f24306aff bugfix 2016-03-17 21:39:29 -07:00
ff9a9a34bd bugfix 2016-03-17 21:12:03 -07:00
4ac4a49e58 bugfix 2016-03-17 21:04:27 -07:00
ecd9507fc1 minor changes 2016-03-17 20:50:14 -07:00
0190bd4fe1 cuda memorypool renaming 2016-03-17 20:48:49 -07:00
137b880aac cuda initialization.
This makes it callable multiple times but the
actual code only runs once. TODO: make it thread
safe. I am too lazy for now.
2016-03-15 12:52:05 -07:00
a2971e6a16 bugfix 2016-03-11 10:54:34 -08:00
fa34452625 eigen3 brew bugfix 2016-03-11 10:43:43 -08:00
4ae1bbbd7e bugfix 2016-03-11 10:30:16 -08:00
0521e1d672 notebook rewrite and grammar bugfix 2016-03-10 17:34:31 -08:00
cf7ca23fc1 make caffe2.python build 2016-03-08 16:48:19 -08:00
9ae880bb6f move pycaffe2 to caffe2.python 2016-03-08 15:45:30 -08:00
0747a4a7fd move a bunch of things 2016-03-08 15:15:19 -08:00
9e5c795c11 third party clean 2016-03-08 14:39:31 -08:00
ffec31ea07 pycaffe2 c++ extension: py3
So I tried to make things compilable in python3, but a lot of the actual
functionalities are yet to be verified. Since I won't be using py3 for a
while and protobuf 2.6.1 does not work with py3 (among a bunch of
other things), I'll put this down as a future todo item.
2016-03-07 22:08:53 -08:00
1b5da38e29 minor fix 2016-03-07 13:47:36 -08:00
e7c016cde6 py3 reformat 2016-03-07 08:53:08 -08:00
46de5451ed BREW modifications 2016-03-05 08:00:42 -08:00
176de750c8 pycaffe2 minor fix 2016-03-05 08:00:20 -08:00
b589d831b8 cudnn v4 interface change. 2016-03-05 08:00:08 -08:00
05ead5f76f Bugfix for logging 2016-03-04 09:47:20 -08:00
0d5d16b3e6 race condition fix: since Memcpy is now async, we will make sure that the python interface syncs before returning the content. Otherwise it makes things flaky. 2016-02-02 16:05:04 -08:00
50874dc746 relu and pool wip 2016-02-01 14:08:10 -08:00
1740974347 average pooling wrapper: without this the NHWC path would throw an error as the order is not passed along. 2016-01-22 09:31:49 -08:00
5a94ee6b64 Allow one to set the BLAS backend, while optionally choosing to use
Eigen for the whole numerical computation (for example, on a platform
where no optimized BLAS library is present, or where Eigen is already
the fastest numerical library available).

The paths I have tested are Eigen and ATLAS. Have not tested MKL yet.
2016-01-20 17:05:21 -08:00
78aa266770 Fix 2016-01-19 14:49:48 -08:00
d84545c5fb fp16: allow one to override. 2016-01-19 14:39:26 -08:00
5f2d7ba963 misc: experimental cuda elementwise rtc, softmax fp16 2016-01-19 12:49:36 -08:00
d244ca9052 relu fp16 fix 2016-01-13 22:12:49 -08:00
fa59b90c72 misc updates 2016-01-13 21:00:56 -08:00
a05782f025 fix 2016-01-12 15:59:02 -08:00
d08880e61a more RTC experiments 2016-01-12 15:44:15 -08:00
fe78d1a445 quick rtc try 2016-01-11 20:21:41 -08:00
5bc33a4f7a Minor fix 2016-01-11 20:10:02 -08:00
64e0d3a29a misc updates, mainly relu, to test fp16 2016-01-07 20:56:11 +00:00
e2b9172b4c print cudnn version 2016-01-07 18:49:41 +00:00
1c020d257b bugfix 2016-01-07 18:48:30 +00:00
a5a75e8005 some changes for TX1 benchmark 2016-01-05 20:20:50 +00:00
66b13f4062 A bunch of cpu stuff:
- Bring up eigen 3.3 beta 1. Slight performance improvements.
- default avx, avx2 and fma compilation.
2016-01-05 09:56:34 -08:00
ba62b4b493 minor changes to the build system as well as a cpu benchmark. 2016-01-05 09:56:31 -08:00
8bcfb30d97 make android 2016-01-05 09:55:22 -08:00
809d54ee50 convnet benchmark minor change 2016-01-05 09:55:22 -08:00
8c1bbaa2ab some fill ops that are not tested. 2016-01-05 09:55:22 -08:00
6cb2072422 cudnn conv op backward compatibility back to v2 2016-01-05 09:55:21 -08:00
778a1f6956 speed benchmark 2016-01-05 09:55:21 -08:00
05eda208a5 Last commit for the day. With all the previous changes this should give an exact reference speed that TensorFlow with CuDNN3 should achieve in the end. 2016-01-05 09:55:21 -08:00
896e8e5274 pooling backward cudnn, and constant for kOne and kZero. 2016-01-05 09:55:21 -08:00
f8585bbf62 cudnn pool op. 2016-01-05 09:55:21 -08:00
664bdf83d7 Pooling refactor so we can do a proper cudnn benchmark. 2016-01-05 09:55:21 -08:00
288f350899 math_gpu.cu bugfix 2016-01-05 09:55:21 -08:00
ebd6c9fab8 muji bugfix with ngpu=4 2016-01-05 09:55:21 -08:00
55cced894d Some untested half float stuff for benchmarking. 2016-01-05 09:49:55 -08:00
8d4683434b convnet benchmark: make it consistent with TF's model. 2015-12-17 11:25:51 -08:00
b7c3b48469 copy matrix can be done with cudamemcpy. 2015-12-17 10:22:02 -08:00
b10ee24fc3 conv op: backward exhaustive mode too. This does not seem to help much, suggesting that cudaGetConvolution*Algo is already doing a very good job. Verified with googlenet. 2015-12-17 10:21:16 -08:00
d79cfb4ae7 exhaustive search for cudnn 2015-12-15 22:21:11 -08:00
61c114971b fast path for copymatrix 2015-12-15 22:21:11 -08:00
05e3207e26 fast path for copymatrix 2015-12-15 21:25:53 -08:00
cc9323793e add relu cudnn code 2015-12-15 20:43:34 -08:00
4f2530d8ce expose benchmark code to python 2015-12-15 20:42:54 -08:00
6b27cabf17 net benchmark code 2015-12-15 20:42:22 -08:00
cf8ffe215f minor tuning 2015-12-15 20:41:58 -08:00
20ccca5b67 RTTI to true in default for the main model. 2015-12-15 11:01:09 -08:00
f714ad0a70 number of blocks now makes more sense. 2015-12-15 10:46:50 -08:00
3b0cc79465 context gpu: better error catching 2015-12-14 13:59:28 -08:00
73f3daf736 minor bugfix for workspace 2015-12-13 08:37:36 -08:00
bfae070de1 minor bugfix for net 2015-12-13 08:37:01 -08:00
359f7685f8 halfway into timing test. 2015-12-11 11:01:40 -08:00
03c777db72 boolean for has_gpu_support 2015-12-10 15:06:57 -08:00
7bdc8a6c19 Pycaffe2: removed the clunky gpu support hack.
Now, when one builds pycaffe2, if cuda is present, we will always build
pycaffe2 with gpu support.
2015-12-10 15:06:57 -08:00
becf9e85c1 remove no longer needed build_env_android.py. 2015-12-10 15:06:57 -08:00
82696ebc5d Merge pull request #9 from Yangqing/master
a script to test zeromq db throughput.
2015-12-09 15:36:39 -08:00
ae1ebd0f19 a script to test zeromq db throughput. 2015-12-09 15:15:06 -08:00
77541ffe14 flags relaxation, or tightening? 2015-12-07 20:48:57 -08:00
ceb4cde74a average pooling format change to fit the cudnn interface 2015-12-06 15:56:29 -08:00
6bfb30047e deprecate legacy pooling 2015-12-06 11:28:00 -08:00
20dbbbbb28 android: use full proto in default 2015-12-06 11:26:30 -08:00
9022e4f499 pull protobuf to master 2015-12-05 18:34:48 -08:00
05465783c6 optionally use protobuf lite 2015-12-05 16:15:00 -08:00
3d7cb201a3 misc changes to reduce binary size. 2015-12-04 21:31:23 -08:00
4eb486bd34 misc update to reduce binary size. Removed zmq.hpp 2015-12-03 21:28:55 -08:00
ff04fe8b1b merge 2015-12-02 21:41:56 -08:00
1a4ea7c8fc misc updates 2015-12-02 21:01:55 -08:00
b64429bbc6 Merge branch 'dev' of https://github.com/Yangqing/caffe2 into dev
Conflicts:
	caffe2/operators/spatial_batch_norm_op_cudnn.cc
2015-12-02 20:57:36 -08:00
25647f8c47 more test for tf benchmark purposes. 2015-12-02 16:55:51 -08:00
01b45fd052 backward support to cudnn R2 for TensorFlow benchmark references 2015-12-02 15:12:04 -08:00
acc16645d3 temp hack. Will rewrite the build script later. 2015-12-02 10:06:15 -08:00
3a4d4285f2 Added more benchmarks. 2015-12-02 10:04:00 -08:00
7d87fe788f alexnet benchmark code using cudnn: this should give a reference speed that TensorFlow should achieve after tuning. With R4 currently we have 29.5ms fwd / 93.4ms bwd. 2015-12-01 17:17:22 -08:00
1499b87e56 cudnn spatial bn: optional compilation instead of throwing error 2015-12-01 14:20:28 -08:00
5ba54180f5 various updates 2015-11-28 13:12:43 -08:00
1b7c5acbd8 halfway. Prepare to revert proto3 to proto2 2015-11-28 11:02:47 -08:00
fcd5f8fbf0 move to the new build scripts 2015-11-27 21:31:09 -08:00
3dcb868411 misc update 2015-11-27 21:28:03 -08:00
85c2eaa303 Halfway into refactoring the build system 2015-11-27 19:06:32 -08:00
63b010115b minor changes 2015-11-25 10:16:49 -08:00
a71667859f I thought I removed this. Maybe on another machine? 2015-11-25 10:16:37 -08:00
7cfa9f378b cnn default order. 2015-11-11 14:25:02 -08:00
85f2fc65b7 well. 2015-11-11 14:24:45 -08:00
92790cf6b3 Spatial batch norm; currently just based on cudnn. 2015-11-11 14:23:53 -08:00
5c915a0321 Some naming changes. 2015-11-10 23:11:06 -08:00
d577f9b95d Code sugar for simpler gradient definition. 2015-11-10 23:11:05 -08:00
63bd3ce182 sigmoid 2015-11-10 23:11:05 -08:00
d582c395dc tanh 2015-11-10 23:11:05 -08:00
a3dcd9250a bugfix 2015-11-10 23:11:05 -08:00
48d87711ed bugfix on master 2015-11-10 23:10:21 -08:00
a74d606df7 A collection of changes:
(1) Registry now uses std::function for more flexible use cases.
(2) dropout adds an "is_test" keyword.
(3) Making all gradients registered via C++. Python still provides a gradient wrapper.

The TODO item is to do the autograd SSA in C++ if possible. The problem is that if we want to dynamically
register python gradients we will be sort of screwed, because in c++ things are registered
via static variables.
2015-11-07 16:12:18 -08:00
f5393b4c78 Merge pull request #20 from Yangqing/tensor_notype
Some big rewrites now that they are stabilized.
2015-10-31 12:53:29 -07:00
457ef79c70 elegant fetchblob 2015-10-31 09:55:29 -07:00
71e9932148 half float conversion 2015-10-31 09:50:15 -07:00
b70f46f958 minor fix 2015-10-31 09:50:02 -07:00
625e19acae Float16 for convolution 2015-10-31 09:49:45 -07:00
a51a398fec Revert "minor fix"
This reverts commit 88e5438cd6784ebce0053bbbc54795ba8f99b9eb.
2015-10-30 15:09:42 -07:00
9e48fd2e8e utility op in-place opt-in 2015-10-30 14:29:57 -07:00
d167ae5399 cudnn race condition fix 2015-10-30 14:29:29 -07:00
6847291ea8 minor fix 2015-10-30 14:22:53 -07:00
5cf83e57f2 cudnn refactor so we can do easier benchmark check. Also some minor bug fix. 2015-10-29 12:40:39 -07:00
141d122db3 minor bugfix 2015-10-29 12:37:09 -07:00
0d18ed31dd minor bugfix 2015-10-29 10:27:32 -07:00
80a70b635f Two main changes:
(1) Loss: do not coerce a gradient output. Although it may be numerically more efficient to do so, it makes the definition of a loss kind of funny if one does not really want to run a backward pass.
(2) Autodifferentiation: allow a more explicit in-place check; in-place is now opt-in, and implemented a simple SSA/IR gradient generation scheme. Also added some core gradient tests.

Misc bugfixes as well.
2015-10-28 23:15:17 -07:00
98c5b86ef7 A few changes:
(1) cudnn for conv
(2) cublas: after going through the work I feel it's better to use HOST pointer mode, so changed it.
(3) storage order: despite the fact that googlenet and multibox use NHWC, it seems better to still use
    NCHW as the default to be consistent with caffe and cudnn; moved to NCHW as the default.
2015-10-21 22:37:11 -07:00
d734ddc196 Adding optional Eigen code. Added a switch USE_SYSTEM_EIGEN in Env. Misc changes. 2015-10-18 16:55:24 -07:00
648d1b101a A consolidation of a couple random weekend work.
(1) various bugfixes.
(2) Tensor is now a class independent from its data type. This allows us
    to write easier type-independent operators.
(3) code convention changes a bit: dtype -> T, Tensor<*Context> -> Tensor* alias.
(4) ParallelNet -> DAGNet to be more consistent with what it does.
(5) Caffe's own flags library instead of gflags.
(6) Caffe's own logging library instead of glog, but glog can be chosen with
    compile-time definition -DCAFFE2_USE_GOOGLE_GLOG. As a result, glog macros
    like CHECK, DCHECK now have prefix CAFFE_, and LOG(*) now becomes
    CAFFE_LOG_*.
(7) an optional protobuf inclusion, which can be chosen with USE_SYSTEM_PROTOBUF
    in build_env.py.
2015-10-11 23:14:06 -07:00
064fef1313 trying to cover as much mpi cases as possible 2015-09-15 22:06:37 -07:00
5b9584c227 carpet bombing 2015-09-15 21:30:23 -07:00
42d310afdd Update README.md
installation link.
2015-09-12 20:26:08 -07:00
5b8ae52e4b cudnn note. 2015-09-12 16:55:24 -07:00
87bdbe0e8e Hickery Dickery Docker.
Also started adding some docs for installation
2015-09-12 10:57:44 -07:00
0b66c01462 (1) blob debugstring
(2) cnn bugfix and enhancement
(3) device checker fix
(4) suppress prints from caffe2_python
(5) moar tests!
2015-09-12 10:37:40 -07:00
8490282cd1 ‘did I…?’ ‘No.’ ‘should I…?’ ‘Yes.’ 2015-09-09 20:38:10 -07:00
d07549bed2 half-finished cnn wrapper, etc. 2015-09-09 20:33:34 -07:00
d4336af327 caffe_translator minor change 2015-09-06 16:33:59 -07:00
d9af6fc7f2 Now we need to add GlobalInit() before mujitest. 2015-09-06 15:57:07 -07:00
53868133e1 lmdb: after commit, txn is already deleted so no need to abort 2015-09-06 22:34:22 +00:00
d72cfcebaf fixes to allow more consistent build tests 2015-09-06 22:34:22 +00:00
4f33daf308 workspace: cannot call GlobalInit with sys.argv
because that will cause e.g. python to fail. Need
a better way, currently just disabling it.
2015-09-06 13:35:20 -07:00
7517c2898f translator and misc fixes for legacy group convolution, sigh. 2015-09-06 13:34:46 -07:00
1164b9a347 mujitest bugfix 2015-09-06 08:59:12 -07:00
821eac3e7c lint 2015-09-06 08:59:12 -07:00
6e5e9743c3 muji fix 2015-09-06 08:59:12 -07:00
8198cb135d pycaffe2 finetune 2015-09-06 08:59:12 -07:00
a117c5164a workspace: set globalinit 2015-09-06 08:59:11 -07:00
bc70f17a4f no more gflags hack headers. 2015-09-06 08:59:11 -07:00
e505c98996 no more gflags_namespace.h header 2015-09-06 08:59:11 -07:00
7583822af8 muji 2015-09-06 08:59:11 -07:00
c18d756c49 workspace bugfix 2015-09-06 08:59:11 -07:00
92e2cddd62 add some python cuda capability 2015-09-06 08:59:11 -07:00
91d8ce4f44 python uses global init 2015-09-06 08:59:04 -07:00
d2ff13d332 put a peer access pattern function to caffe2. 2015-09-06 08:59:04 -07:00
f2fde73447 init test bug fix - forgot to commit in the previous one 2015-09-06 08:59:03 -07:00
26591c8ac7 easy selection of gpu ids for cuda context. 2015-09-06 08:59:03 -07:00
baffb9c503 make caffe2_gtest also uses globalinit. Not allowing init to run twice. 2015-09-06 08:59:03 -07:00
ec069cb3ea Use a global init function: it seems that with the multiple components optionally linked in, it is best to just enable a registering mechanism for inits. 2015-09-06 08:59:03 -07:00
de55e4e77c changes to ensure a more robust build 2015-09-06 05:49:10 +00:00
2ddea70a08 remove dependency to google profiler 2015-09-03 20:55:55 -07:00
ecd46d5ea0 A memory pool implementation based on cnmem. Added cnmem license to LICENSE. 2015-09-03 20:55:50 -07:00
5325bd5049 rename files so things appear cleaner 2015-09-03 20:55:45 -07:00
4f4aa1f205 clip operator, not tested. 2015-08-28 16:33:10 -07:00
a57de4ece7 clean pycaffe namespace snafu. 2015-08-28 14:02:53 -07:00
a12a471b2d suppress compiler warning. 2015-08-28 14:02:53 -07:00
f528f46c64 move LICENSE.caffe into LICENSE, and added related correct attributions. 2015-08-28 14:02:53 -07:00
ea0c7afa49 Delete db.cc
db.cc has been moved to caffe2/core and this is no longer used.
2015-08-27 10:45:41 -07:00
561bf8eb1b Merge branch 'master' of https://github.com/Yangqing/caffe2 2015-08-15 07:20:19 -07:00
7d021a0346 context change and dropout bugfix 2015-08-15 07:17:07 -07:00
d6bebc4824 lrn bugfix - for specific cuda devices a single if is not enough, need to do kernel loop. 2015-08-15 07:15:14 -07:00
eac3b5bd28 Update README.md 2015-08-09 18:46:52 -07:00
30fb5b94ac Update README.md 2015-08-09 18:46:26 -07:00
dad0608e75 pycaffe2 minor fix 2015-08-08 16:27:38 -07:00
10ffe1132f image input: support caffe datum format 2015-08-08 13:04:02 -07:00
b4656c77b3 prefetch op bugfix 2015-08-08 13:01:12 -07:00
60e94b5247 Update LICENSE 2015-08-07 21:43:29 -07:00
a956aa90e2 brewery pool *= 2 2015-08-06 13:40:03 -07:00
4b32534e84 bugfix for dropout and filler 2015-08-06 13:39:44 -07:00
32dc580c43 utils: relax bool 2015-08-06 13:39:01 -07:00
43afd1bdeb Change the strategy of dealing with gradients of shared parameters. 2015-08-06 13:37:09 -07:00
127684610f utils bugfix 2015-07-29 09:21:02 -07:00
a07c255d16 Some utility function changes 2015-07-29 09:21:02 -07:00
45355ae79e Merge pull request #10 from jeffdonahue/switch-copy-arg-order
change arg order of Copy/Memcpy to follow inputs-then-outputs convention
2015-07-28 15:28:33 -07:00
d829950eff change arg order of Copy/Memcpy to follow inputs-then-outputs convention
instead of C memcpy order -- from (dst, src, n) to (n, src, dst)
2015-07-27 21:19:32 -07:00
5bdfe67f89 Merge pull request #9 from jeffdonahue/remove-failing-test
remove failing repeated arg death test after #7
2015-07-27 18:27:42 -07:00
b856798020 disable CannotAccessRepeatedParameterWithWrongType test broken by
removal of check (PR #7)
2015-07-27 18:28:44 -07:00
75b1f2f868 Merge pull request #8 from jeffdonahue/tensor-init-values-const
Tensor constructor: values arg is const
2015-07-27 18:14:16 -07:00
b996b8dfd8 Merge pull request #7 from jeffdonahue/empty-tensor-is-scalar
allow Tensor with empty dims (a scalar)
2015-07-27 18:13:54 -07:00
091aabeaf0 operator.cc: remove check for empty repeated argument (to allow empty
shapes and other use cases)
2015-07-27 18:09:58 -07:00
4aad861f9f allow Tensor with empty dims (a scalar)
use in loss functions for scalar loss output
2015-07-27 18:08:06 -07:00
7b481f5a8f Tensor constructor: values arg is const 2015-07-27 13:38:54 -07:00
2adcb8732a Merge pull request #5 from jeffdonahue/add-to-build-env-dirs
add system/anaconda dirs to build_env.py, filter non-existent dirs
2015-07-27 13:20:27 -07:00
a139a1bf6a build_env.py: filter non-existent dirs from INCLUDES, LIBDIRS 2015-07-27 11:42:32 -07:00
2d30d89cb1 build_env.py: add python library dir to LIBDIRS 2015-07-27 11:42:05 -07:00
6440362c08 build_env.py: add /usr/include, /usr/lib to the default include, library
dirs
2015-07-26 19:07:45 -07:00
7f5ee34b9a add back gflags dependency 2015-07-23 20:40:39 -07:00
a6d20495c2 [gflags] sort out the gflags namespace issue. 2015-07-23 20:12:35 -07:00
532670cce0 [cuda] add all gencode archs 2015-07-23 19:00:04 -07:00
c3ba30a537 blob templating 2015-07-22 20:59:22 -07:00
aaead5d6a5 reorg mpi_ops 2015-07-20 21:24:05 -07:00
a76e7bb760 core_gradients change that goes with conv gradient change 2015-07-19 20:16:38 -07:00
cf5a9f62b0 bugfix 2015-07-19 20:16:15 -07:00
966a743f8c legacy names due to the plural->singular change 2015-07-19 20:16:10 -07:00
90affff039 broadcast op: it is an in-place op with both input and output set 2015-07-19 20:15:50 -07:00
745d8ed969 run_plan_mpi: check the returned state 2015-07-19 20:14:43 -07:00
d4eab84548 conv_op: during backward, bias is not needed. 2015-07-19 20:14:22 -07:00
691986ec21 Add an execution chain option to force operator orders. 2015-07-19 20:14:05 -07:00
a5254881f2 Some more MPI examples 2015-07-19 14:21:44 -07:00
571ee16b44 minor change 2015-07-19 09:13:07 -07:00
cfab4ed865 zmq feeder bugfix 2015-07-18 18:50:45 -07:00
43aaadbef4 zmq feeder: catch error when setting up the socket. 2015-07-18 17:38:36 -07:00
ecf0dceef6 zmqdb: pass key as well. 2015-07-18 17:26:29 -07:00
05ba5b0527 Use c++ to do zmqdb, and added a simple zmq feeder example. 2015-07-18 14:56:34 -07:00
47c70a43b4 (1) minidb bugfix
(2) blob serialization comments
(3) cudnn: putting it under a separate device name
    so we can explicitly choose cudnn instead of
having the CUDA device prioritize it.
(4) note that mint is not available with ipython
    due to zeromq conflict
(5) db_throughput utility
(6) added gprofiler
2015-07-18 07:23:09 -07:00
c5166e578c Several changes:
[misc] Update license and readme.

[binary] some enhancement to move over caffe databases.

[python] added an alexnet2 example notebook.
2015-07-08 19:42:02 -07:00
59e1ad7e77 Update license and readme. 2015-07-06 22:13:14 -07:00
85a40a0b96 zmqdb: make clear error message that zmq 3+ is required. 2015-07-06 22:01:48 -07:00
f15492f303 make things compile for gcc 2015-07-06 22:00:02 -07:00
60ff1b4802 Merge pull request #2 from longjon/master
Bring in Jon's changes.
2015-07-06 21:19:11 -07:00
16c253e62e Some non-trivial refactoring:
(1) added blob serialization.
(2) registry can now use key types other than string.
(3) changed load_save_op so they interface with a db.
(4) change sgd iter op: it does increments so we can resume an iter.
(5) mnist linear classifier tests snapshot functionality.
(6) added protodb which is a small wrapper over TensorProtos.
2015-07-06 21:17:18 -07:00
dc41b23e41 [bugfix] static initialization order fiasco. 2015-07-06 21:17:18 -07:00
97f4b9f3e7 [bugfix] missing dependency 2015-07-06 21:17:18 -07:00
9b4fd2e77e workspace: create root folder if not exist. 2015-07-06 21:17:18 -07:00
e078bf4e81 [op] iter_op: instead of producing an int, produces it in a wrapped tensor. 2015-07-06 21:17:18 -07:00
2a3fda4d60 [interface] OperatorBase::OutputIsType 2015-07-06 21:17:15 -07:00
01055c20f0 Merge pull request #1 from xiaoyunwu/master
minor format issue
2015-07-05 22:16:26 -07:00
8ba95403fd minor format issue 2015-07-06 11:37:31 +08:00
036229c889 [style] Finishing name changes for the rest of the fields in the protobuf. 2015-07-01 18:16:43 -07:00
c1077400c9 [style] Massive name change from plural to singular. This one changes operator’s “inputs” and “outputs” to “input” and “output”, and "args" to "arg". 2015-07-01 15:31:01 -07:00
df6fd55cce [minor] net.cc: in parallelnet, if the user forgot to set a num_workers, warn instead of quitting. 2015-07-01 14:09:43 -07:00
241cad91f2 [LegacySupport] Fixed a bug in the caffe pooling case: pad_tail is changed on first run so we can only use pad_head for the legacy padding value. 2015-07-01 14:08:57 -07:00
cf88235bb4 [pycaffe2] net_drawer: do not exit too loud if pydot is not present. 2015-07-01 14:00:01 -07:00
e94be296b7 Fix a bug in the old code: the net device option overwrite used to not work. 2015-07-01 13:38:25 -07:00
8834e3eb13 Adding <memory> to the header. Clang is fine with missing this but gcc complains. 2015-07-01 12:37:24 -07:00
8bedbde1c1 More flag changes… and adding back run_plan_mpi 2015-07-01 12:35:34 -07:00
314696a262 explicitly make end_to_end_test depend on core_gpu 2015-07-01 12:21:58 -07:00
fdf6066e45 namespace google -> gflags for gflags functions. 2015-07-01 11:22:23 -07:00
2807aac523 enable optional_deps so that any non-crucial failures will not break the whole system. 2015-06-30 20:41:41 -07:00
2abd5e263e GoogleNet adaption - added yet another legacy padding support. 2015-06-30 19:40:05 -07:00
aadac9c1fa [binaries] add missing header 2015-06-30 18:51:53 -07:00
d8e3ce8ef4 [build] flush output from subprocesses 2015-06-30 18:51:02 -07:00
1e7730800f bottlefeeding. 2015-06-30 09:26:56 -07:00
dcb921f7ee Caffe translator example notebook, and some nicety-type changes that accompany it. 2015-06-29 17:12:44 -07:00
e5e74a5230 fix an embarrassing bug introduced when moving over the padding change. 2015-06-28 18:39:29 -07:00
9a19430a39 Simplify the data sharing mechanism, using std::shared_ptr instead of home-brew code.
Also cleaned notebook notes a little bit.
2015-06-25 21:23:23 -07:00
2ed1077a83 A clean init for Caffe2, removing my earlier hacky
commits.
2015-06-25 16:26:01 -07:00
5360 changed files with 833771 additions and 102707 deletions

.circleci/config.yml (new file, 974 lines)

@@ -0,0 +1,974 @@
docker_config_defaults: &docker_config_defaults
user: jenkins
aws_auth:
# This IAM user only allows read-only access to ECR
aws_access_key_id: AKIAJ2J6FIG5OSZTQ3IA
aws_secret_access_key: ${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_ONLY}
# NOTE: We only perform the merge in build step and not in test step, because
# all source files will be shared from build to test
merge_pull_request_onto_master: &merge_pull_request_onto_master
name: Merge Onto Master
no_output_timeout: "10h"
command: |
if [[ "${CIRCLE_BRANCH}" != "master" ]]; then
git config --global user.email "circleci.ossci@gmail.com"
git config --global user.name "CircleCI"
git config remote.origin.url https://github.com/pytorch/pytorch.git
git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master
git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/master:refs/remotes/origin/master --depth=50 --quiet
export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/master`
echo "GIT_MERGE_TARGET: " ${GIT_MERGE_TARGET}
export GIT_COMMIT=${CIRCLE_SHA1}
echo "GIT_COMMIT: " ${GIT_COMMIT}
git checkout -f ${GIT_COMMIT}
git reset --hard ${GIT_COMMIT}
git merge --no-edit --no-ff ${GIT_MERGE_TARGET}
fi
pytorch_linux_cpu_build_test_defaults: &pytorch_linux_cpu_build_test_defaults
resource_class: large
working_directory: /var/lib/jenkins/workspace
steps:
- checkout
- run:
<<: *merge_pull_request_onto_master
- run:
name: Build
no_output_timeout: "10h"
command: |
export IN_CIRCLECI=1
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
export SCCACHE_MAX_JOBS=`expr $(nproc) - 1`
export MEMORY_LIMIT_MAX_JOBS=8 # the "large" resource class on CircleCI has 32 CPU cores, if we use all of them we'll OOM
export MAX_JOBS=$(( ${SCCACHE_MAX_JOBS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${SCCACHE_MAX_JOBS} ))
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=AKIAJJZUW4G2ASX5W7KA
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET}
git submodule sync && git submodule update --init
.jenkins/pytorch/build.sh
.jenkins/pytorch/test.sh
pytorch_linux_build_defaults: &pytorch_linux_build_defaults
resource_class: large
working_directory: /var/lib/jenkins/workspace
steps:
- checkout
- run:
<<: *merge_pull_request_onto_master
- run:
name: Build
no_output_timeout: "10h"
command: |
export IN_CIRCLECI=1
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
if [ -n "${CUDA_VERSION}" ]; then
export TORCH_CUDA_ARCH_LIST=5.2
fi
export SCCACHE_MAX_JOBS=`expr $(nproc) - 1`
export MEMORY_LIMIT_MAX_JOBS=8 # the "large" resource class on CircleCI has 32 CPU cores, if we use all of them we'll OOM
export MAX_JOBS=$(( ${SCCACHE_MAX_JOBS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${SCCACHE_MAX_JOBS} ))
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=AKIAJJZUW4G2ASX5W7KA
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET}
git submodule sync && git submodule update --init
.jenkins/pytorch/build.sh
export PYTORCH_CI_ENV_DIR=/var/lib/jenkins/pytorch-ci-env
mkdir -p ${PYTORCH_CI_ENV_DIR}
cp -r /var/lib/jenkins/workspace ${PYTORCH_CI_ENV_DIR}/build_workspace # This copies all source files from build step to the next step
cp -r /opt/conda/lib/python${PYTHON_VERSION}/site-packages/torch ${PYTORCH_CI_ENV_DIR}/torch
cp -r build/bin ${PYTORCH_CI_ENV_DIR}/cpp_test_bin
if [ -d "../cpp-build" ]; then
cp -r ../cpp-build ${PYTORCH_CI_ENV_DIR}/cpp-build
fi
- persist_to_workspace:
root: /var/lib/jenkins/pytorch-ci-env
paths:
- "*"
pytorch_linux_test_defaults: &pytorch_linux_test_defaults
machine:
image: default
steps:
- run:
name: Prepare workspace
command: |
sudo mkdir -p /opt/workspace
sudo chmod -R 777 /opt/workspace
- attach_workspace:
at: /opt/workspace
- run:
name: Build
no_output_timeout: "10h"
command: |
set -x
sudo pip install awscli
if [ -n "${CUDA_VERSION}" ]; then
curl -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
echo "deb https://nvidia.github.io/libnvidia-container/ubuntu14.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-container-runtime/ubuntu14.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-docker/ubuntu14.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
fi
sudo apt-get update
sudo apt-get remove linux-image-generic linux-headers-generic linux-generic
sudo apt-get install linux-headers-$(uname -r)
sudo apt-get install linux-image-generic
if [ -n "${CUDA_VERSION}" ]; then
wget 'https://s3.amazonaws.com/ossci-linux/nvidia_driver/NVIDIA-Linux-x86_64-396.26.run'
sudo /bin/bash ./NVIDIA-Linux-x86_64-396.26.run -s --no-drm
sudo apt-get install -y nvidia-docker2
fi
sudo pkill -SIGHUP dockerd
if [ -n "${CUDA_VERSION}" ]; then
nvidia-smi
fi
# This IAM user only allows read-only access to ECR
export AWS_ACCESS_KEY_ID=AKIAJ2J6FIG5OSZTQ3IA
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_ONLY}
eval $(aws ecr get-login --region us-east-1 --no-include-email)
docker pull ${DOCKER_IMAGE}
if [ -n "${CUDA_VERSION}" ]; then
id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
else
id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
fi
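# -d starts the container detached so it keeps running; "$id" is reused by the docker cp/exec calls below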
pwd
cp -r /opt/workspace/build_workspace/. /home/circleci/project # This copies all source files from build step to the current step
echo "declare -x IN_CIRCLECI=1" > /home/circleci/project/env
echo "declare -x PYTHON_VERSION=${PYTHON_VERSION}" >> /home/circleci/project/env
echo "declare -x SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> /home/circleci/project/env
# This IAM user allows write access to S3 bucket for sccache
echo "declare -x AWS_ACCESS_KEY_ID=AKIAJJZUW4G2ASX5W7KA" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET}" >> /home/circleci/project/env
mkdir -p /home/circleci/project/build
cp -r /opt/workspace/cpp_test_bin /home/circleci/project/build/bin
docker cp /home/circleci/project/. "$id:/var/lib/jenkins/workspace"
echo "mkdir -p /opt/conda/lib/python${PYTHON_VERSION}/site-packages" | docker exec -u jenkins -i "$id" bash
docker cp "/opt/workspace/torch" "$id:/opt/conda/lib/python${PYTHON_VERSION}/site-packages/torch"
if [ -d "/opt/workspace/cpp-build" ]; then
docker cp "/opt/workspace/cpp-build" "$id:/var/lib/jenkins/cpp-build"
fi
if [ -n "${MULTI_GPU}" ]; then
(echo "source ./workspace/env" && echo 'sudo chown -R jenkins workspace /opt/conda/lib/python${PYTHON_VERSION}/site-packages/torch && cd workspace && .jenkins/pytorch/multigpu-test.sh') | docker exec -u jenkins -i "$id" bash
else
(echo "source ./workspace/env" && echo 'sudo chown -R jenkins workspace /opt/conda/lib/python${PYTHON_VERSION}/site-packages/torch && cd workspace && .jenkins/pytorch/test.sh') | docker exec -u jenkins -i "$id" bash
fi
caffe2_linux_build_defaults: &caffe2_linux_build_defaults
resource_class: large
working_directory: /var/lib/jenkins/workspace
steps:
- checkout
- run:
<<: *merge_pull_request_onto_master
- run:
name: Build
no_output_timeout: "10h"
command: |
export IN_CIRCLECI=1
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=AKIAJJZUW4G2ASX5W7KA
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET}
export SCCACHE_MAX_JOBS=`expr $(nproc) - 1`
export MEMORY_LIMIT_MAX_JOBS=8 # the "large" resource class on CircleCI has 32 CPU cores; if we use all of them we'll OOM
export MAX_JOBS=$(( ${SCCACHE_MAX_JOBS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${SCCACHE_MAX_JOBS} ))
set -ex
# Need to checkout fetch PRs for onnxbot tracking PRs
git submodule update --init third_party/onnx || true
cd third_party/onnx && git fetch --tags --progress origin +refs/pull/*:refs/remotes/origin/pr/* && cd -
# Reinitialize submodules
git submodule sync && git submodule update --init --recursive
# Ensure jenkins can write to the ccache root dir.
sudo chown jenkins:jenkins "${HOME}/.ccache"
# Make ccache log to the workspace, so we can archive it after the build
mkdir -p build
ccache -o log_file=$PWD/build/ccache.log
# Configure additional cmake arguments
cmake_args=()
cmake_args+=("$CMAKE_ARGS")
if [[ $BUILD_ENVIRONMENT == *aten* ]]; then
cmake_args+=("-DBUILD_ATEN=ON")
fi
# conda must be added to the path for Anaconda builds (this location must be
# the same as that in install_anaconda.sh used to build the docker image)
if [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
export PATH=/opt/conda/bin:$PATH
sudo chown -R jenkins:jenkins '/opt/conda'
fi
# Build
if test -x ".jenkins/caffe2/build.sh"; then
./.jenkins/caffe2/build.sh ${cmake_args[@]}
else
./.jenkins/build.sh ${cmake_args[@]}
fi
# Show sccache stats if it is running
if pgrep sccache > /dev/null; then
sccache --show-stats
fi
# Copy all necessary binaries to shared workspace
export CAFFE2_CI_ENV_DIR=/var/lib/jenkins/caffe2-ci-env
mkdir -p ${CAFFE2_CI_ENV_DIR}
cp -r /var/lib/jenkins/workspace ${CAFFE2_CI_ENV_DIR}/build_workspace # This copies all source files from build step to the next step
cp -r third_party/onnx ${CAFFE2_CI_ENV_DIR}/onnx
if [ -d "/usr/local/caffe2" ]; then
cp -r /usr/local/caffe2 ${CAFFE2_CI_ENV_DIR}/caffe2
fi
if [ -d "/opt/conda" ]; then
cp -r /opt/conda ${CAFFE2_CI_ENV_DIR}/conda_env
fi
- persist_to_workspace:
root: /var/lib/jenkins/caffe2-ci-env
paths:
- "*"
caffe2_linux_test_defaults: &caffe2_linux_test_defaults
machine:
image: default
steps:
- run:
name: Prepare workspace
command: |
sudo mkdir -p /opt/workspace
sudo chmod -R 777 /opt/workspace
- attach_workspace:
at: /opt/workspace
- run:
name: Build
no_output_timeout: "10h"
command: |
set -x
sudo pip install awscli
if [ -n "${CUDA_VERSION}" ]; then
curl -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
echo "deb https://nvidia.github.io/libnvidia-container/ubuntu14.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-container-runtime/ubuntu14.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-docker/ubuntu14.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
fi
sudo apt-get update
sudo apt-get remove linux-image-generic linux-headers-generic linux-generic
sudo apt-get install linux-headers-$(uname -r)
sudo apt-get install linux-image-generic
if [ -n "${CUDA_VERSION}" ]; then
wget 'https://s3.amazonaws.com/ossci-linux/nvidia_driver/NVIDIA-Linux-x86_64-396.26.run'
sudo /bin/bash ./NVIDIA-Linux-x86_64-396.26.run -s --no-drm
sudo apt-get install -y nvidia-docker2
fi
sudo pkill -SIGHUP dockerd
if [ -n "${CUDA_VERSION}" ]; then
nvidia-smi
fi
# This IAM user only allows read-only access to ECR
export AWS_ACCESS_KEY_ID=AKIAJ2J6FIG5OSZTQ3IA
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_ONLY}
eval $(aws ecr get-login --region us-east-1 --no-include-email)
docker pull ${DOCKER_IMAGE}
if [ -n "${CUDA_VERSION}" ]; then
id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
else
id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
fi
pwd
cp -r /opt/workspace/build_workspace/. /home/circleci/project # This copies all source files from build step to the current step
echo "declare -x IN_CIRCLECI=1" > /home/circleci/project/env
echo "declare -x SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> /home/circleci/project/env
# This IAM user allows write access to S3 bucket for sccache
echo "declare -x AWS_ACCESS_KEY_ID=AKIAJJZUW4G2ASX5W7KA" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET}" >> /home/circleci/project/env
echo "declare -x BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" >> /home/circleci/project/env
# TODO: merge this into Caffe2 build.sh
cat >/home/circleci/project/ci_build_script.sh <<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
# libdc1394 (dependency of OpenCV) expects /dev/raw1394 to exist...
sudo ln /dev/null /dev/raw1394
# Hotfix, use hypothesis 3.44.6 on Ubuntu 14.04
# See comments on https://github.com/HypothesisWorks/hypothesis-python/commit/eadd62e467d6cee6216e71b391951ec25b4f5830
if [[ "$BUILD_ENVIRONMENT" == *ubuntu14.04* ]]; then
sudo pip uninstall -y hypothesis
# "pip install hypothesis==3.44.6" from official server is unreliable on CircleCI, so we host a copy on S3 instead
sudo pip install attrs -f https://s3.amazonaws.com/ossci-linux/wheels/attrs-18.1.0-py2.py3-none-any.whl
sudo pip install coverage -f https://s3.amazonaws.com/ossci-linux/wheels/coverage-4.5.1-cp36-cp36m-macosx_10_12_x86_64.whl
sudo pip install hypothesis -f https://s3.amazonaws.com/ossci-linux/wheels/hypothesis-3.44.6-py3-none-any.whl
fi
# conda must be added to the path for Anaconda builds (this location must be
# the same as that in install_anaconda.sh used to build the docker image)
if [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
export PATH=/opt/conda/bin:$PATH
fi
pip install --user -b /tmp/pip_install_onnx "file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx"
pip install --user future
# Build
if test -x ".jenkins/caffe2/test.sh"; then
./.jenkins/caffe2/test.sh
else
./.jenkins/test.sh
fi
# Remove benign core dumps.
# These are tests for signal handling (including SIGABRT).
rm -f ./crash/core.fatal_signal_as.*
rm -f ./crash/core.logging_test.*
# =================== The above code will be executed inside Docker container ===================
EOL
chmod +x /home/circleci/project/ci_build_script.sh
docker cp /home/circleci/project/. "$id:/var/lib/jenkins/workspace"
if [ -d "/opt/workspace/caffe2" ]; then
echo "mkdir -p /usr/local/caffe2" | docker exec -u jenkins -i "$id" bash
docker cp /opt/workspace/caffe2/. "$id:/usr/local/caffe2"
fi
if [ -d "/opt/workspace/conda_env" ]; then
echo "sudo mkdir -p /opt/conda" | docker exec -u jenkins -i "$id" bash
docker cp /opt/workspace/conda_env/. "$id:/opt/conda"
fi
docker cp /opt/workspace/onnx/. "$id:/var/lib/jenkins/workspace/third_party/onnx"
(echo "source ./workspace/env" && echo 'sudo chown -R jenkins workspace && cd workspace && ./ci_build_script.sh') | docker exec -u jenkins -i "$id" bash
caffe2_macos_build_defaults: &caffe2_macos_build_defaults
macos:
xcode: "9.0"
steps:
- checkout
- run:
<<: *merge_pull_request_onto_master
- run:
name: Build
no_output_timeout: "10h"
command: |
set -ex
export IN_CIRCLECI=1
brew install cmake
# Reinitialize submodules
git submodule sync && git submodule update --init --recursive
# Reinitialize path (see man page for path_helper(8))
eval `/usr/libexec/path_helper -s`
# Use Homebrew Python if configured to do so
if [ "${PYTHON_INSTALLATION}" == "homebrew" ]; then
export PATH=/usr/local/opt/python/libexec/bin:/usr/local/bin:$PATH
fi
pip install numpy
# Install Anaconda if we need to
if [ -n "${CAFFE2_USE_ANACONDA}" ]; then
rm -rf ${TMPDIR}/anaconda
curl -o ${TMPDIR}/anaconda.sh "https://repo.continuum.io/archive/Anaconda${ANACONDA_VERSION}-5.0.1-MacOSX-x86_64.sh"
/bin/bash ${TMPDIR}/anaconda.sh -b -p ${TMPDIR}/anaconda
rm -f ${TMPDIR}/anaconda.sh
export PATH="${TMPDIR}/anaconda/bin:${PATH}"
source ${TMPDIR}/anaconda/bin/activate
fi
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=AKIAJJZUW4G2ASX5W7KA
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET}
export SCCACHE_BIN=${PWD}/sccache_bin
mkdir -p ${SCCACHE_BIN}
if which sccache > /dev/null; then
printf "#!/bin/sh\nexec sccache $(which clang++) \$*" > "${SCCACHE_BIN}/clang++"
chmod a+x "${SCCACHE_BIN}/clang++"
printf "#!/bin/sh\nexec sccache $(which clang) \$*" > "${SCCACHE_BIN}/clang"
chmod a+x "${SCCACHE_BIN}/clang"
export PATH="${SCCACHE_BIN}:$PATH"
fi
# Build
if [ "${BUILD_IOS:-0}" -eq 1 ]; then
scripts/build_ios.sh
elif [ -n "${CAFFE2_USE_ANACONDA}" ]; then
# All conda build logic should be in scripts/build_anaconda.sh
scripts/build_anaconda.sh
else
scripts/build_local.sh
fi
# Show sccache stats if it is running
if which sccache > /dev/null; then
sccache --show-stats
fi
version: 2
jobs:
pytorch_linux_trusty_py2_7_9_build_test:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-trusty-py2.7.9:238
<<: *docker_config_defaults
<<: *pytorch_linux_cpu_build_test_defaults
pytorch_linux_trusty_py2_7_build_test:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-trusty-py2.7:238
<<: *docker_config_defaults
<<: *pytorch_linux_cpu_build_test_defaults
pytorch_linux_trusty_py3_5_build_test:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-trusty-py3.5:238
<<: *docker_config_defaults
<<: *pytorch_linux_cpu_build_test_defaults
pytorch_linux_trusty_py3_6_gcc4_8_build_test:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-trusty-py3.6-gcc4.8:238
<<: *docker_config_defaults
<<: *pytorch_linux_cpu_build_test_defaults
pytorch_linux_trusty_py3_6_gcc5_4_build_test:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-trusty-py3.6-gcc5.4:238
<<: *docker_config_defaults
<<: *pytorch_linux_cpu_build_test_defaults
pytorch_linux_trusty_py3_6_gcc7_build_test:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-trusty-py3.6-gcc7:238
<<: *docker_config_defaults
<<: *pytorch_linux_cpu_build_test_defaults
pytorch_linux_trusty_pynightly_build_test:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-trusty-pynightly:238
<<: *docker_config_defaults
<<: *pytorch_linux_cpu_build_test_defaults
pytorch_linux_xenial_py3_clang5_asan_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-asan:238
<<: *docker_config_defaults
environment:
PYTHON_VERSION: "3.6"
<<: *pytorch_linux_build_defaults
pytorch_linux_xenial_py3_clang5_asan_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-asan:238"
PYTHON_VERSION: "3.6"
resource_class: large
<<: *pytorch_linux_test_defaults
pytorch_linux_xenial_cuda8_cudnn6_py3_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda8-cudnn6-py3:238
<<: *docker_config_defaults
environment:
PYTHON_VERSION: "3.6"
CUDA_VERSION: "8"
<<: *pytorch_linux_build_defaults
pytorch_linux_xenial_cuda8_cudnn6_py3_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda8-cudnn6-py3:238"
PYTHON_VERSION: "3.6"
CUDA_VERSION: "8"
resource_class: gpu.medium
<<: *pytorch_linux_test_defaults
pytorch_linux_xenial_cuda8_cudnn6_py3_multigpu_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda8-cudnn6-py3:238"
PYTHON_VERSION: "3.6"
CUDA_VERSION: "8"
MULTI_GPU: "1"
resource_class: gpu.large
<<: *pytorch_linux_test_defaults
pytorch_linux_xenial_cuda9_cudnn7_py2_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py2:238
<<: *docker_config_defaults
environment:
PYTHON_VERSION: "2.7"
CUDA_VERSION: "9"
<<: *pytorch_linux_build_defaults
pytorch_linux_xenial_cuda9_cudnn7_py2_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py2:238"
PYTHON_VERSION: "2.7"
CUDA_VERSION: "9"
resource_class: gpu.medium
<<: *pytorch_linux_test_defaults
pytorch_linux_xenial_cuda9_cudnn7_py3_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:238
<<: *docker_config_defaults
environment:
PYTHON_VERSION: "3.6"
CUDA_VERSION: "9"
<<: *pytorch_linux_build_defaults
pytorch_linux_xenial_cuda9_cudnn7_py3_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:238"
PYTHON_VERSION: "3.6"
CUDA_VERSION: "9"
resource_class: gpu.medium
<<: *pytorch_linux_test_defaults
pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc7_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7:238
<<: *docker_config_defaults
environment:
PYTHON_VERSION: "3.6"
CUDA_VERSION: "9.2"
<<: *pytorch_linux_build_defaults
pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc7_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7:238"
PYTHON_VERSION: "3.6"
CUDA_VERSION: "9.2"
resource_class: gpu.medium
<<: *pytorch_linux_test_defaults
pytorch_macos_10_13_py3_build:
macos:
xcode: "9.0"
steps:
- checkout
- run:
<<: *merge_pull_request_onto_master
- run:
name: Build
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3
no_output_timeout: "10h"
command: |
set -ex
export IN_CIRCLECI=1
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=AKIAJJZUW4G2ASX5W7KA
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET}
git submodule sync && git submodule update --init
chmod a+x .jenkins/pytorch/macos-build.sh
.jenkins/pytorch/macos-build.sh
# TODO: need to share source files from build to test, when macOS builds are enabled
- persist_to_workspace:
root: /Users/distiller/pytorch-ci-env
paths:
- "*"
pytorch_macos_10_13_py3_test:
macos:
xcode: "9.0"
steps:
- run:
name: Prepare workspace
command: |
sudo mkdir -p /Users/distiller/pytorch-ci-env
sudo chmod -R 777 /Users/distiller/pytorch-ci-env
- attach_workspace:
at: /Users/distiller/pytorch-ci-env
- run:
name: Build
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3
no_output_timeout: "10h"
command: |
# TODO: need to share source files from build to test, when macOS builds are enabled
set -ex
export IN_CIRCLECI=1
chmod a+x .jenkins/pytorch/macos-test.sh
.jenkins/pytorch/macos-test.sh
pytorch_macos_10_13_cuda9_2_cudnn7_py3_build:
macos:
xcode: "9.0"
steps:
- checkout
- run:
<<: *merge_pull_request_onto_master
- run:
name: Build
environment:
JOB_BASE_NAME: pytorch-macos-10.13-cuda9.2-cudnn7-py3-build
BUILD_ENVIRONMENT: pytorch-macos-10.13-cuda9.2-cudnn7-py3
no_output_timeout: "10h"
command: |
set -ex
export IN_CIRCLECI=1
# Install CUDA 9.2
sudo rm -rf ~/cuda_9.2.64_mac_installer.app || true
curl https://s3.amazonaws.com/ossci-macos/cuda_9.2.64_mac_installer.zip -o ~/cuda_9.2.64_mac_installer.zip
unzip ~/cuda_9.2.64_mac_installer.zip -d ~/
sudo ~/cuda_9.2.64_mac_installer.app/Contents/MacOS/CUDAMacOSXInstaller --accept-eula --no-window
sudo cp /usr/local/cuda/lib/libcuda.dylib /Developer/NVIDIA/CUDA-9.2/lib/libcuda.dylib
sudo rm -rf /usr/local/cuda || true
# Install cuDNN 7.1 for CUDA 9.2
curl https://s3.amazonaws.com/ossci-macos/cudnn-9.2-osx-x64-v7.1.tgz -o ~/cudnn-9.2-osx-x64-v7.1.tgz
rm -rf ~/cudnn-9.2-osx-x64-v7.1 && mkdir ~/cudnn-9.2-osx-x64-v7.1
tar -xzvf ~/cudnn-9.2-osx-x64-v7.1.tgz -C ~/cudnn-9.2-osx-x64-v7.1
sudo cp ~/cudnn-9.2-osx-x64-v7.1/cuda/include/cudnn.h /Developer/NVIDIA/CUDA-9.2/include/
sudo cp ~/cudnn-9.2-osx-x64-v7.1/cuda/lib/libcudnn* /Developer/NVIDIA/CUDA-9.2/lib/
sudo chmod a+r /Developer/NVIDIA/CUDA-9.2/include/cudnn.h /Developer/NVIDIA/CUDA-9.2/lib/libcudnn*
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=AKIAJJZUW4G2ASX5W7KA
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET}
git submodule sync && git submodule update --init
chmod a+x .jenkins/pytorch/macos-build.sh
.jenkins/pytorch/macos-build.sh
caffe2_py2_cuda8_0_cudnn6_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda8.0-cudnn6-ubuntu16.04:190
<<: *docker_config_defaults
environment:
CUDA_VERSION: "8"
BUILD_ENVIRONMENT: "py2-cuda8.0-cudnn6-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_cuda8_0_cudnn6_ubuntu16_04_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda8.0-cudnn6-ubuntu16.04:190"
CUDA_VERSION: "8"
BUILD_ENVIRONMENT: "py2-cuda8.0-cudnn6-ubuntu16.04"
resource_class: gpu.medium
<<: *caffe2_linux_test_defaults
caffe2_py2_cuda9_0_cudnn7_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-ubuntu16.04:190
<<: *docker_config_defaults
environment:
CUDA_VERSION: "9"
BUILD_ENVIRONMENT: "py2-cuda9.0-cudnn7-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_cuda9_0_cudnn7_ubuntu16_04_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-ubuntu16.04:190"
CUDA_VERSION: "9"
BUILD_ENVIRONMENT: "py2-cuda9.0-cudnn7-ubuntu16.04"
resource_class: gpu.medium
<<: *caffe2_linux_test_defaults
caffe2_py2_cuda9_0_cudnn7_aten_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-ubuntu16.04:190
<<: *docker_config_defaults
environment:
CUDA_VERSION: "9"
BUILD_ENVIRONMENT: "py2-cuda9.0-cudnn7-aten-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_cuda9_0_cudnn7_aten_ubuntu16_04_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-ubuntu16.04:190"
CUDA_VERSION: "9"
BUILD_ENVIRONMENT: "py2-cuda9.0-cudnn7-aten-ubuntu16.04"
resource_class: gpu.medium
<<: *caffe2_linux_test_defaults
caffe2_py2_cuda9_1_cudnn7_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.1-cudnn7-ubuntu16.04:190
<<: *docker_config_defaults
environment:
CUDA_VERSION: "9.1"
BUILD_ENVIRONMENT: "py2-cuda9.1-cudnn7-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_cuda9_1_cudnn7_ubuntu16_04_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.1-cudnn7-ubuntu16.04:190"
CUDA_VERSION: "9.1"
BUILD_ENVIRONMENT: "py2-cuda9.1-cudnn7-ubuntu16.04"
resource_class: gpu.medium
<<: *caffe2_linux_test_defaults
caffe2_py2_mkl_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-mkl-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-mkl-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_mkl_ubuntu16_04_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-mkl-ubuntu16.04:190"
BUILD_ENVIRONMENT: "py2-mkl-ubuntu16.04"
resource_class: large
<<: *caffe2_linux_test_defaults
caffe2_py2_gcc4_8_ubuntu14_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc4.8-ubuntu14.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-gcc4.8-ubuntu14.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_gcc4_8_ubuntu14_04_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc4.8-ubuntu14.04:190"
BUILD_ENVIRONMENT: "py2-gcc4.8-ubuntu14.04"
resource_class: large
<<: *caffe2_linux_test_defaults
caffe2_onnx_py2_gcc5_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc5-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "onnx-py2-gcc5-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_onnx_py2_gcc5_ubuntu16_04_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc5-ubuntu16.04:190"
BUILD_ENVIRONMENT: "onnx-py2-gcc5-ubuntu16.04"
resource_class: large
<<: *caffe2_linux_test_defaults
caffe2_conda2_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/conda2-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "conda2-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_conda2_ubuntu16_04_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/conda2-ubuntu16.04:190"
BUILD_ENVIRONMENT: "conda2-ubuntu16.04"
resource_class: large
<<: *caffe2_linux_test_defaults
caffe2_py2_cuda8_0_cudnn7_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda8.0-cudnn7-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-cuda8.0-cudnn7-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_gcc4_9_ubuntu14_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc4.9-ubuntu14.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-gcc4.9-ubuntu14.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_clang3_8_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-clang3.8-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-clang3.8-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_clang3_9_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-clang3.9-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-clang3.9-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_gcc6_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc6-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-gcc6-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_gcc7_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc7-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-gcc7-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_cuda8_0_cudnn7_aten_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda8.0-cudnn7-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-cuda8.0-cudnn7-aten-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_android_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-android-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-android-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_conda3_cuda9_0_cudnn7_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/conda3-cuda9.0-cudnn7-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "conda3-cuda9.0-cudnn7-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_cuda9_0_cudnn7_centos7_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-centos7:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-cuda9.0-cudnn7-centos7"
<<: *caffe2_linux_build_defaults
caffe2_py2_ios_macos10_13_build:
environment:
BUILD_IOS: "1"
PYTHON_INSTALLATION: "system"
PYTHON_VERSION: "2"
<<: *caffe2_macos_build_defaults
caffe2_py2_system_macos10_13_build:
environment:
PYTHON_INSTALLATION: "system"
PYTHON_VERSION: "2"
<<: *caffe2_macos_build_defaults
workflows:
version: 2
build:
jobs:
# - pytorch_linux_trusty_py2_7_9_build_test
# - pytorch_linux_trusty_py2_7_build_test
# - pytorch_linux_trusty_py3_5_build_test
# - pytorch_linux_trusty_py3_6_gcc4_8_build_test
# - pytorch_linux_trusty_py3_6_gcc5_4_build_test
# - pytorch_linux_trusty_py3_6_gcc7_build_test
# - pytorch_linux_trusty_pynightly_build_test
# - pytorch_linux_xenial_py3_clang5_asan_build
# - pytorch_linux_xenial_py3_clang5_asan_test:
# requires:
# - pytorch_linux_xenial_py3_clang5_asan_build
# - pytorch_linux_xenial_cuda8_cudnn6_py3_build
# - pytorch_linux_xenial_cuda8_cudnn6_py3_test:
# requires:
# - pytorch_linux_xenial_cuda8_cudnn6_py3_build
# - pytorch_linux_xenial_cuda8_cudnn6_py3_multigpu_test:
# requires:
# - pytorch_linux_xenial_cuda8_cudnn6_py3_build
# - pytorch_linux_xenial_cuda9_cudnn7_py2_build
# - pytorch_linux_xenial_cuda9_cudnn7_py2_test:
# requires:
# - pytorch_linux_xenial_cuda9_cudnn7_py2_build
# - pytorch_linux_xenial_cuda9_cudnn7_py3_build
# - pytorch_linux_xenial_cuda9_cudnn7_py3_test:
# requires:
# - pytorch_linux_xenial_cuda9_cudnn7_py3_build
# - pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc7_build
# - pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc7_test:
# requires:
# - pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc7_build
# - pytorch_macos_10_13_py3_build
# - pytorch_macos_10_13_py3_test:
# requires:
# - pytorch_macos_10_13_py3_build
# - pytorch_macos_10_13_cuda9_2_cudnn7_py3_build
# - caffe2_py2_cuda8_0_cudnn6_ubuntu16_04_build
# - caffe2_py2_cuda8_0_cudnn6_ubuntu16_04_test:
# requires:
# - caffe2_py2_cuda8_0_cudnn6_ubuntu16_04_build
# - caffe2_py2_cuda9_0_cudnn7_ubuntu16_04_build
# - caffe2_py2_cuda9_0_cudnn7_ubuntu16_04_test:
# requires:
# - caffe2_py2_cuda9_0_cudnn7_ubuntu16_04_build
# - caffe2_py2_cuda9_0_cudnn7_aten_ubuntu16_04_build
# - caffe2_py2_cuda9_0_cudnn7_aten_ubuntu16_04_test:
# requires:
# - caffe2_py2_cuda9_0_cudnn7_aten_ubuntu16_04_build
# - caffe2_py2_mkl_ubuntu16_04_build
# - caffe2_py2_mkl_ubuntu16_04_test:
# requires:
# - caffe2_py2_mkl_ubuntu16_04_build
# - caffe2_py2_cuda9_1_cudnn7_ubuntu16_04_build
# - caffe2_py2_cuda9_1_cudnn7_ubuntu16_04_test:
# requires:
# - caffe2_py2_cuda9_1_cudnn7_ubuntu16_04_build
# - caffe2_py2_gcc4_8_ubuntu14_04_build
# - caffe2_py2_gcc4_8_ubuntu14_04_test:
# requires:
# - caffe2_py2_gcc4_8_ubuntu14_04_build
# - caffe2_onnx_py2_gcc5_ubuntu16_04_build
# - caffe2_onnx_py2_gcc5_ubuntu16_04_test:
# requires:
# - caffe2_onnx_py2_gcc5_ubuntu16_04_build
# - caffe2_conda2_ubuntu16_04_build
# - caffe2_conda2_ubuntu16_04_test:
# requires:
# - caffe2_conda2_ubuntu16_04_build
# - caffe2_py2_cuda8_0_cudnn7_ubuntu16_04_build
# - caffe2_py2_gcc4_9_ubuntu14_04_build
# - caffe2_py2_clang3_8_ubuntu16_04_build
# - caffe2_py2_clang3_9_ubuntu16_04_build
# - caffe2_py2_gcc6_ubuntu16_04_build
# - caffe2_py2_gcc7_ubuntu16_04_build
# - caffe2_py2_cuda8_0_cudnn7_aten_ubuntu16_04_build
# - caffe2_py2_android_ubuntu16_04_build
# - caffe2_conda3_cuda9_0_cudnn7_ubuntu16_04_build
# - caffe2_py2_cuda9_0_cudnn7_centos7_build
# - caffe2_py2_ios_macos10_13_build
# - caffe2_py2_system_macos10_13_build
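
All of the job definitions above follow the same pattern: pull the image from ECR, start a detached container, copy the workspace in, and exec the Jenkins scripts inside it. A minimal sketch of reproducing one job by hand, assuming Docker is installed and you have AWS credentials with ECR read access (the image name and tag are copied from the definitions above):

```bash
export DOCKER_IMAGE=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-trusty-py2.7.9:238
eval $(aws ecr get-login --region us-east-1 --no-include-email)   # log in to ECR
docker pull ${DOCKER_IMAGE}
id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})        # detached container
docker cp . "$id:/var/lib/jenkins/workspace"                      # copy a checkout in
echo 'cd workspace && .jenkins/pytorch/build.sh' | docker exec -u jenkins -i "$id" bash
```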

88
.clang-format Normal file
View File

@ -0,0 +1,88 @@
---
AccessModifierOffset: -1
AlignAfterOpenBracket: AlwaysBreak
AlignConsecutiveAssignments: false
AlignConsecutiveDeclarations: false
AlignEscapedNewlinesLeft: true
AlignOperands: false
AlignTrailingComments: false
AllowAllParametersOfDeclarationOnNextLine: false
AllowShortBlocksOnASingleLine: false
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: Empty
AllowShortIfStatementsOnASingleLine: false
AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: true
AlwaysBreakTemplateDeclarations: true
BinPackArguments: false
BinPackParameters: false
BraceWrapping:
AfterClass: false
AfterControlStatement: false
AfterEnum: false
AfterFunction: false
AfterNamespace: false
AfterObjCDeclaration: false
AfterStruct: false
AfterUnion: false
BeforeCatch: false
BeforeElse: false
IndentBraces: false
BreakBeforeBinaryOperators: None
BreakBeforeBraces: Attach
BreakBeforeTernaryOperators: true
BreakConstructorInitializersBeforeComma: false
BreakAfterJavaFieldAnnotations: false
BreakStringLiterals: false
ColumnLimit: 80
CommentPragmas: '^ IWYU pragma:'
CompactNamespaces: false
ConstructorInitializerAllOnOneLineOrOnePerLine: true
ConstructorInitializerIndentWidth: 4
ContinuationIndentWidth: 4
Cpp11BracedListStyle: true
DerivePointerAlignment: false
DisableFormat: false
ForEachMacros: [ FOR_EACH_RANGE, FOR_EACH, ]
IncludeCategories:
- Regex: '^<.*\.h(pp)?>'
Priority: 1
- Regex: '^<.*'
Priority: 2
- Regex: '.*'
Priority: 3
IndentCaseLabels: true
IndentWidth: 2
IndentWrappedFunctionNames: false
KeepEmptyLinesAtTheStartOfBlocks: false
MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 1
NamespaceIndentation: None
ObjCBlockIndentWidth: 2
ObjCSpaceAfterProperty: false
ObjCSpaceBeforeProtocolList: false
PenaltyBreakBeforeFirstCallParameter: 1
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakString: 1000
PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 2000000
PointerAlignment: Left
ReflowComments: true
SortIncludes: true
SpaceAfterCStyleCast: false
SpaceBeforeAssignmentOperators: true
SpaceBeforeParens: ControlStatements
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 1
SpacesInAngles: false
SpacesInContainerLiterals: true
SpacesInCStyleCastParentheses: false
SpacesInParentheses: false
SpacesInSquareBrackets: false
Standard: Cpp11
TabWidth: 8
UseTab: Never
...
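
A quick way to exercise this style, assuming clang-format is installed and invoked from the repository root so it picks up this file (the target path is illustrative):

```bash
# -style=file walks up from the target to find the nearest .clang-format; -i edits in place
clang-format -style=file -i torch/csrc/jit/ir.cpp
```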

51
.clang-tidy Normal file
View File

@ -0,0 +1,51 @@
---
# NOTE: there must be no spaces before the '-', so put the comma first.
Checks: '
*
,clang-analyzer-*
,modernize-*
,-cert-dcl21-cpp
,-cert-err58-cpp
,-cert-err60-cpp
,-clang-diagnostic-*
,-cppcoreguidelines-owning-memory
,-cppcoreguidelines-pro-bounds-array-to-pointer-decay
,-cppcoreguidelines-pro-bounds-constant-array-index
,-cppcoreguidelines-pro-type-member-init
,-cppcoreguidelines-pro-type-static-cast-downcast
,-cppcoreguidelines-pro-type-union-access
,-cppcoreguidelines-pro-type-vararg
,-cppcoreguidelines-special-member-functions
,-fuchsia-*
,-google-build-using-namespace
,-google-default-arguments
,-google-explicit-constructor
,-google-readability-braces-around-statements
,-google-readability-namespace-comments
,-google-readability-todo
,-google-runtime-references
,-google-runtime-references
,-hicpp-braces-around-statements
,-hicpp-explicit-conversions
,-hicpp-member-init
,-hicpp-no-array-decay
,-hicpp-signed-bitwise
,-hicpp-special-member-functions
,-hicpp-vararg
,-llvm-header-guard
,-llvm-include-order
,-llvm-namespace-comment
,-misc-unused-parameters
,-modernize-make-unique
,-modernize-use-default-member-init
,-performance-unnecessary-value-param
,-readability-braces-around-statements
,-readability-else-after-return
,-readability-implicit-bool-conversion
,-readability-named-parameter
'
WarningsAsErrors: ''
HeaderFilterRegex: 'torch/csrc/'
AnalyzeTemporaryDtors: false
CheckOptions:
...
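
A sketch of running these checks on a single translation unit, assuming a compile_commands.json has been generated into build/ (the target file is illustrative; per HeaderFilterRegex above, header diagnostics are limited to torch/csrc/):

```bash
# -p points clang-tidy at the directory containing compile_commands.json
clang-tidy -p build torch/csrc/autograd/engine.cpp
```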

1
.dockerignore Symbolic link
View File

@ -0,0 +1 @@
.gitignore

1
.gitattributes vendored Normal file
View File

@ -0,0 +1 @@
*.bat text eol=crlf

0
.github/CONTRIBUTING.md vendored Normal file
View File

38
.github/ISSUE_TEMPLATE.md vendored Normal file
View File

@ -0,0 +1,38 @@
If you have a question or would like help and support, please ask at our
[forums](https://discuss.pytorch.org/).
If you are submitting a feature request, please preface the title with [feature request].
If you are submitting a bug report, please fill in the following details.
## Issue description
Provide a short description.
## Code example
Please try to provide a minimal example to repro the bug.
Error messages and stack traces are also helpful.
## System Info
Please copy and paste the output from our
[environment collection script](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)
(or fill out the checklist below manually).
You can get the script and run it with:
```
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
```
- PyTorch or Caffe2:
- How you installed PyTorch (conda, pip, source):
- Build command you used (if compiling from source):
- OS:
- PyTorch version:
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- GCC version (if compiling from source):
- CMake version:
- Versions of any other relevant libraries:

0
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file
View File

228
.gitignore vendored
View File

@ -1,29 +1,209 @@
build/
dist/
torch.egg-info/
*/**/__pycache__
torch/csrc/generic/TensorMethods.cpp
torch/lib/*.so*
torch/lib/*.dylib*
torch/lib/*.h
torch/lib/build
torch/lib/tmp_install
torch/lib/include
torch/lib/torch_shm_manager
torch/csrc/cudnn/cuDNN.cpp
torch/csrc/nn/THNN.cwrap
torch/csrc/nn/THNN.cpp
torch/csrc/nn/THCUNN.cwrap
torch/csrc/nn/THCUNN.cpp
docs/src/**/*
test/data/legacy_modules.t7
test/htmlcov
test/.coverage
# READ THIS BEFORE YOU REFACTOR ME
#
# setup.py uses the list of patterns in this file to decide
# what to delete, but it's not 100% sound. So, for example,
# if you delete aten/build/ because it's redundant with build/,
# aten/build/ will stop being cleaned. So be careful when
# refactoring this file!
## PyTorch
.mypy_cache
*/*.pyc
*/*.so*
*/**/__pycache__
*/**/*.dylib*
*/**/*.pyc
*/**/*.pyd
*/**/*.so*
*/**/**/*.pyc
*/**/**/**/*.pyc
*/**/**/**/**/*.pyc
*/*.so*
*/**/*.so*
*/**/*.dylib*
aten/build/
aten/src/ATen/Config.h
aten/src/ATen/cuda/CUDAConfig.h
build/
dist/
docs/src/**/*
docs/cpp/xml/
docs/cpp/html/
docs/cpp/api/
test/.coverage
test/cpp/api/mnist
test/custom_operator/model.pt
test/data/gpu_tensors.pt
test/data/legacy_modules.t7
test/data/legacy_serialized.pt
test/data/linear.pt
test/htmlcov
test/cpp_extensions/install/
third_party/build/
tools/shared/_utils_internal.py
torch.egg-info/
torch/csrc/autograd/generated/*
torch/csrc/cudnn/cuDNN.cpp
torch/csrc/generated
torch/csrc/generic/TensorMethods.cpp
torch/csrc/jit/generated/*
torch/csrc/jit/fusers/Config.h
torch/csrc/nn/THCUNN.cpp
torch/csrc/nn/THCUNN.cwrap
torch/csrc/nn/THNN_generic.cpp
torch/csrc/nn/THNN_generic.cwrap
torch/csrc/nn/THNN_generic.h
torch/csrc/nn/THNN.cpp
torch/csrc/nn/THNN.cwrap
torch/lib/*.a*
torch/lib/*.dll*
torch/lib/*.exe*
torch/lib/*.dylib*
torch/lib/*.h
torch/lib/*.lib
torch/lib/*.so*
torch/lib/build
torch/lib/cmake
torch/lib/include
torch/lib/pkgconfig
torch/lib/protoc
torch/lib/tmp_install
torch/lib/torch_shm_manager
torch/lib/python*
torch/share/
torch/version.py
# IPython notebook checkpoints
.ipynb_checkpoints
# Editor temporaries
*.swn
*.swo
*.swp
*.swm
*~
# macOS dir files
.DS_Store
# Symbolic files
tools/shared/cwrap_common.py
# Ninja files
.ninja_deps
.ninja_log
compile_commands.json
*.egg-info/
docs/source/scripts/activation_images/
## General
# Compiled Object files
*.slo
*.lo
*.o
*.cuo
*.obj
# Compiled Dynamic libraries
*.so
*.dylib
*.dll
# Compiled Static libraries
*.lai
*.la
*.a
*.lib
# Compiled protocol buffers
*.pb.h
*.pb.cc
*_pb2.py
# Compiled python
*.pyc
*.pyd
# Compiled MATLAB
*.mex*
# IPython notebook checkpoints
.ipynb_checkpoints
# Editor temporaries
*.swn
*.swo
*.swp
*~
# Sublime Text settings
*.sublime-workspace
*.sublime-project
# Eclipse Project settings
*.*project
.settings
# QtCreator files
*.user
# PyCharm files
.idea
# OSX dir files
.DS_Store
## Caffe2
# build, distribute, and bins (+ python proto bindings)
build
build_host_protoc
build_android
build_ios
/build_*
.build_debug/*
.build_release/*
distribute/*
*.testbin
*.bin
cmake_build
.cmake_build
gen
.setuptools-cmake-build
.pytest_cache
aten/build/*
# Bram
plsdontbreak
# Generated documentation
docs/_site
docs/gathered
_site
doxygen
docs/dev
# LevelDB files
*.sst
*.ldb
LOCK
LOG*
CURRENT
MANIFEST-*
# generated version file
caffe2/version.py
# setup.py intermediates
.eggs
caffe2.egg-info
# Atom/Watchman required file
.watchmanconfig
# BEGIN NOT-CLEAN-FILES (setup.py handles this marker. Do not change.)
#
# Below files are not deleted by "setup.py clean".
# Visual Studio Code files
.vscode
.vs
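
As the comment at the top of this file warns, these patterns double as the deletion list for the clean step, and everything below the NOT-CLEAN-FILES marker survives it:

```bash
# removes files matching the patterns above, except those after the marker
python setup.py clean
```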

78
.gitmodules vendored Normal file
View File

@ -0,0 +1,78 @@
[submodule "third_party/catch"]
path = third_party/catch
url = https://github.com/catchorg/Catch2.git
[submodule "third_party/pybind11"]
path = third_party/pybind11
url = https://github.com/pybind/pybind11.git
[submodule "third_party/cub"]
path = third_party/cub
url = https://github.com/NVlabs/cub.git
[submodule "third_party/eigen"]
path = third_party/eigen
url = https://github.com/eigenteam/eigen-git-mirror.git
[submodule "third_party/googletest"]
path = third_party/googletest
url = https://github.com/google/googletest.git
[submodule "third_party/nervanagpu"]
path = third_party/nervanagpu
url = https://github.com/NervanaSystems/nervanagpu.git
[submodule "third_party/benchmark"]
path = third_party/benchmark
url = https://github.com/google/benchmark.git
[submodule "third_party/protobuf"]
path = third_party/protobuf
url = https://github.com/google/protobuf.git
[submodule "third_party/ios-cmake"]
path = third_party/ios-cmake
url = https://github.com/Yangqing/ios-cmake.git
[submodule "third_party/NNPACK"]
path = third_party/NNPACK
url = https://github.com/Maratyszcza/NNPACK.git
[submodule "third_party/gloo"]
path = third_party/gloo
url = https://github.com/facebookincubator/gloo
[submodule "third_party/NNPACK_deps/pthreadpool"]
path = third_party/pthreadpool
url = https://github.com/Maratyszcza/pthreadpool.git
[submodule "third_party/NNPACK_deps/FXdiv"]
path = third_party/FXdiv
url = https://github.com/Maratyszcza/FXdiv.git
[submodule "third_party/NNPACK_deps/FP16"]
path = third_party/FP16
url = https://github.com/Maratyszcza/FP16.git
[submodule "third_party/NNPACK_deps/psimd"]
path = third_party/psimd
url = https://github.com/Maratyszcza/psimd.git
[submodule "third_party/zstd"]
path = third_party/zstd
url = https://github.com/facebook/zstd.git
[submodule "third-party/cpuinfo"]
path = third_party/cpuinfo
url = https://github.com/Maratyszcza/cpuinfo.git
[submodule "third_party/python-enum"]
path = third_party/python-enum
url = https://github.com/PeachPy/enum34.git
[submodule "third_party/python-peachpy"]
path = third_party/python-peachpy
url = https://github.com/Maratyszcza/PeachPy.git
[submodule "third_party/python-six"]
path = third_party/python-six
url = https://github.com/benjaminp/six.git
[submodule "third_party/ComputeLibrary"]
path = third_party/ComputeLibrary
url = https://github.com/ARM-software/ComputeLibrary.git
[submodule "third_party/onnx"]
path = third_party/onnx
url = https://github.com/onnx/onnx.git
[submodule "third_party/cereal"]
path = third_party/cereal
url = https://github.com/USCiLab/cereal
[submodule "third_party/onnx-tensorrt"]
path = third_party/onnx-tensorrt
url = https://github.com/onnx/onnx-tensorrt
[submodule "third_party/sleef"]
path = third_party/sleef
url = https://github.com/shibatch/sleef
[submodule "third_party/ideep"]
path = third_party/ideep
url = https://github.com/intel/ideep
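
The CI scripts elsewhere in this change initialize all of these submodules in one shot:

```bash
git submodule sync && git submodule update --init --recursive
```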

14
.jenkins/caffe2/README.md Normal file
View File

@ -0,0 +1,14 @@
# Jenkins
The scripts in this directory are the entrypoint for testing Caffe2.
The environment variable `BUILD_ENVIRONMENT` is expected to be set to
the build environment you intend to test. It is a hint for the build
and test scripts to configure Caffe2 a certain way and include/exclude
tests. For Docker images, it equals the name of the image itself. For
example: `py2-cuda9.0-cudnn7-ubuntu16.04`. The Docker images that are
built on Jenkins and are used in triggered builds already have this
environment variable set in their manifest. Also see
`./docker/jenkins/*/Dockerfile` and search for `BUILD_ENVIRONMENT`.
Our Jenkins installation is located at https://ci.pytorch.org/jenkins/.
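
For instance, reproducing one of these environments locally might look like the following sketch (it assumes the matching Docker image, or an equivalent toolchain, is already set up):

```bash
export BUILD_ENVIRONMENT=py2-cuda9.0-cudnn7-ubuntu16.04
.jenkins/caffe2/build.sh    # then .jenkins/caffe2/test.sh
```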

282
.jenkins/caffe2/build.sh Executable file
View File

@ -0,0 +1,282 @@
#!/bin/bash
set -ex
pip install --user --no-cache-dir hypothesis==3.59.0
# The INSTALL_PREFIX here must match up with test.sh
INSTALL_PREFIX="/usr/local/caffe2"
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
CMAKE_ARGS=()
SCCACHE="$(which sccache)"
if [ "$(which gcc)" != "/root/sccache/gcc" ]; then
# Setup SCCACHE
###############################################################################
# Setup sccache if SCCACHE_BUCKET is set
if [ -n "${SCCACHE_BUCKET}" ]; then
mkdir -p ./sccache
SCCACHE="$(which sccache)"
if [ -z "${SCCACHE}" ]; then
echo "Unable to find sccache..."
exit 1
fi
# Setup wrapper scripts
for compiler in cc c++ gcc g++ x86_64-linux-gnu-gcc; do
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which $compiler) \"\$@\""
) > "./sccache/$compiler"
chmod +x "./sccache/$compiler"
done
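# e.g. ./sccache/gcc now runs: exec <path-to-sccache> /usr/bin/gcc "$@" (exact paths depend on the image)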
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]]; then
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which nvcc) \"\$@\""
) > "./sccache/nvcc"
chmod +x "./sccache/nvcc"
fi
export CACHE_WRAPPER_DIR="$PWD/sccache"
# CMake must find these wrapper scripts
export PATH="$CACHE_WRAPPER_DIR:$PATH"
fi
fi
# Setup ccache if configured to use it (and not sccache)
if [ -z "${SCCACHE}" ] && which ccache > /dev/null; then
mkdir -p ./ccache
ln -sf "$(which ccache)" ./ccache/cc
ln -sf "$(which ccache)" ./ccache/c++
ln -sf "$(which ccache)" ./ccache/gcc
ln -sf "$(which ccache)" ./ccache/g++
ln -sf "$(which ccache)" ./ccache/x86_64-linux-gnu-gcc
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]]; then
ln -sf "$(which ccache)" ./ccache/nvcc
fi
export CACHE_WRAPPER_DIR="$PWD/ccache"
export PATH="$CACHE_WRAPPER_DIR:$PATH"
fi
# sccache will fail for CUDA builds if all cores are used for compiling
if [ -z "$MAX_JOBS" ]; then
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]] && [ -n "${SCCACHE}" ]; then
MAX_JOBS=`expr $(nproc) - 1`
else
MAX_JOBS=$(nproc)
fi
fi
report_compile_cache_stats() {
if [[ -n "${SCCACHE}" ]]; then
"$SCCACHE" --show-stats
elif which ccache > /dev/null; then
ccache -s
fi
}
###############################################################################
# Explicitly set Python executable.
###############################################################################
# On Ubuntu 16.04 the default Python is still 2.7.
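# e.g. BUILD_ENVIRONMENT=py2-cuda9.0-cudnn7-ubuntu16.04 matches "py2" below, so PYTHON resolves to $(which python2)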
PYTHON="$(which python)"
if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON=$(which "python${BASH_REMATCH[1]}")
CMAKE_ARGS+=("-DPYTHON_EXECUTABLE=${PYTHON}")
fi
###############################################################################
# Use special scripts for Android, conda, and setup builds
###############################################################################
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
export ANDROID_NDK=/opt/ndk
CMAKE_ARGS+=("-DBUILD_BINARY=ON")
CMAKE_ARGS+=("-DBUILD_TEST=ON")
CMAKE_ARGS+=("-DUSE_OBSERVERS=ON")
CMAKE_ARGS+=("-DUSE_ZSTD=ON")
"${ROOT_DIR}/scripts/build_android.sh" ${CMAKE_ARGS[*]} "$@"
exit 0
elif [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
"${ROOT_DIR}/scripts/build_anaconda.sh" --skip-tests --install-locally "$@"
report_compile_cache_stats
# This build will be tested against onnx tests, which needs onnx installed.
# At this point the visible protbuf installation will be in conda, since one
# of Caffe2's dependencies uses conda, so the correct protobuf include
# headers are those in conda as well
# This path comes from install_anaconda.sh which installs Anaconda into the
# docker image
PROTOBUF_INCDIR=/opt/conda/include pip install -b /tmp/pip_install_onnx "file://${ROOT_DIR}/third_party/onnx#egg=onnx"
report_compile_cache_stats
exit 0
fi
###############################################################################
# Set cmake args
###############################################################################
CMAKE_ARGS+=("-DBUILD_BINARY=ON")
CMAKE_ARGS+=("-DBUILD_TEST=ON")
CMAKE_ARGS+=("-DINSTALL_TEST=ON")
CMAKE_ARGS+=("-DUSE_OBSERVERS=ON")
CMAKE_ARGS+=("-DUSE_ZSTD=ON")
CMAKE_ARGS+=("-DCMAKE_INSTALL_PREFIX=${INSTALL_PREFIX}")
if [[ $BUILD_ENVIRONMENT == *mkl* ]]; then
CMAKE_ARGS+=("-DBLAS=MKL")
fi
if [[ $BUILD_ENVIRONMENT == *cuda* ]]; then
CMAKE_ARGS+=("-DUSE_CUDA=ON")
CMAKE_ARGS+=("-DCUDA_ARCH_NAME=Maxwell")
CMAKE_ARGS+=("-DUSE_NNPACK=OFF")
# Explicitly set path to NVCC such that the symlink to ccache or sccache is used
CMAKE_ARGS+=("-DCUDA_NVCC_EXECUTABLE=${CACHE_WRAPPER_DIR}/nvcc")
# Ensure FindCUDA.cmake can infer the right path to the CUDA toolkit.
# Setting PATH to resolve to the right nvcc alone isn't enough.
# See /usr/share/cmake-3.5/Modules/FindCUDA.cmake, block at line 589.
export CUDA_PATH="/usr/local/cuda"
# Ensure the ccache symlink can still find the real nvcc binary.
export PATH="/usr/local/cuda/bin:$PATH"
fi
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
# TODO: This is patching the official FindHIP to properly handle
# cmake generator expressions. A PR is opened in the upstream repo here:
# https://github.com/ROCm-Developer-Tools/HIP/pull/516
# remove this hack once it's merged.
if [[ -f /opt/rocm/hip/cmake/FindHIP.cmake ]]; then
sudo sed -i 's/\ -I${dir}/\ $<$<BOOL:${dir}>:-I${dir}>/' /opt/rocm/hip/cmake/FindHIP.cmake
fi
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
export HCC_AMDGPU_TARGET=gfx900
# The link time of libcaffe2_hip.so takes 40 minutes, according to
# https://github.com/RadeonOpenCompute/hcc#thinlto-phase-1---implemented
# using ThinLTO could significantly improve link-time performance.
export KMTHINLTO=1
########## HIPIFY Caffe2 operators
${PYTHON} "${ROOT_DIR}/tools/amd_build/build_pytorch_amd.py"
${PYTHON} "${ROOT_DIR}/tools/amd_build/build_caffe2_amd.py"
fi
# Try to include Redis support for Linux builds
if [ "$(uname)" == "Linux" ]; then
CMAKE_ARGS+=("-DUSE_REDIS=ON")
fi
# Currently, on Jenkins macOS, we will use a custom protobuf. The macOS
# contbuild at the moment is minimal dependency - it doesn't use glog
# or gflags either.
if [ "$(uname)" == "Darwin" ]; then
CMAKE_ARGS+=("-DBUILD_CUSTOM_PROTOBUF=ON")
fi
# Use a specialized onnx namespace in CI to catch hardcoded onnx namespace
CMAKE_ARGS+=("-DONNX_NAMESPACE=ONNX_NAMESPACE_FOR_C2_CI")
# We test for the presence of cmake3 (on platforms like CentOS and Ubuntu 14.04)
# and use it if found.
if [[ -x "$(command -v cmake3)" ]]; then
CMAKE_BINARY=cmake3
else
CMAKE_BINARY=cmake
fi
###############################################################################
# Configure and make
###############################################################################
if [[ -z "$INTEGRATED" ]]; then
# Run cmake from ./build_caffe2 directory so it doesn't conflict with
# standard PyTorch build directory. Eventually these won't need to
# be separate.
rm -rf build_caffe2
mkdir build_caffe2
cd ./build_caffe2
# Configure
${CMAKE_BINARY} "${ROOT_DIR}" ${CMAKE_ARGS[*]} "$@"
# Build
if [ "$(uname)" == "Linux" ]; then
make "-j${MAX_JOBS}" install
else
echo "Don't know how to build on $(uname)"
exit 1
fi
else
# sccache will be stuck if all cores are used for compiling
# see https://github.com/pytorch/pytorch/pull/7361
if [[ -n "${SCCACHE}" ]]; then
export MAX_JOBS=`expr $(nproc) - 1`
fi
USE_LEVELDB=1 USE_LMDB=1 USE_OPENCV=1 BUILD_BINARY=1 python setup.py install --user
# This is to save test binaries for testing
cp -r torch/lib/tmp_install $INSTALL_PREFIX
ls $INSTALL_PREFIX
report_compile_cache_stats
fi
###############################################################################
# Install ONNX
###############################################################################
# Install ONNX into a local directory
pip install --user -b /tmp/pip_install_onnx "file://${ROOT_DIR}/third_party/onnx#egg=onnx"
report_compile_cache_stats
# Symlink the caffe2 base python path into the system python path,
# so that we can import caffe2 without having to change $PYTHONPATH.
# Run in a subshell to contain environment set by /etc/os-release.
#
# This is only done when running on Jenkins! We don't want to pollute
# the user environment with Python symlinks and ld.so.conf.d hacks.
#
if [[ -z "$INTEGRATED" ]]; then
if [ -n "${JENKINS_URL}" ]; then
(
source /etc/os-release
function python_version() {
"$PYTHON" -c 'import sys; print("python%d.%d" % sys.version_info[0:2])'
}
# Debian/Ubuntu
if [[ "$ID_LIKE" == *debian* ]]; then
python_path="/usr/local/lib/$(python_version)/dist-packages"
sudo ln -sf "${INSTALL_PREFIX}/caffe2" "${python_path}"
fi
# RHEL/CentOS
if [[ "$ID_LIKE" == *rhel* ]]; then
python_path="/usr/lib64/$(python_version)/site-packages/"
sudo ln -sf "${INSTALL_PREFIX}/caffe2" "${python_path}"
fi
# /etc/ld.so.conf.d is used on both Debian and RHEL
echo "${INSTALL_PREFIX}/lib" | sudo tee /etc/ld.so.conf.d/caffe2.conf
sudo ldconfig
)
fi
fi

7
.jenkins/caffe2/dirty.sh Executable file
View File

@ -0,0 +1,7 @@
#!/bin/bash
set -ex
upstream="$1"
pr="$2"
git diff --name-only "$upstream" "$pr"
# For safety, unconditionally trigger for any changes.
#git diff --name-only "$upstream" "$pr" | grep -Eq '^(CMakeLists.txt|Makefile|.gitmodules|.jenkins/caffe2|binaries|caffe|caffe2|cmake|conda|docker|docs/caffe2|modules|scripts|third_party)'
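
Usage is a pair of git refs (the refs shown here are hypothetical):

```bash
.jenkins/caffe2/dirty.sh origin/master HEAD    # lists the files the PR touches
```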

153
.jenkins/caffe2/test.sh Executable file
View File

@ -0,0 +1,153 @@
#!/bin/bash
set -ex
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
TEST_DIR=$ROOT_DIR/caffe2_tests
# Figure out which Python to use
PYTHON="python"
if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON="python${BASH_REMATCH[1]}"
fi
# The prefix must mirror the setting from build.sh
INSTALL_PREFIX="/usr/local/caffe2"
# Anaconda builds have a special install prefix and python
if [[ "$BUILD_ENVIRONMENT" == conda* ]]; then
# This path comes from install_anaconda.sh which installs Anaconda into the
# docker image
PYTHON="/opt/conda/bin/python"
INSTALL_PREFIX="/opt/conda/"
fi
# Add the site-packages in the caffe2 install prefix to the PYTHONPATH
SITE_DIR=$($PYTHON -c "from distutils import sysconfig; print(sysconfig.get_python_lib(prefix=''))")
INSTALL_SITE_DIR="${INSTALL_PREFIX}/${SITE_DIR}"
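# e.g. under python2.7 this yields INSTALL_SITE_DIR=/usr/local/caffe2/lib/python2.7/site-packages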
# Skip tests in environments where they are not built/applicable
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
echo 'Skipping tests'
exit 0
fi
# Set PYTHONPATH and LD_LIBRARY_PATH so that python can find the installed
# Caffe2. This shouldn't be done on Anaconda, as Anaconda should handle this.
if [[ "$BUILD_ENVIRONMENT" != conda* ]]; then
export PYTHONPATH="${PYTHONPATH}:$INSTALL_SITE_DIR"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${INSTALL_PREFIX}/lib"
fi
cd "$ROOT_DIR"
if [ -d $TEST_DIR ]; then
echo "Directory $TEST_DIR already exists; please remove it..."
exit 1
fi
mkdir -p $TEST_DIR/{cpp,python}
cd "${WORKSPACE}"
# C++ tests
echo "Running C++ tests.."
gtest_reports_dir="${TEST_DIR}/cpp"
junit_reports_dir="${TEST_DIR}/junit_reports"
mkdir -p "$gtest_reports_dir" "$junit_reports_dir"
for test in $(find "${INSTALL_PREFIX}/test" -executable -type f); do
case "$test" in
# skip tests we know are hanging or bad
*/mkl_utils_test|*/aten/integer_divider_test)
continue
;;
*/scalar_tensor_test|*/basic|*/native_test)
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
continue
else
"$test"
fi
;;
*)
# Currently, we use a mixture of gtest (caffe2) and Catch2 (ATen). While
# planning to migrate to gtest as the common PyTorch c++ test suite, we
# currently do NOT use the xml test reporter, because Catch doesn't
# support multiple reporters
# c.f. https://github.com/catchorg/Catch2/blob/master/docs/release-notes.md#223
# which means that enabling XML output means you lose useful stdout
# output for Jenkins. It's more important to have useful console
# output than it is to have XML output for Jenkins.
# Note: in the future, if we want to use xml test reporter once we switch
# to all gtest, one can simply do:
# "$test" --gtest_output=xml:"$gtest_reports_dir/$(basename $test).xml"
"$test"
;;
esac
done
# Get the relative path to where the caffe2 python module was installed
CAFFE2_PYPATH="$INSTALL_SITE_DIR/caffe2"
# Collect additional tests to run (outside caffe2/python)
EXTRA_TESTS=()
# CUDA builds always include NCCL support
if [[ "$BUILD_ENVIRONMENT" == *-cuda* ]]; then
EXTRA_TESTS+=("$CAFFE2_PYPATH/contrib/nccl")
fi
conda_ignore_test=()
if [[ $BUILD_ENVIRONMENT == conda* ]]; then
# These tests both assume Caffe2 was built with leveldb, which is not the case
conda_ignore_test+=("--ignore $CAFFE2_PYPATH/python/dataio_test.py")
conda_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/checkpoint_test.py")
fi
rocm_ignore_test=()
if [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
# Currently these tests are failing on ROCM platform:
# Unknown reasons, need to debug
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/arg_ops_test.py")
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/piecewise_linear_transform_test.py")
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/softmax_ops_test.py")
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/unique_ops_test.py")
# Need to go through roi ops to replace max(...) with fmaxf(...)
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/roi_align_rotated_op_test.py")
# Our cuda top_k op has some asm code, the hipified version doesn't
# compile yet, so we don't have top_k operator for now
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/top_k_test.py")
# Our AMD CI boxes have 4 gpus on each
# Remove this once we have added multi-gpu support
export HIP_VISIBLE_DEVICES=$(($BUILD_NUMBER % 4))
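# e.g. BUILD_NUMBER=42 -> 42 % 4 = 2, so this job pins itself to GPU 2 only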
fi
# Python tests
echo "Running Python tests.."
"$PYTHON" \
-m pytest \
-x \
-v \
--junit-xml="$TEST_DIR/python/result.xml" \
--ignore "$CAFFE2_PYPATH/python/test/executor_test.py" \
--ignore "$CAFFE2_PYPATH/python/operator_test/matmul_op_test.py" \
--ignore "$CAFFE2_PYPATH/python/operator_test/pack_ops_test.py" \
--ignore "$CAFFE2_PYPATH/python/mkl/mkl_sbn_speed_test.py" \
${conda_ignore_test[@]} \
${rocm_ignore_test[@]} \
"$CAFFE2_PYPATH/python" \
"${EXTRA_TESTS[@]}"
cd ${INSTALL_PREFIX}
if [[ -n "$INTEGRATED" ]]; then
pip install --user torchvision
"$ROOT_DIR/scripts/onnx/test.sh"
fi

@@ -0,0 +1,42 @@
This directory contains scripts for our continuous integration.
One important thing to keep in mind when reading the scripts here is
that they are all based on Docker images, which we build for each of
the various system configurations we want to run on Jenkins. This means
it is very easy to run these tests yourself:
1. Figure out what Docker image you want. The general template for our
images looks like:
``registry.pytorch.org/pytorch/pytorch-$BUILD_ENVIRONMENT:$DOCKER_VERSION``,
where ``$BUILD_ENVIRONMENT`` is one of the build environments
enumerated in
[pytorch-dockerfiles](https://github.com/pietern/pytorch-dockerfiles/blob/master/build.sh)
2. Run ``docker run -it -u jenkins $DOCKER_IMAGE``, clone PyTorch and
run one of the scripts in this directory.
The Docker images are designed so that any "reasonable" build commands
will work; if you look in [build.sh](build.sh) you will see that it is a
very simple script. This is intentional. Idiomatic build instructions
should work inside all of our Docker images. You can tweak the commands
however you need (e.g., in case you want to rebuild with DEBUG, or rerun
the build with higher verbosity, etc.).
We have to do some work to make this so. Here is a summary of the
mechanisms we use:
- We install binaries to directories like `/usr/local/bin` which
are automatically part of your PATH.
- We add entries to the PATH using Docker ENV variables (so
they apply when you enter Docker) and `/etc/environment` (so they
continue to apply even if you sudo), instead of modifying
`PATH` in our build scripts.
- We use `/etc/ld.so.conf.d` to register directories containing
shared libraries, instead of modifying `LD_LIBRARY_PATH` in our
build scripts.
- We reroute well known paths like `/usr/bin/gcc` to alternate
implementations with `update-alternatives`, instead of setting
`CC` and `CXX` in our build scripts (see the sketch below).
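For illustration, here is a minimal sketch of the last two mechanisms as they
might appear in a Dockerfile. The library path and the gcc-7 alternative are
assumptions for the example, not the exact commands our images use:

    # register a shared-library directory instead of exporting LD_LIBRARY_PATH
    echo '/usr/local/caffe2/lib' > /etc/ld.so.conf.d/caffe2.conf && ldconfig
    # reroute /usr/bin/gcc to a specific compiler instead of setting CC/CXX
    update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 50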

.jenkins/pytorch/build-asan.sh Executable file
@@ -0,0 +1,21 @@
#!/bin/bash
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-build"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Clang version:"
clang --version
# detect_leaks=0: Python is very leaky, so we need to suppress it
# symbolize=1: Gives us much better errors when things go wrong
export ASAN_OPTIONS=detect_leaks=0:symbolize=1
# TODO: Make the ASAN flags a more unified env var
CC="clang" CXX="clang++" LDSHARED="clang --shared" \
CFLAGS="-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -shared-libasan" \
NO_CUDA=1 \
python setup.py install

.jenkins/pytorch/build.sh Executable file
@@ -0,0 +1,145 @@
#!/bin/bash
# For distributed, four environmental configs:
# (1) build with only NCCL
# (2) build with NCCL and MPI
# (3) build with only MPI
# (4) build with neither
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]] || [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
sudo apt-get install -y --no-install-recommends openssh-client openssh-server
sudo mkdir -p /var/run/sshd
fi
if [[ "$BUILD_ENVIRONMENT" == "pytorch-linux-xenial-py3-clang5-asan" ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" $*
fi
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-build"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Python version:"
python --version
echo "GCC version:"
gcc --version
echo "CMake version:"
cmake --version
# TODO: Don't run this...
pip install -r requirements.txt || true
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# This is necessary in order to cross compile (or else we'll get a missing GPU device error).
export HCC_AMDGPU_TARGET=gfx900
# These environment variables are not set on CI when running as the Jenkins user.
# The HIP utility scripts require them to be set in order to run without error.
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
# This environment variable enables HCC optimizations that speed up the linking stage.
# https://github.com/RadeonOpenCompute/hcc#hcc-with-thinlto-linking
# https://github.com/RadeonOpenCompute/hcc#hcc-with-thinlto-linking
export KMTHINLTO=1
# Need the libc++1 and libc++abi1 libraries to allow torch._C to load at runtime
sudo apt-get install libc++1
sudo apt-get install libc++abi1
python tools/amd_build/build_pytorch_amd.py
python tools/amd_build/build_caffe2_amd.py
USE_ROCM=1 python setup.py install --user
exit 0
fi
# TODO: Don't install this here
if ! which conda; then
pip install mkl mkl-devel
fi
# sccache will fail for CUDA builds if all cores are used for compiling
# gcc 7 with sccache seems to have intermittent OOM issue if all cores are used
if [ -z "$MAX_JOBS" ]; then
if ([[ "$BUILD_ENVIRONMENT" == *cuda* ]] || [[ "$BUILD_ENVIRONMENT" == *gcc7* ]]) && which sccache > /dev/null; then
export MAX_JOBS=`expr $(nproc) - 1`
fi
fi
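# e.g. with sccache on a 32-core machine, MAX_JOBS becomes 31, keeping one
# core free so the sccache/CUDA compile jobs are less likely to OOM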
# Target only our CI GPU machine's CUDA arch to speed up the build
export TORCH_CUDA_ARCH_LIST="5.2"
if [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
export TORCH_CUDA_ARCH_LIST="6.0"
fi
if [[ "$BUILD_ENVIRONMENT" == *trusty-py3.6-gcc5.4* ]]; then
export DEBUG=1
fi
# ppc64le build fails when WERROR=1
# set only when building other architectures
# only use for "python setup.py install" line
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
WERROR=1 python setup.py install
elif [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
python setup.py install
fi
# Add the test binaries so that they won't be git clean'ed away
git add -f build/bin
# Test C FFI plugins
# cffi install doesn't work for Python 3.7
if [[ "$BUILD_ENVIRONMENT" != *pynightly* ]]; then
# TODO: Don't run this here
pip install cffi
git clone https://github.com/pytorch/extension-ffi.git
pushd extension-ffi/script
python build.py
popd
fi
# Test documentation build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda8-cudnn6-py3* ]]; then
pushd docs
# TODO: Don't run this here
pip install -r requirements.txt || true
LC_ALL=C make html
popd
fi
# Test no-Python build
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
echo "Building libtorch"
# NB: Install outside of source directory (at the same level as the root
# pytorch folder) so that it doesn't get cleaned away prior to docker push.
BUILD_LIBTORCH_PY=$PWD/tools/build_libtorch.py
mkdir -p ../cpp-build/caffe2
pushd ../cpp-build/caffe2
WERROR=1 VERBOSE=1 DEBUG=1 python $BUILD_LIBTORCH_PY
popd
# Build custom operator tests.
CUSTOM_OP_BUILD="$PWD/../custom-op-build"
CUSTOM_OP_TEST="$PWD/test/custom_operator"
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
mkdir "$CUSTOM_OP_BUILD"
pushd "$CUSTOM_OP_BUILD"
CMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" cmake "$CUSTOM_OP_TEST"
make VERBOSE=1
popd
fi

.jenkins/pytorch/common.sh Normal file
@@ -0,0 +1,140 @@
#!/bin/bash
# Common setup for all Jenkins scripts
# NB: define this function before set -x, so that we don't
# pollute the log with a premature EXITED_USER_LAND ;)
function cleanup {
# Note that if you've exited user land, then CI will conclude that
# any failure is the CI's fault. So we MUST only output this
# string when the script actually exited with code 0.
retcode=$?
set +x
if [ $retcode -eq 0 ]; then
echo "EXITED_USER_LAND"
fi
}
set -ex
# Required environment variables:
# $BUILD_ENVIRONMENT (should be set by your Docker image)
# This token is used by a parser on Jenkins logs for determining
# if a failure is a legitimate problem, or a problem with the build
# system; to find out more, grep for this string in ossci-job-dsl.
echo "ENTERED_USER_LAND"
# compositional trap taken from https://stackoverflow.com/a/7287873/23845
# note: printf is used instead of echo to avoid backslash
# processing and to properly handle values that begin with a '-'.
log() { printf '%s\n' "$*"; }
error() { log "ERROR: $*" >&2; }
fatal() { error "$@"; exit 1; }
# appends a command to a trap
#
# - 1st arg: code to add
# - remaining args: names of traps to modify
#
trap_add() {
trap_add_cmd=$1; shift || fatal "${FUNCNAME} usage error"
for trap_add_name in "$@"; do
trap -- "$(
# helper fn to get existing trap command from output
# of trap -p
extract_trap_cmd() { printf '%s\n' "$3"; }
# print existing trap command with newline
eval "extract_trap_cmd $(trap -p "${trap_add_name}")"
# print the new trap command
printf '%s\n' "${trap_add_cmd}"
)" "${trap_add_name}" \
|| fatal "unable to add to trap ${trap_add_name}"
done
}
# set the trace attribute for the above function. this is
# required to modify DEBUG or RETURN traps because functions don't
# inherit them unless the trace attribute is set
declare -f -t trap_add
trap_add cleanup EXIT
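# As a hypothetical example of why trap_add exists: calling
#   trap_add 'echo "flushing logs"' EXIT
# here would run both the echo and the cleanup function above on exit,
# whereas a plain `trap ... EXIT` would silently replace the cleanup trap.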
if which sccache > /dev/null; then
# Save sccache logs to file
sccache --stop-server || true
rm ~/sccache_error.log || true
SCCACHE_ERROR_LOG=~/sccache_error.log RUST_LOG=sccache::server=error sccache --start-server
# Report sccache stats for easier debugging
sccache --zero-stats
function sccache_epilogue() {
echo '=================== sccache compilation log ==================='
python $(dirname "${BASH_SOURCE[0]}")/print_sccache_log.py ~/sccache_error.log
echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
sccache --show-stats
sccache --stop-server || true
}
trap_add sccache_epilogue EXIT
fi
if which ccache > /dev/null; then
# Report ccache stats for easier debugging
ccache --zero-stats
ccache --show-stats
function ccache_epilogue() {
ccache --show-stats
}
trap_add ccache_epilogue EXIT
fi
# It's called a COMPACT_JOB_NAME because it's distinct from the
# Jenkins-provided JOB_NAME, which also includes a prefix folder,
# e.g. pytorch-builds/
if [ -z "$COMPACT_JOB_NAME" ]; then
echo "Jenkins build scripts must set COMPACT_JOB_NAME"
exit 1
fi
if grep --line-regexp -q "$COMPACT_JOB_NAME" "$(dirname "${BASH_SOURCE[0]}")/disabled-configs.txt"; then
echo "Job is explicitly disabled, SKIPPING"
exit 0
else
echo "Job is not disabled, proceeding"
fi
if grep --line-regexp -q "$COMPACT_JOB_NAME" "$(dirname "${BASH_SOURCE[0]}")/enabled-configs.txt"; then
echo "Job is enabled, proceeding"
else
echo "Job is not enabled, FAILING now (revert changes to enabled-configs.txt to fix this)"
exit 1
fi
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda9-cudnn7-py3 ]] || \
[[ "$BUILD_ENVIRONMENT" == *pytorch-linux-trusty-py3.6-gcc7* ]]; then
BUILD_TEST_LIBTORCH=1
else
BUILD_TEST_LIBTORCH=0
fi
# Use conda cmake in some CI builds. Conda cmake will be newer than our
# supported minimum version 3.5, so we only do it in two builds that we know should use conda.
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *cuda8-cudnn6-py2* ]] || \
[[ "$BUILD_ENVIRONMENT" == *cuda9-cudnn7-py3* ]]; then
if ! which conda; then
echo "Expected ${BUILD_ENVIRONMENT} to use conda, but 'which conda' returns empty"
exit 1
else
conda install -q -y cmake
fi
else
if ! cmake --version | grep 'cmake version 3\.5'; then
echo "Expected ${BUILD_ENVIRONMENT} to have cmake version 3.5.* (min support version), but 'cmake --version' returns:"
cmake --version
exit 1
fi
fi
fi

.jenkins/pytorch/dirty.sh Executable file
@@ -0,0 +1,10 @@
#!/bin/bash
set -ex
upstream="$1"
pr="$2"
git diff --name-only "$upstream" "$pr"
# Now that PyTorch build depends on Caffe2, unconditionally trigger
# for any changes.
# TODO: Replace this with a NEGATIVE regex that allows us to blacklist
# files (letting us skip builds when they are unnecessary)
#git diff --name-only "$upstream" "$pr" | grep -Eq '^(aten/|caffe2/|.jenkins/pytorch|docs/(make.bat|Makefile|requirements.txt|source)|mypy|requirements.txt|setup.py|test/|third_party/|tools/|\.gitmodules|torch/)'

@@ -0,0 +1,5 @@
# This file contains a list of disabled configurations. Disabled
# configurations are skipped and not considered a failure if they
# fail. You can use this to temporarily reserve a test name on the
# CI side before the PyTorch repository supports it. This
# file has the same format as .jenkins/enabled-configs.txt

@@ -0,0 +1,6 @@
#!/bin/bash
COMPACT_JOB_NAME="docker-build-test"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
docker build -t pytorch .

@@ -0,0 +1,48 @@
# This file contains a list of enabled configurations
# to perform tests on. If you want to run only a limited set of
# tests on CI before enabling the full test suite,
# you can delete lines from this file. Any test that is not
# in this file will report a failure (so you don't forget to
# reenable the tests on merge ;)
pytorch-linux-xenial-cuda8-cudnn6-py3-build
pytorch-linux-xenial-cuda8-cudnn6-py3-test
pytorch-linux-xenial-cuda8-cudnn6-py3-multigpu-test
pytorch-linux-xenial-cuda9-cudnn7-py2-build
pytorch-linux-xenial-cuda9-cudnn7-py2-test
pytorch-linux-xenial-cuda9-cudnn7-py3-build
pytorch-linux-xenial-cuda9-cudnn7-py3-test
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7-build
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7-test
pytorch-linux-xenial-py3-clang5-asan-build
pytorch-linux-xenial-py3-clang5-asan-test
pytorch-linux-trusty-py2.7.9-build
pytorch-linux-trusty-py2.7.9-test
pytorch-linux-trusty-py2.7-build
pytorch-linux-trusty-py2.7-test
pytorch-linux-trusty-py3.5-build
pytorch-linux-trusty-py3.5-test
pytorch-linux-trusty-py3.6-gcc4.8-build
pytorch-linux-trusty-py3.6-gcc4.8-test
pytorch-linux-trusty-py3.6-gcc5.4-build
pytorch-linux-trusty-py3.6-gcc5.4-test
pytorch-linux-trusty-py3.6-gcc7.2-build
pytorch-linux-trusty-py3.6-gcc7.2-test
pytorch-linux-trusty-py3.6-gcc7-build
pytorch-linux-trusty-py3.6-gcc7-test
pytorch-linux-trusty-pynightly-build
pytorch-linux-trusty-pynightly-test
pytorch-win-ws2016-cuda9-cudnn7-py3-build
pytorch-win-ws2016-cuda9-cudnn7-py3-test
pytorch-macos-10.13-py3-build
pytorch-macos-10.13-py3-test
pytorch-macos-10.13-cuda9.2-cudnn7-py3-build
pytorch-docker-build-test
short-perf-test-cpu
short-perf-test-gpu
py2-clang3.8-rocm1.7.1-ubuntu16.04-build
py2-clang3.8-rocm1.7.1-ubuntu16.04-test
pytorch-ppc64le-cuda9.2-cudnn7-py3-build
pytorch-ppc64le-cuda9.2-cudnn7-py3-test
pytorch-ppc64le-cuda9.1-cudnn7-py3-build
pytorch-ppc64le-cuda9.1-cudnn7-py3-test

@@ -0,0 +1,9 @@
#!/bin/bash
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-build* ]]; then
source "$(dirname "${BASH_SOURCE[0]}")/macos-build.sh"
fi
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test* ]]; then
source "$(dirname "${BASH_SOURCE[0]}")/macos-test.sh"
fi

.jenkins/pytorch/macos-build.sh Executable file
@@ -0,0 +1,72 @@
#!/bin/bash
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-build"
export PATH="/usr/local/bin:$PATH"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Set up conda environment
export PYTORCH_ENV_DIR="${HOME}/pytorch-ci-env"
# If a local installation of conda doesn't exist, we download and install conda
if [ ! -d "${PYTORCH_ENV_DIR}/miniconda3" ]; then
mkdir -p ${PYTORCH_ENV_DIR}
curl https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${PYTORCH_ENV_DIR}/miniconda3.sh
bash ${PYTORCH_ENV_DIR}/miniconda3.sh -b -p ${PYTORCH_ENV_DIR}/miniconda3
fi
export PATH="${PYTORCH_ENV_DIR}/miniconda3/bin:$PATH"
source ${PYTORCH_ENV_DIR}/miniconda3/bin/activate
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${PYTORCH_ENV_DIR}/miniconda3/
# Build PyTorch
if [[ "${JOB_BASE_NAME}" == *cuda9.2* ]]; then
export CUDA_VERSION=9.2
export TORCH_CUDA_ARCH_LIST=5.2
export PATH=/Developer/NVIDIA/CUDA-${CUDA_VERSION}/bin${PATH:+:${PATH}}
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-${CUDA_VERSION}/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}
export CUDA_HOME=/Developer/NVIDIA/CUDA-${CUDA_VERSION}
export NO_CUDA=0
if [ -z "${IN_CIRCLECI}" ]; then
# Eigen gives "explicit specialization of class must precede its first use" error
# when compiling with Xcode 9.1 toolchain, so we have to use Xcode 8.2 toolchain instead.
export DEVELOPER_DIR=/Library/Developer/CommandLineTools
fi
else
if [ -z "${IN_CIRCLECI}" ]; then
export DEVELOPER_DIR=/Applications/Xcode9.app/Contents/Developer
fi
fi
export MACOSX_DEPLOYMENT_TARGET=10.9
export CXX=clang++
export CC=clang
if which sccache > /dev/null; then
printf "#!/bin/sh\nexec sccache $(which clang++) \$*" > "${PYTORCH_ENV_DIR}/clang++"
chmod a+x "${PYTORCH_ENV_DIR}/clang++"
printf "#!/bin/sh\nexec sccache $(which clang) \$*" > "${PYTORCH_ENV_DIR}/clang"
chmod a+x "${PYTORCH_ENV_DIR}/clang"
if [[ "${JOB_BASE_NAME}" == *cuda* ]]; then
printf "#!/bin/sh\nexec sccache $(which nvcc) \$*" > "${PYTORCH_ENV_DIR}/nvcc"
chmod a+x "${PYTORCH_ENV_DIR}/nvcc"
export CUDA_NVCC_EXECUTABLE="${PYTORCH_ENV_DIR}/nvcc"
fi
export PATH="${PYTORCH_ENV_DIR}:$PATH"
fi
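# For reference, the printf above generates a tiny wrapper such as (a sketch;
# the clang++ path depends on the machine):
#   #!/bin/sh
#   exec sccache /usr/bin/clang++ $*
# so every compiler call is transparently cached by sccache.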
# If we run too many parallel jobs, we will OOM
export MAX_JOBS=2
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
python setup.py install
# Upload torch binaries when the build job is finished
if [ -z "${IN_CIRCLECI}" ]; then
7z a ${IMAGE_COMMIT_TAG}.7z ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
aws s3 cp ${IMAGE_COMMIT_TAG}.7z s3://ossci-macos-build/pytorch/${IMAGE_COMMIT_TAG}.7z --acl public-read
fi

.jenkins/pytorch/macos-test.sh Executable file
@@ -0,0 +1,112 @@
#!/bin/bash
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-test"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
export PATH="/usr/local/bin:$PATH"
# Set up conda environment
export PYTORCH_ENV_DIR="${HOME}/pytorch-ci-env"
# If a local installation of conda doesn't exist, we download and install conda
if [ ! -d "${PYTORCH_ENV_DIR}/miniconda3" ]; then
mkdir -p ${PYTORCH_ENV_DIR}
curl https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${PYTORCH_ENV_DIR}/miniconda3.sh
bash ${PYTORCH_ENV_DIR}/miniconda3.sh -b -p ${PYTORCH_ENV_DIR}/miniconda3
fi
export PATH="${PYTORCH_ENV_DIR}/miniconda3/bin:$PATH"
source ${PYTORCH_ENV_DIR}/miniconda3/bin/activate
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
if [ -z "${IN_CIRCLECI}" ]; then
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
fi
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${PYTORCH_ENV_DIR}/miniconda3/
# Test PyTorch
if [ -z "${IN_CIRCLECI}" ]; then
if [[ "${JOB_BASE_NAME}" == *cuda9.2* ]]; then
# Eigen gives "explicit specialization of class must precede its first use" error
# when compiling with Xcode 9.1 toolchain, so we have to use Xcode 8.2 toolchain instead.
export DEVELOPER_DIR=/Library/Developer/CommandLineTools
else
export DEVELOPER_DIR=/Applications/Xcode9.app/Contents/Developer
fi
fi
export MACOSX_DEPLOYMENT_TARGET=10.9
export CXX=clang++
export CC=clang
# If we run too many parallel jobs, we will OOM
export MAX_JOBS=2
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
# Download torch binaries in the test jobs
if [ -z "${IN_CIRCLECI}" ]; then
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
aws s3 cp s3://ossci-macos-build/pytorch/${IMAGE_COMMIT_TAG}.7z ${IMAGE_COMMIT_TAG}.7z
7z x ${IMAGE_COMMIT_TAG}.7z -o"${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages"
fi
test_python_all() {
echo "Ninja version: $(ninja --version)"
python test/run_test.py --verbose
}
test_cpp_api() {
# C++ API
# NB: Install outside of source directory (at the same level as the root
# pytorch folder) so that it doesn't get cleaned away prior to docker push.
# But still clean it before we perform our own build.
#
CPP_BUILD="$PWD/../cpp-build"
rm -rf $CPP_BUILD
mkdir -p $CPP_BUILD/caffe2
BUILD_LIBTORCH_PY=$PWD/tools/build_libtorch.py
pushd $CPP_BUILD/caffe2
VERBOSE=1 DEBUG=1 python $BUILD_LIBTORCH_PY
popd
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
# Unfortunately it seems like the test can't load from miniconda3
# without these paths being set
export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$PWD/miniconda3/lib"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$PWD/miniconda3/lib"
"$CPP_BUILD"/caffe2/bin/test_api
}
test_custom_script_ops() {
echo "Testing custom script operators"
pushd test/custom_operator
# Build the custom operator library.
rm -rf build && mkdir build
pushd build
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
CMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" cmake ..
make VERBOSE=1
popd
# Run tests Python-side and export a script module.
python test_custom_ops.py -v
python model.py --export-script-module=model.pt
# Run tests C++-side and load the exported script module.
build/test_custom_ops ./model.pt
popd
}
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
test_python_all
test_cpp_api
test_custom_script_ops
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
test_python_all
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
test_cpp_api
test_custom_script_ops
fi
fi

@@ -0,0 +1,28 @@
#!/bin/bash
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-multigpu-test"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch (distributed only)"
if [ -n "${IN_CIRCLECI}" ]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
sudo apt-get install -y --no-install-recommends openssh-client openssh-server
sudo mkdir -p /var/run/sshd
fi
fi
time python test/run_test.py --verbose -i distributed

@@ -0,0 +1,21 @@
#!/bin/bash
run_test () {
rm -rf test_tmp/ && mkdir test_tmp/ && cd test_tmp/
"$@"
cd .. && rm -rf test_tmp/
}
get_runtime_of_command () {
TIMEFORMAT=%R
# runtime=$( { time ($@ &> /dev/null); } 2>&1 1>/dev/null)
runtime=$( { time $@; } 2>&1 1>/dev/null)
if [[ $runtime == *"Error"* ]]; then
exit 1
fi
runtime=${runtime#+++ $@}
runtime=$(python -c "print($runtime)")
echo $runtime
}
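# Hypothetical usage: runtime=$(get_runtime_of_command python main.py)
# leaves only the wall-clock seconds (TIMEFORMAT=%R), e.g. "12.34", in $runtime.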

@@ -0,0 +1,66 @@
import sys
import json
import numpy
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--test-name', dest='test_name', action='store',
required=True, help='test name')
parser.add_argument('--sample-stats', dest='sample_stats', action='store',
required=True, help='stats from sample')
parser.add_argument('--update', action='store_true',
help='whether to update baseline using stats from sample')
args = parser.parse_args()
test_name = args.test_name
if 'cpu' in test_name:
backend = 'cpu'
elif 'gpu' in test_name:
backend = 'gpu'
data_file_path = '../{}_runtime.json'.format(backend)
with open(data_file_path) as data_file:
data = json.load(data_file)
if test_name in data:
mean = float(data[test_name]['mean'])
sigma = float(data[test_name]['sigma'])
else:
# Let the test pass if the baseline number doesn't exist
mean = sys.maxsize
sigma = 0.001
print("population mean: ", mean)
print("population sigma: ", sigma)
sample_stats_data = json.loads(args.sample_stats)
sample_mean = sample_stats_data['mean']
sample_sigma = sample_stats_data['sigma']
print("sample mean: ", sample_mean)
print("sample sigma: ", sample_sigma)
z_value = (sample_mean - mean) / sigma
print("z-value: ", z_value)
if z_value >= 3:
raise Exception('''\n
z-value >= 3, there is high chance of perf regression.\n
To reproduce this regression, run `cd .jenkins/pytorch/perf_test/ && bash ''' + test_name + '''.sh` on your local machine and compare the runtime before/after your code change.
''')
else:
print("z-value < 3, no perf regression detected.")
if args.update:
print("We will use these numbers as new baseline.")
new_data_file_path = '../new_{}_runtime.json'.format(backend)
with open(new_data_file_path) as new_data_file:
new_data = json.load(new_data_file)
new_data[test_name] = {}
new_data[test_name]['mean'] = sample_mean
new_data[test_name]['sigma'] = max(sample_sigma, sample_mean * 0.1)
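# e.g. sample_mean=10.0 with sample_sigma=0.2 stores sigma=1.0, flooring
# sigma at 10% of the mean so the new baseline never becomes too strict.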
with open(new_data_file_path, 'w') as new_data_file:
json.dump(new_data, new_data_file, indent=4)

@@ -0,0 +1,16 @@
import sys
import json
import numpy
sample_data_list = sys.argv[1:]
sample_data_list = [float(v.strip()) for v in sample_data_list]
sample_mean = numpy.mean(sample_data_list)
sample_sigma = numpy.std(sample_data_list)
data = {
'mean': sample_mean,
'sigma': sample_sigma,
}
print(json.dumps(data))
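# Hypothetical invocation: `python get_stats.py 1.0 2.0 3.0` prints
# {"mean": 2.0, "sigma": 0.816...}, which feeds compare_with_baseline.py.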

@@ -0,0 +1,42 @@
#!/bin/bash
. ./common.sh
test_cpu_speed_mini_sequence_labeler () {
echo "Testing: mini sequence labeler, CPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/benchmark.git
cd benchmark/
git checkout 726567a455edbfda6199445922a8cfee82535664
cd scripts/mini_sequence_labeler
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py)
SAMPLE_ARRAY+=(${runtime})
done
cd ../../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
echo "Runtime stats in seconds:"
echo $stats
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name ${FUNCNAME[0]} --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name ${FUNCNAME[0]} --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_cpu_speed_mini_sequence_labeler "$@"
fi

@@ -0,0 +1,44 @@
#!/bin/bash
. ./common.sh
test_cpu_speed_mnist () {
echo "Testing: MNIST, CPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/examples.git -b perftests
cd examples/mnist
pip install -r requirements.txt
# Download data
python main.py --epochs 0
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --epochs 1 --no-log)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
done
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
echo "Runtime stats in seconds:"
echo $stats
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name ${FUNCNAME[0]} --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name ${FUNCNAME[0]} --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_cpu_speed_mnist "$@"
fi

@@ -0,0 +1,28 @@
. ./common.sh
test_cpu_speed_torch () {
echo "Testing: torch.*, CPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/yf225/perf-tests.git
if [ "$1" == "compare_with_baseline" ]; then
export ARGS="--compare ../cpu_runtime.json"
elif [ "$1" == "compare_and_update" ]; then
export ARGS="--compare ../cpu_runtime.json --update ../new_cpu_runtime.json"
elif [ "$1" == "update_only" ]; then
export ARGS="--update ../new_cpu_runtime.json"
fi
if ! python perf-tests/modules/test_cpu_torch.py ${ARGS}; then
echo "To reproduce this regression, run \`cd .jenkins/pytorch/perf_test/ && bash "${FUNCNAME[0]}".sh\` on your local machine and compare the runtime before/after your code change."
exit 1
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_cpu_speed_torch "$@"
fi

@@ -0,0 +1,28 @@
. ./common.sh
test_cpu_speed_torch_tensor () {
echo "Testing: torch.Tensor.*, CPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/yf225/perf-tests.git
if [ "$1" == "compare_with_baseline" ]; then
export ARGS="--compare ../cpu_runtime.json"
elif [ "$1" == "compare_and_update" ]; then
export ARGS="--compare ../cpu_runtime.json --update ../new_cpu_runtime.json"
elif [ "$1" == "update_only" ]; then
export ARGS="--update ../new_cpu_runtime.json"
fi
if ! python perf-tests/modules/test_cpu_torch_tensor.py ${ARGS}; then
echo "To reproduce this regression, run \`cd .jenkins/pytorch/perf_test/ && bash "${FUNCNAME[0]}".sh\` on your local machine and compare the runtime before/after your code change."
exit 1
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_cpu_speed_torch_tensor "$@"
fi

@@ -0,0 +1,43 @@
#!/bin/bash
. ./common.sh
test_gpu_speed_cudnn_lstm () {
echo "Testing: CuDNN LSTM, GPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/benchmark.git
cd benchmark/
git checkout 43dfb2c0370e70ef37f249dc09aff9f0ccd2ddb0
cd scripts/
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python cudnn_lstm.py --skip-cpu-governor-check)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
done
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
echo "Runtime stats in seconds:"
echo $stats
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name ${FUNCNAME[0]} --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name ${FUNCNAME[0]} --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_gpu_speed_cudnn_lstm "$@"
fi

@@ -0,0 +1,43 @@
#!/bin/bash
. ./common.sh
test_gpu_speed_lstm () {
echo "Testing: LSTM, GPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/benchmark.git
cd benchmark/
git checkout 43dfb2c0370e70ef37f249dc09aff9f0ccd2ddb0
cd scripts/
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python lstm.py --skip-cpu-governor-check)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
done
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
echo "Runtime stats in seconds:"
echo $stats
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name ${FUNCNAME[0]} --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name ${FUNCNAME[0]} --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_gpu_speed_lstm "$@"
fi

@@ -0,0 +1,43 @@
#!/bin/bash
. ./common.sh
test_gpu_speed_mlstm () {
echo "Testing: MLSTM, GPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/benchmark.git
cd benchmark/
git checkout 43dfb2c0370e70ef37f249dc09aff9f0ccd2ddb0
cd scripts/
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python mlstm.py --skip-cpu-governor-check)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
done
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
echo "Runtime stats in seconds:"
echo $stats
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name ${FUNCNAME[0]} --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name ${FUNCNAME[0]} --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_gpu_speed_mlstm "$@"
fi

@@ -0,0 +1,44 @@
#!/bin/bash
. ./common.sh
test_gpu_speed_mnist () {
echo "Testing: MNIST, GPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/examples.git -b perftests
cd examples/mnist
pip install -r requirements.txt
# Download data
python main.py --epochs 0
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --epochs 1 --no-log)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
done
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
echo "Runtime stats in seconds:"
echo $stats
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name ${FUNCNAME[0]} --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name ${FUNCNAME[0]} --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_gpu_speed_mnist "$@"
fi

@@ -0,0 +1,52 @@
#!/bin/bash
. ./common.sh
test_gpu_speed_word_language_model () {
echo "Testing: word language model on Wikitext-2, GPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/examples.git -b perftests
cd examples/word_language_model
cd data/wikitext-2
# Reduce dataset size, so that we can have more runs per test
sed -n '1,200p' test.txt > test_tmp.txt
sed -n '1,1000p' train.txt > train_tmp.txt
sed -n '1,200p' valid.txt > valid_tmp.txt
mv test_tmp.txt test.txt
mv train_tmp.txt train.txt
mv valid_tmp.txt valid.txt
cd ../..
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --cuda --epochs 1)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
done
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
echo "Runtime stats in seconds:"
echo $stats
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name ${FUNCNAME[0]} --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name ${FUNCNAME[0]} --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_gpu_speed_word_language_model "$@"
fi

@@ -0,0 +1,13 @@
import sys
import json
data_file_path = sys.argv[1]
commit_hash = sys.argv[2]
with open(data_file_path) as data_file:
data = json.load(data_file)
data['commit'] = commit_hash
with open(data_file_path, 'w') as data_file:
json.dump(data, data_file)
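# Usage (as invoked by the perf test scripts):
#   python update_commit_hash.py new_cpu_runtime.json $MASTER_COMMIT_ID
# so the uploaded baseline records which master commit produced it.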

@@ -0,0 +1,11 @@
import sys
log_file_path = sys.argv[1]
with open(log_file_path) as f:
lines = f.readlines()
for line in lines:
# Ignore errors from CPU instruction set testing
if 'src.c' not in line:
print(line)

@@ -0,0 +1,64 @@
#!/bin/bash
COMPACT_JOB_NAME="short-perf-test-cpu"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
cd .jenkins/pytorch/perf_test
echo "Running CPU perf test for PyTorch..."
pip install awscli
# Set multipart_threshold to be sufficiently high, so that `aws s3 cp` is not a multipart read
# More info at https://github.com/aws/aws-cli/issues/2321
aws configure set default.s3.multipart_threshold 5GB
if [[ "$COMMIT_SOURCE" == master ]]; then
# Get current master commit hash
export MASTER_COMMIT_ID=$(git log --format="%H" -n 1)
fi
# Find the master commit to test against
git remote add upstream https://github.com/pytorch/pytorch.git
git fetch upstream
IFS=$'\n'
master_commit_ids=($(git rev-list upstream/master))
for commit_id in "${master_commit_ids[@]}"; do
if aws s3 ls s3://ossci-perf-test/pytorch/cpu_runtime/${commit_id}.json; then
LATEST_TESTED_COMMIT=${commit_id}
break
fi
done
aws s3 cp s3://ossci-perf-test/pytorch/cpu_runtime/${LATEST_TESTED_COMMIT}.json cpu_runtime.json
if [[ "$COMMIT_SOURCE" == master ]]; then
# Prepare new baseline file
cp cpu_runtime.json new_cpu_runtime.json
python update_commit_hash.py new_cpu_runtime.json ${MASTER_COMMIT_ID}
fi
# Include tests
. ./test_cpu_speed_mini_sequence_labeler.sh
. ./test_cpu_speed_mnist.sh
. ./test_cpu_speed_torch.sh
. ./test_cpu_speed_torch_tensor.sh
# Run tests
export TEST_MODE="compare_with_baseline"
if [[ "$COMMIT_SOURCE" == master ]]; then
export TEST_MODE="compare_and_update"
fi
# Operator tests
run_test test_cpu_speed_torch ${TEST_MODE}
run_test test_cpu_speed_torch_tensor ${TEST_MODE}
# Sample model tests
run_test test_cpu_speed_mini_sequence_labeler 20 ${TEST_MODE}
run_test test_cpu_speed_mnist 20 ${TEST_MODE}
if [[ "$COMMIT_SOURCE" == master ]]; then
# This could cause a race condition if we are testing the same master commit twice,
# but the chance of two jobs executing this line at the same time is low.
aws s3 cp new_cpu_runtime.json s3://ossci-perf-test/pytorch/cpu_runtime/${MASTER_COMMIT_ID}.json --acl public-read
fi

@@ -0,0 +1,68 @@
#!/bin/bash
COMPACT_JOB_NAME="short-perf-test-gpu"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
pushd .jenkins/pytorch/perf_test
echo "Running GPU perf test for PyTorch..."
pip install awscli
# Set multipart_threshold to be sufficiently high, so that `aws s3 cp` is not a multipart read
# More info at https://github.com/aws/aws-cli/issues/2321
aws configure set default.s3.multipart_threshold 5GB
if [[ "$COMMIT_SOURCE" == master ]]; then
# Get current master commit hash
export MASTER_COMMIT_ID=$(git log --format="%H" -n 1)
fi
# Find the master commit to test against
git remote add upstream https://github.com/pytorch/pytorch.git
git fetch upstream
IFS=$'\n'
master_commit_ids=($(git rev-list upstream/master))
for commit_id in "${master_commit_ids[@]}"; do
if aws s3 ls s3://ossci-perf-test/pytorch/gpu_runtime/${commit_id}.json; then
LATEST_TESTED_COMMIT=${commit_id}
break
fi
done
aws s3 cp s3://ossci-perf-test/pytorch/gpu_runtime/${LATEST_TESTED_COMMIT}.json gpu_runtime.json
if [[ "$COMMIT_SOURCE" == master ]]; then
# Prepare new baseline file
cp gpu_runtime.json new_gpu_runtime.json
python update_commit_hash.py new_gpu_runtime.json ${MASTER_COMMIT_ID}
fi
# Include tests
. ./test_gpu_speed_mnist.sh
. ./test_gpu_speed_word_language_model.sh
. ./test_gpu_speed_cudnn_lstm.sh
. ./test_gpu_speed_lstm.sh
. ./test_gpu_speed_mlstm.sh
# Run tests
if [[ "$COMMIT_SOURCE" == master ]]; then
run_test test_gpu_speed_mnist 20 compare_and_update
run_test test_gpu_speed_word_language_model 20 compare_and_update
run_test test_gpu_speed_cudnn_lstm 20 compare_and_update
run_test test_gpu_speed_lstm 20 compare_and_update
run_test test_gpu_speed_mlstm 20 compare_and_update
else
run_test test_gpu_speed_mnist 20 compare_with_baseline
run_test test_gpu_speed_word_language_model 20 compare_with_baseline
run_test test_gpu_speed_cudnn_lstm 20 compare_with_baseline
run_test test_gpu_speed_lstm 20 compare_with_baseline
run_test test_gpu_speed_mlstm 20 compare_with_baseline
fi
if [[ "$COMMIT_SOURCE" == master ]]; then
# This could cause a race condition if we are testing the same master commit twice,
# but the chance of two jobs executing this line at the same time is low.
aws s3 cp new_gpu_runtime.json s3://ossci-perf-test/pytorch/gpu_runtime/${MASTER_COMMIT_ID}.json --acl public-read
fi
popd

.jenkins/pytorch/test.sh Executable file
@@ -0,0 +1,177 @@
#!/bin/bash
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-test"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
echo "Testing pytorch"
if [ -n "${IN_CIRCLECI}" ]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
sudo apt-get install -y --no-install-recommends openssh-client openssh-server
sudo mkdir -p /var/run/sshd
fi
fi
# JIT C++ extensions require ninja.
git clone https://github.com/ninja-build/ninja --quiet
pushd ninja
python ./configure.py --bootstrap
export PATH="$PWD:$PATH"
popd
# DANGER WILL ROBINSON. The LD_PRELOAD here could cause you problems
# if you're not careful. Check this if you made some changes and the
# ASAN test is not working
if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
export ASAN_OPTIONS=detect_leaks=0:symbolize=1:strict_init_order=true
# We suppress the vptr violation, since we have separate copies of
# libprotobuf in both libtorch.so and libcaffe2.so, and it causes
# the following problem:
# test_cse (__main__.TestJit) ... torch/csrc/jit/export.cpp:622:38:
# runtime error: member call on address ... which does not point
# to an object of type 'google::protobuf::MessageLite'
# ...: note: object is of type 'onnx_torch::ModelProto'
#
# This problem should be solved when libtorch.so and libcaffe2.so are
# merged.
export UBSAN_OPTIONS=print_stacktrace=1:suppressions=$PWD/ubsan.supp
export PYTORCH_TEST_WITH_ASAN=1
export PYTORCH_TEST_WITH_UBSAN=1
# TODO: Figure out how to avoid hard-coding these paths
export ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-5.0/bin/llvm-symbolizer
export LD_PRELOAD=/usr/lib/llvm-5.0/lib/clang/5.0.0/lib/linux/libclang_rt.asan-x86_64.so
# Increase stack size, because ASAN red zones use more stack
ulimit -s 81920
function get_exit_code() {
set +e
"$@"
retcode=$?
set -e
return $retcode
}
(cd test && python -c "import torch")
echo "The next three invocations are expected to crash; if they don't that means ASAN/UBSAN is misconfigured"
(cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_csrc_asan(3)")
(cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_csrc_ubsan(0)")
(cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_aten_asan(3)")
fi
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
export PYTORCH_TEST_WITH_ROCM=1
fi
if [[ "${JOB_BASE_NAME}" == *-NO_AVX-* ]]; then
export ATEN_CPU_CAPABILITY=default
elif [[ "${JOB_BASE_NAME}" == *-NO_AVX2-* ]]; then
export ATEN_CPU_CAPABILITY=avx
fi
test_python_nn() {
time python test/run_test.py --include nn --verbose
}
test_python_all_except_nn() {
time python test/run_test.py --exclude nn --verbose
}
test_aten() {
# Test ATen
# The following test(s) of ATen have already been skipped by caffe2 in rocm environment:
# scalar_tensor_test, basic, native_test
if ([[ "$BUILD_ENVIRONMENT" != *asan* ]] && [[ "$BUILD_ENVIRONMENT" != *rocm* ]]); then
echo "Running ATen tests with pytorch lib"
TORCH_LIB_PATH=$(python -c "import site; print(site.getsitepackages()[0])")/torch/lib
# NB: the ATen test binaries don't have RPATH set, so it's necessary to
# put the dynamic libraries somewhere where the dynamic linker can find them.
# This is a bit of a hack.
if [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
SUDO=sudo
fi
${SUDO} ln -s "$TORCH_LIB_PATH"/libcaffe2* build/bin
${SUDO} ln -s "$TORCH_LIB_PATH"/libnccl* build/bin
ls build/bin
aten/tools/run_tests.sh build/bin
fi
}
test_torchvision() {
rm -rf ninja
echo "Installing torchvision at branch master"
rm -rf vision
# TODO: This git clone is bad; it means pushes to torchvision can break
# PyTorch CI
git clone https://github.com/pytorch/vision --quiet
pushd vision
# python setup.py install with a tqdm dependency is broken in the
# Travis Python nightly (but not in latest Python nightlies, so
# this should be a transient requirement...)
# See https://github.com/pytorch/pytorch/issues/7525
#time python setup.py install
pip install --user .
popd
}
test_libtorch() {
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
echo "Testing libtorch"
CPP_BUILD="$PWD/../cpp-build"
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
"$CPP_BUILD"/caffe2/bin/test_jit
else
"$CPP_BUILD"/caffe2/bin/test_jit "[cpu]"
fi
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
OMP_NUM_THREADS=2 "$CPP_BUILD"/caffe2/bin/test_api
fi
}
test_custom_script_ops() {
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
echo "Testing custom script operators"
CUSTOM_OP_BUILD="$PWD/../custom-op-build"
pushd test/custom_operator
cp -r "$CUSTOM_OP_BUILD" build
# Run tests Python-side and export a script module.
python test_custom_ops.py -v
python model.py --export-script-module=model.pt
# Run tests C++-side and load the exported script module.
build/test_custom_ops ./model.pt
popd
fi
}
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
test_python_nn
test_python_all_except_nn
test_aten
test_torchvision
test_libtorch
test_custom_script_ops
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
test_python_nn
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
test_python_all_except_nn
test_aten
test_torchvision
test_libtorch
test_custom_script_ops
fi
fi

.jenkins/pytorch/win-build.sh Executable file
@@ -0,0 +1,155 @@
#!/bin/bash
# If you want to rebuild, run this with REBUILD=1
# If you want to build with CUDA, run this with USE_CUDA=1
# If you want to build without CUDA, run this with USE_CUDA=0
if [ ! -f setup.py ]; then
echo "ERROR: Please run this build script from PyTorch root directory."
exit 1
fi
COMPACT_JOB_NAME=pytorch-win-ws2016-cuda9-cudnn7-py3-build
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
if [[ ${JOB_NAME} == *"develop"* ]]; then
export IMAGE_COMMIT_TAG=develop-${IMAGE_COMMIT_TAG}
fi
mkdir -p ci_scripts/
cat >ci_scripts/upload_image.py << EOL
import os
import sys
import boto3
IMAGE_COMMIT_TAG = os.getenv('IMAGE_COMMIT_TAG')
session = boto3.session.Session()
s3 = session.resource('s3')
data = open(sys.argv[1], 'rb')
s3.Bucket('ossci-windows-build').put_object(Key='pytorch/'+IMAGE_COMMIT_TAG+'.7z', Body=data)
object_acl = s3.ObjectAcl('ossci-windows-build','pytorch/'+IMAGE_COMMIT_TAG+'.7z')
response = object_acl.put(ACL='public-read')
EOL
cat >ci_scripts/build_pytorch.bat <<EOL
set PATH=C:\\Program Files\\CMake\\bin;C:\\Program Files\\7-Zip;C:\\ProgramData\\chocolatey\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Amazon\\AWSCLI;%PATH%
:: Install MKL
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/mkl_2018.2.185.7z --output mkl.7z
) else (
aws s3 cp s3://ossci-windows/mkl_2018.2.185.7z mkl.7z --quiet
)
7z x -aoa mkl.7z -omkl
)
set CMAKE_INCLUDE_PATH=%cd%\\mkl\\include
set LIB=%cd%\\mkl\\lib;%LIB%
:: Install MAGMA
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/magma_cuda90_release_mkl_2018.2.185.7z --output magma_cuda90_release_mkl_2018.2.185.7z
) else (
aws s3 cp s3://ossci-windows/magma_cuda90_release_mkl_2018.2.185.7z magma_cuda90_release_mkl_2018.2.185.7z --quiet
)
7z x -aoa magma_cuda90_release_mkl_2018.2.185.7z -omagma
)
set MAGMA_HOME=%cd%\\magma
:: Install sccache
mkdir %CD%\\tmp_bin
if "%REBUILD%"=="" (
:check_sccache
%CD%\\tmp_bin\\sccache.exe --show-stats || (
taskkill /im sccache.exe /f /t || ver > nul
del %CD%\\tmp_bin\\sccache.exe
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/sccache.exe --output %CD%\\tmp_bin\\sccache.exe
) else (
aws s3 cp s3://ossci-windows/sccache.exe %CD%\\tmp_bin\\sccache.exe
)
goto :check_sccache
)
)
:: Install Miniconda3
if "%REBUILD%"=="" (
IF EXIST C:\\Jenkins\\Miniconda3 ( rd /s /q C:\\Jenkins\\Miniconda3 )
curl -k https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe -O
.\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=C:\\Jenkins\\Miniconda3
)
call C:\\Jenkins\\Miniconda3\\Scripts\\activate.bat C:\\Jenkins\\Miniconda3
if "%REBUILD%"=="" ( call conda install -y -q numpy cffi pyyaml boto3 )
:: Install ninja
if "%REBUILD%"=="" ( pip install ninja )
call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x86_amd64
git submodule update --init --recursive
set PATH=%CD%\\tmp_bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\libnvvp;%PATH%
set CUDA_PATH=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set CUDA_PATH_V9_0=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set NVTOOLSEXT_PATH=C:\\Program Files\\NVIDIA Corporation\\NvToolsExt
set CUDNN_LIB_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\lib\\x64
set CUDA_TOOLKIT_ROOT_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set CUDNN_ROOT_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
:: Target only our CI GPU machine's CUDA arch to speed up the build
set TORCH_CUDA_ARCH_LIST=5.2
sccache --stop-server
sccache --start-server
sccache --zero-stats
set CC=sccache cl
set CXX=sccache cl
set DISTUTILS_USE_SDK=1
set CMAKE_GENERATOR=Ninja
if not "%USE_CUDA%"=="1" (
if "%REBUILD%"=="" (
set NO_CUDA=1
python setup.py install
)
if errorlevel 1 exit /b 1
if not errorlevel 0 exit /b 1
)
if not "%USE_CUDA%"=="0" (
if "%REBUILD%"=="" (
sccache --show-stats
sccache --zero-stats
rd /s /q C:\\Jenkins\\Miniconda3\\Lib\\site-packages\\torch
copy %CD%\\tmp_bin\\sccache.exe tmp_bin\\nvcc.exe
)
set CUDA_NVCC_EXECUTABLE=%CD%\\tmp_bin\\nvcc
if "%REBUILD%"=="" set NO_CUDA=0
python setup.py install && sccache --show-stats && (
if "%BUILD_ENVIRONMENT%"=="" (
echo "NOTE: To run \`import torch\`, please make sure to activate the conda environment by running \`call C:\\Jenkins\\Miniconda3\\Scripts\\activate.bat C:\\Jenkins\\Miniconda3\` in Command Prompt before running Git Bash."
) else (
7z a %IMAGE_COMMIT_TAG%.7z C:\\Jenkins\\Miniconda3\\Lib\\site-packages\\torch && python ci_scripts\\upload_image.py %IMAGE_COMMIT_TAG%.7z
)
)
)
EOL
ci_scripts/build_pytorch.bat
if [ ! -f "$IMAGE_COMMIT_TAG.7z" ] && [ -n "${BUILD_ENVIRONMENT}" ]; then
exit 1
fi
echo "BUILD PASSED"

.jenkins/pytorch/win-test.sh Executable file
@@ -0,0 +1,93 @@
#!/bin/bash
COMPACT_JOB_NAME=pytorch-win-ws2016-cuda9-cudnn7-py3-test
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
if [[ ${JOB_NAME} == *"develop"* ]]; then
export IMAGE_COMMIT_TAG=develop-${IMAGE_COMMIT_TAG}
fi
mkdir -p ci_scripts/
cat >ci_scripts/download_image.py << EOL
import os
import sys
import boto3
import botocore
IMAGE_COMMIT_TAG = os.getenv('IMAGE_COMMIT_TAG')
session = boto3.session.Session()
s3 = session.resource('s3')
BUCKET_NAME = 'ossci-windows-build'
KEY = 'pytorch/'+IMAGE_COMMIT_TAG+'.7z'
LOCAL_FILE_PATH = sys.argv[1]
try:
s3.Bucket(BUCKET_NAME).download_file(KEY, LOCAL_FILE_PATH)
except botocore.exceptions.ClientError as e:
if e.response['Error']['Code'] == "404":
print("The object does not exist.")
else:
raise
EOL
cat >ci_scripts/setup_pytorch_env.bat <<EOL
set PATH=C:\\Program Files\\CMake\\bin;C:\\Program Files\\7-Zip;C:\\ProgramData\\chocolatey\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Amazon\\AWSCLI;%PATH%
:: Install Miniconda3
IF EXIST C:\\Jenkins\\Miniconda3 ( rd /s /q C:\\Jenkins\\Miniconda3 )
curl https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe -O
.\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=C:\\Jenkins\\Miniconda3
call C:\\Jenkins\\Miniconda3\\Scripts\\activate.bat C:\\Jenkins\\Miniconda3
call conda install -y -q numpy mkl cffi pyyaml boto3
pip install ninja
call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x86_amd64
set PATH=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\libnvvp;%PATH%
set CUDA_PATH=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set CUDA_PATH_V9_0=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set NVTOOLSEXT_PATH=C:\\Program Files\\NVIDIA Corporation\\NvToolsExt
set CUDNN_LIB_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\lib\\x64
set CUDA_TOOLKIT_ROOT_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set CUDNN_ROOT_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set PYTHONPATH=%CD%\\test;%PYTHONPATH%
cd test/
python ..\\ci_scripts\\download_image.py %IMAGE_COMMIT_TAG%.7z
7z x %IMAGE_COMMIT_TAG%.7z
cd ..
EOL
cat >ci_scripts/test_python_nn.bat <<EOL
call ci_scripts/setup_pytorch_env.bat
cd test/ && python run_test.py --include nn --verbose && cd ..
EOL
cat >ci_scripts/test_python_all_except_nn.bat <<EOL
call ci_scripts/setup_pytorch_env.bat
cd test/ && python run_test.py --exclude nn --verbose && cd ..
EOL
run_tests() {
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
ci_scripts/test_python_nn.bat && ci_scripts/test_python_all_except_nn.bat
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
ci_scripts/test_python_nn.bat
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
ci_scripts/test_python_all_except_nn.bat
fi
fi
}
run_tests && echo "TEST PASSED"

.travis.aten.yml Normal file
@@ -0,0 +1,31 @@
# https://travis-ci.org/zdevito/ATen
language: python
python:
- 2.7
- 3.6
dist: trusty
before_install:
- sudo apt-get install -qq valgrind
install:
- travis_retry pip install pyyaml typing
script:
- cd aten
- mkdir build install
- cd build
- cmake .. -DUSE_CUDA=OFF -DCMAKE_INSTALL_PREFIX=../install
- make install
- ../tools/run_tests.sh .
- cd ..
- tools/test_install.sh $(pwd)/install $(pwd)
matrix:
fast_finish: true
include:
env: LINT_CHECK
python: "2.7"
install: pip install flake8
script: flake8

@@ -1,27 +1,8 @@
# https://travis-ci.org/pytorch/pytorch
language: python
python:
- 2.7.8
- 2.7
- 3.5
- nightly
install:
- export CC="gcc-4.8"
- export CXX="g++-4.8"
- travis_retry pip install -r requirements.txt
- travis_retry pip install .
script:
- ./test/run_test.sh
addons:
apt:
sources:
- ubuntu-toolchain-r-test
packages:
- gcc-4.8
- g++-4.8
dist: trusty
git:
submodules: false
# This reportedly works around an issue downloading packages from pypi on
# travis. Consider removing this after the underlying issue is fixed.
@@ -31,8 +12,20 @@ sudo: false
matrix:
fast_finish: true
include:
env: LINT_CHECK
- env: LINT_CHECK
python: "2.7"
addons: true
install: pip install pep8
script: pep8 setup.py
install: pip install flake8
script: flake8
- env: LINT_CHECK
python: "3.7"
dist: xenial # required for Python 3.7 (travis-ci/travis-ci#9069)
sudo: required # required for Python 3.7 (travis-ci/travis-ci#9069)
install: pip install flake8
script: flake8
- env: MYPY_TYPE_CHECK
python: "3.6"
install: pip install mypy mypy-extensions
script: mypy @mypy-files.txt
- env: CPP_DOC_CHECK
install: sudo apt-get install -y doxygen
script: cd docs/cpp && ./check-doxygen.sh

CITATION Normal file
@@ -0,0 +1,6 @@
@inproceedings{paszke2017automatic,
title={Automatic differentiation in PyTorch},
author={Paszke, Adam and Gross, Sam and Chintala, Soumith and Chanan, Gregory and Yang, Edward and DeVito, Zachary and Lin, Zeming and Desmaison, Alban and Antiga, Luca and Lerer, Adam},
booktitle={NIPS-W},
year={2017}
}

CMakeLists.txt Normal file
@@ -0,0 +1,421 @@
cmake_minimum_required(VERSION 3.5 FATAL_ERROR)
#cmake_policy(SET CMP0022 NEW)
#cmake_policy(SET CMP0023 NEW)
# ---[ Project and semantic versioning.
project(Caffe2 CXX C)
set(CAFFE2_VERSION_MAJOR 0)
set(CAFFE2_VERSION_MINOR 8)
set(CAFFE2_VERSION_PATCH 2)
set(CAFFE2_VERSION
"${CAFFE2_VERSION_MAJOR}.${CAFFE2_VERSION_MINOR}.${CAFFE2_VERSION_PATCH}")
# One variable that determines whether the current cmake process is being run
# with the main Caffe2 library. This is useful for building modules - if
# modules are built with the main Caffe2 library then one does not need to do
# find caffe2 in the cmake script. One can usually guard it in some way like
# if (NOT CAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO)
# find_package(Caffe2 REQUIRED)
# endif()
set(CAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO ON)
if(NOT DEFINED BLAS_SET_BY_USER)
  if(DEFINED BLAS)
    set(BLAS_SET_BY_USER TRUE)
  else()
    message(STATUS "Not forcing any particular BLAS to be found")
    set(BLAS_SET_BY_USER FALSE)
  endif()
  set(BLAS_SET_BY_USER ${BLAS_SET_BY_USER} CACHE STRING "Marks whether BLAS was manually set by user or auto-detected")
endif()
# Apple specific
if(APPLE)
# These lines are an attempt to make find_package(cuda) pick up
# libcuda.dylib, and not cuda.framework. It doesn't work all
# the time, but it seems to help for some users.
# TODO: replace this with a more robust fix
set(CMAKE_FIND_FRAMEWORK LAST)
set(CMAKE_FIND_APPBUNDLE LAST)
# Get clang version on macOS
EXECUTE_PROCESS( COMMAND ${CMAKE_CXX_COMPILER} --version OUTPUT_VARIABLE clang_full_version_string )
string(REGEX REPLACE "Apple LLVM version ([0-9]+\\.[0-9]+).*" "\\1" CLANG_VERSION_STRING ${clang_full_version_string})
MESSAGE( STATUS "CLANG_VERSION_STRING: " ${CLANG_VERSION_STRING} )
# RPATH stuff
set(CMAKE_MACOSX_RPATH ON)
endif()
# ---[ Options.
# Note to developers: if you add an option below, make sure you also add it to
# cmake/Summary.cmake so that the summary prints out the option values.
include(CMakeDependentOption)
option(BUILD_TORCH "Build Torch" OFF)
option(ATEN_NO_TEST "Do not build ATen test binaries" OFF)
option(BUILD_ATEN_MOBILE "Build ATen for Android and iOS" OFF)
option(BUILD_BINARY "Build C++ binaries" OFF)
option(BUILD_DOCS "Build Caffe2 documentation" OFF)
option(BUILD_CUSTOM_PROTOBUF "Build and use Caffe2's own protobuf under third_party" ON)
option(BUILD_PYTHON "Build Python binaries" ON)
option(BUILD_CAFFE2_OPS "Build Caffe2 operators" ON)
option(BUILD_SHARED_LIBS "Build libcaffe2.so" ON)
cmake_dependent_option(
CAFFE2_LINK_LOCAL_PROTOBUF "If set, build protobuf inside libcaffe2.so." ON
"BUILD_SHARED_LIBS AND BUILD_CUSTOM_PROTOBUF" OFF)
cmake_dependent_option(
CAFFE2_USE_MSVC_STATIC_RUNTIME "Using MSVC static runtime libraries" ON
"NOT BUILD_SHARED_LIBS" OFF)
option(BUILD_TEST "Build C++ test binaries (need gtest and gbenchmark)" OFF)
cmake_dependent_option(
INSTALL_TEST "Install test binaries if BUILD_TEST is on" OFF
"BUILD_TEST" OFF)
option(USE_ACL "Use ARM Compute Library" OFF)
option(USE_ASAN "Use Address Sanitizer" OFF)
option(USE_CUDA "Use CUDA" ON)
option(USE_ROCM "Use ROCm" OFF)
option(CAFFE2_STATIC_LINK_CUDA "Statically link CUDA libraries" OFF)
cmake_dependent_option(
USE_CUDNN "Use cuDNN" ON
"USE_CUDA" OFF)
option(USE_FFMPEG "Use ffmpeg" OFF)
option(USE_GFLAGS "Use GFLAGS" ON)
option(USE_GLOG "Use GLOG" ON)
option(USE_LEVELDB "Use LEVELDB" ON)
option(USE_LITE_PROTO "Use lite protobuf instead of full." OFF)
option(USE_LMDB "Use LMDB" ON)
option(USE_METAL "Use Metal for iOS build" ON)
option(USE_MOBILE_OPENGL "Use OpenGL for mobile code" ON)
option(USE_NATIVE_ARCH "Use -march=native" OFF)
option(USE_NCCL "Use NCCL" ON)
option(USE_SYSTEM_NCCL "Use system-wide NCCL" OFF)
option(USE_NERVANA_GPU "Use Nervana GPU backend" OFF)
option(USE_NNAPI "Use NNAPI" OFF)
option(USE_NNPACK "Use NNPACK" ON)
option(USE_NUMA "Use NUMA (only available on Linux)" ON)
cmake_dependent_option(
USE_NVRTC "Use NVRTC. Only available if USE_CUDA is on." OFF
"USE_CUDA" OFF)
option(USE_OBSERVERS "Use observers module." OFF)
option(USE_OPENCL "Use OpenCL" OFF)
option(USE_OPENCV "Use OpenCV" ON)
option(USE_OPENMP "Use OpenMP for parallel code" OFF)
option(USE_PROF "Use profiling" OFF)
option(USE_REDIS "Use Redis" OFF)
option(USE_ROCKSDB "Use RocksDB" OFF)
option(USE_SNPE "Use Qualcomm's SNPE library" OFF)
option(USE_SYSTEM_EIGEN_INSTALL
"Use system Eigen instead of the one under third_party" OFF)
option(USE_TENSORRT "Using Nvidia TensorRT library" OFF)
option(USE_ZMQ "Use ZMQ" OFF)
option(USE_ZSTD "Use ZSTD" OFF)
option(USE_MKLDNN "Use MKLDNN" OFF)
option(USE_IDEEP "Use IDEEP interface in MKL BLAS" ON)
option(USE_MKLML "Use MKLML interface in MKL BLAS" ON)
option(USE_DISTRIBUTED "Use distributed" ON)
cmake_dependent_option(
USE_MPI "Use MPI for Caffe2. Only available if USE_DISTRIBUTED is on." ON
"USE_DISTRIBUTED" OFF)
cmake_dependent_option(
USE_GLOO "Use Gloo. Only available if USE_DISTRIBUTED is on." ON
"USE_DISTRIBUTED" OFF)
cmake_dependent_option(
USE_GLOO_IBVERBS "Use Gloo IB verbs for distributed. Only available if USE_GLOO is on." OFF
"USE_GLOO" OFF)
option(TORCH_USE_CEREAL "Build the C++ API with Cereal for serialization support" OFF)
# Used when building Caffe2 through setup.py
option(BUILDING_WITH_TORCH_LIBS "Tell cmake if Caffe2 is being built alongside torch libs" OFF)
SET(ONNX_NAMESPACE "onnx_c2" CACHE STRING "onnx namespace")
if(ANDROID OR IOS)
  set(BUILD_ATEN_MOBILE ON)
endif()
# ---[ CMake scripts + modules
list(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake/Modules)
# ---[ CMake build directories
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin)
enable_testing()
# ---[ Build variables set within the cmake tree
include(cmake/BuildVariables.cmake)
set(CAFFE2_WHITELIST "" CACHE STRING "A whitelist file of files that one should build.")
# Set default build type
if(NOT CMAKE_BUILD_TYPE)
  message(STATUS "Build type not set - defaulting to Release")
  set(CMAKE_BUILD_TYPE "Release" CACHE STRING "Choose the type of build from: Debug Release RelWithDebInfo MinSizeRel Coverage." FORCE)
endif()
# ---[ Misc checks to cope with various compiler modes
include(cmake/MiscCheck.cmake)
# External projects
include(ExternalProject)
# ---[ Utils
# TODO: merge the following 3 files into cmake/public/utils.cmake.
include(cmake/Utils.cmake)
include(cmake/public/utils.cmake)
# ---[ Dependencies
include(cmake/Dependencies.cmake)
# ---[ Whitelist file if whitelist is specified
include(cmake/Whitelist.cmake)
# ---[ Set link flag, handle additional deps for gcc 4.8 and above
if(CMAKE_COMPILER_IS_GNUCXX AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 4.8.0 AND NOT ANDROID)
  message(STATUS "GCC ${CMAKE_CXX_COMPILER_VERSION}: Adding gcc and gcc_s libs to link line")
  list(APPEND Caffe2_DEPENDENCY_LIBS gcc_s gcc)
endif()
# ---[ Build flags
set(CMAKE_C_STANDARD 99)
set(CMAKE_CXX_STANDARD 11)
if(NOT MSVC)
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O2 -fPIC")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-narrowing")
  # Eigen fails to build with some versions, so convert this to a warning
  # Details at http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1459
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wextra")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-missing-field-initializers")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-type-limits")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-array-bounds")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unknown-pragmas")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-sign-compare")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-parameter")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-variable")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-function")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-result")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-strict-overflow")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-strict-aliasing")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=deprecated-declarations")
  if(CMAKE_COMPILER_IS_GNUCXX AND NOT (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 7.0.0))
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-stringop-overflow")
  endif()
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=pedantic")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=redundant-decls")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=old-style-cast")
  # These flags are not available in GCC-4.8.5. Set only when using clang.
  # Compared against https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Option-Summary.html
  if("${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-invalid-partial-specialization")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-typedef-redefinition")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unknown-warning-option")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-private-field")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-inconsistent-missing-override")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-aligned-allocation-unavailable")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-c++14-extensions")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-constexpr-not-const")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-missing-braces")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Qunused-arguments")
  endif()
  if((APPLE AND (NOT ("${CLANG_VERSION_STRING}" VERSION_LESS "9.0")))
      OR (CMAKE_COMPILER_IS_GNUCXX
          AND (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 7.0 AND NOT APPLE)))
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -faligned-new")
  endif()
  if($ENV{WERROR})
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Werror")
  endif()
  if(NOT APPLE)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-but-set-variable")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-maybe-uninitialized")
  endif()
else()
  foreach(flag_var
      CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
      CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
    if(${CAFFE2_USE_MSVC_STATIC_RUNTIME})
      if(${flag_var} MATCHES "/MD")
        string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
      endif()
    else()
      if(${flag_var} MATCHES "/MT")
        string(REGEX REPLACE "/MT" "/MD" ${flag_var} "${${flag_var}}")
      endif()
    endif()
    # /bigobj increases number of sections in .obj file, which is needed to link
    # against libraries in Python 2.7 under Windows
    set(${flag_var} "${${flag_var}} /MP /bigobj")
  endforeach()
endif()
set (CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fno-omit-frame-pointer -O0")
set (CMAKE_LINKER_FLAGS_DEBUG "${CMAKE_STATIC_LINKER_FLAGS_DEBUG} -fno-omit-frame-pointer -O0")
if(USE_ASAN)
  set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fsanitize=address")
  set(CMAKE_LINKER_FLAGS_DEBUG "${CMAKE_STATIC_LINKER_FLAGS_DEBUG} -fsanitize=address")
endif()
if(APPLE)
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-private-field")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-missing-braces")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-c++14-extensions")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-constexpr-not-const")
endif()
if(CMAKE_COMPILER_IS_GNUCXX AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 7.0.0)
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-stringop-overflow")
endif()
if(ANDROID)
  if(CMAKE_COMPILER_IS_GNUCXX)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -s")
  else()
    set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -s")
  endif()
endif()
if(NOT APPLE AND UNIX)
  list(APPEND Caffe2_DEPENDENCY_LIBS dl)
endif()
# Prefix path to Caffe2 headers.
# If a directory containing installed Caffe2 headers was inadvertently
# added to the list of include directories, prefixing
# PROJECT_SOURCE_DIR means this source tree always takes precedence.
include_directories(BEFORE ${PROJECT_SOURCE_DIR})
# Prefix path to generated Caffe2 headers.
# These need to take precedence over their empty counterparts located
# in PROJECT_SOURCE_DIR.
include_directories(BEFORE ${PROJECT_BINARY_DIR})
include_directories(BEFORE ${PROJECT_SOURCE_DIR}/aten/src/)
# ---[ Main build
add_subdirectory(caffe2)
# --[ Documentation
if(BUILD_DOCS)
  # check if Doxygen is installed
  find_package(Doxygen)
  if(DOXYGEN_FOUND)
    message("Generating documentation")
    set(DOXYGEN_C_IN ${CMAKE_CURRENT_SOURCE_DIR}/docs/caffe2/.Doxyfile-c)
    set(DOXYGEN_C_OUT ${CMAKE_CURRENT_SOURCE_DIR}/docs/caffe2/Doxyfile-c)
    set(DOXYGEN_P_IN ${CMAKE_CURRENT_SOURCE_DIR}/docs/caffe2/.Doxyfile-python)
    set(DOXYGEN_P_OUT ${CMAKE_CURRENT_SOURCE_DIR}/docs/caffe2/Doxyfile-python)
    if(EXISTS ${CMAKE_CURRENT_BINARY_DIR}/docs)
      file(REMOVE_RECURSE ${CMAKE_CURRENT_BINARY_DIR}/docs)
    endif()
    file(MAKE_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/docs)
    configure_file(${DOXYGEN_C_IN} ${DOXYGEN_C_OUT} @ONLY)
    configure_file(${DOXYGEN_P_IN} ${DOXYGEN_P_OUT} @ONLY)
    add_custom_target(doc_doxygen_c ALL
        COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_C_OUT}
        WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
        COMMENT "Generating C++ API documentation with Doxygen"
        VERBATIM)
    add_custom_target(doc_doxygen_python ALL
        COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_P_OUT}
        WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
        COMMENT "Generating Python API documentation with Doxygen"
        VERBATIM)
  else()
    message(FATAL_ERROR "Doxygen needs to be installed to generate the documentation")
  endif()
endif()
# ---[ CMake related files
# Uninstall option.
if(NOT TARGET caffe2_uninstall)
  configure_file(
      ${CMAKE_CURRENT_SOURCE_DIR}/cmake/cmake_uninstall.cmake.in
      ${CMAKE_CURRENT_BINARY_DIR}/cmake_uninstall.cmake
      IMMEDIATE @ONLY)
  add_custom_target(caffe2_uninstall
      COMMAND ${CMAKE_COMMAND} -P
      ${CMAKE_CURRENT_BINARY_DIR}/cmake_uninstall.cmake)
endif()
# ---[ Make configuration files for cmake to allow dependent libraries
# easier access to Caffe2.
if((NOT USE_GLOG) OR (NOT USE_GFLAGS) OR BUILD_CUSTOM_PROTOBUF)
  message(WARNING
      "Generated cmake files are only fully tested if one builds "
      "with system glog, gflags, and protobuf. Other settings may "
      "generate files that are not well tested.")
endif()
if(USE_CUDA OR USE_ROCM)
  # TODO: check if we should include other cuda dependency libraries
  # to the interface as well.
endif()
# Note(jiayq): when building static libraries, all PRIVATE dependencies
# will also become interface libraries, and as a result if there are any
# dependency libraries that are not exported, the following install export
# script will fail. As a result, we will only provide the targets cmake
# files for shared lib installation. For more info, read:
# https://cmake.org/pipermail/cmake/2016-May/063400.html
if(BUILD_SHARED_LIBS)
  configure_file(
      ${PROJECT_SOURCE_DIR}/cmake/Caffe2ConfigVersion.cmake.in
      ${PROJECT_BINARY_DIR}/Caffe2ConfigVersion.cmake
      @ONLY)
  configure_file(
      ${PROJECT_SOURCE_DIR}/cmake/Caffe2Config.cmake.in
      ${PROJECT_BINARY_DIR}/Caffe2Config.cmake
      @ONLY)
  install(FILES
      ${PROJECT_BINARY_DIR}/Caffe2ConfigVersion.cmake
      ${PROJECT_BINARY_DIR}/Caffe2Config.cmake
      DESTINATION share/cmake/Caffe2
      COMPONENT dev)
  install(FILES
      ${PROJECT_SOURCE_DIR}/cmake/public/cuda.cmake
      ${PROJECT_SOURCE_DIR}/cmake/public/glog.cmake
      ${PROJECT_SOURCE_DIR}/cmake/public/gflags.cmake
      ${PROJECT_SOURCE_DIR}/cmake/public/mkl.cmake
      ${PROJECT_SOURCE_DIR}/cmake/public/protobuf.cmake
      ${PROJECT_SOURCE_DIR}/cmake/public/threads.cmake
      ${PROJECT_SOURCE_DIR}/cmake/public/utils.cmake
      DESTINATION share/cmake/Caffe2/public
      COMPONENT dev)
  install(DIRECTORY
      ${PROJECT_SOURCE_DIR}/cmake/Modules_CUDA_fix
      DESTINATION share/cmake/Caffe2/
      COMPONENT dev)
  install(EXPORT Caffe2Targets DESTINATION share/cmake/Caffe2
      FILE Caffe2Targets.cmake
      COMPONENT dev)
else()
  message(WARNING
      "Generated cmake files are only available when building "
      "shared libs.")
endif()
# ---[ Modules
add_subdirectory(modules)
# ---[ Binaries
# Binaries will be built after the Caffe2 main libraries and the modules
# are built. For the binaries, they will be linked to the Caffe2 main
# libraries, as well as all the modules that are built with Caffe2 (the ones
# built in the previous Modules section above).
if(BUILD_BINARY)
  add_subdirectory(binaries)
endif()
include(cmake/Summary.cmake)
caffe2_print_configuration_summary()

CODEOWNERS Normal file

@ -0,0 +1,25 @@
# This is a comment.
# Each line is a file pattern followed by one or more owners.
/aten/ @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/aten/src/ATen/core/
/torch/ @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/docs/source @apaszke @soumith @colesbury @gchanan @zdevito @ezyang @ssnl @zou3519
/docs/cpp @goldsborough @ebetica @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/test @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/tools @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/README.md @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/setup.py @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/requirements.txt @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/torch/csrc/api/ @apaszke @soumith @colesbury @gchanan @zdevito @ezyang @ebetica @goldsborough
/test/cpp/api/ @apaszke @soumith @colesbury @gchanan @zdevito @ezyang @ebetica @goldsborough
/torch/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/torch/csrc/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/torch/csrc/jit/passes/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/test/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/scripts/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/torch/lib/c10d/ @apaszke @pietern @teng-li
/torch/csrc/distributed/ @apaszke @pietern @teng-li
/torch/distributed/ @apaszke @pietern @teng-li
/test/test_c10d.py @apaszke @pietern @teng-li
/torch/utils/cpp_extension.py @goldsborough @fmassa @apaszke @soumith @ezyang

CONTRIBUTING.md Normal file

@ -0,0 +1,379 @@
## Contributing to PyTorch
If you are interested in contributing to PyTorch, your contributions will fall
into two categories:
1. You want to propose a new Feature and implement it
- post about your intended feature, and we shall discuss the design and
implementation. Once we agree that the plan looks good, go ahead and implement it.
2. You want to implement a feature or bug-fix for an outstanding issue
- Look at the outstanding issues here: https://github.com/pytorch/pytorch/issues
- Especially look at the Low Priority and Medium Priority issues
- Pick an issue and comment on it to say that you want to work on the feature
- If you need more context on a particular issue, please ask and we shall provide.
Once you finish implementing a feature or bugfix, please send a Pull Request to
https://github.com/pytorch/pytorch
If you are not familiar with creating a Pull Request, here are some guides:
- http://stackoverflow.com/questions/14680711/how-to-do-a-github-pull-request
- https://help.github.com/articles/creating-a-pull-request/
## Developing PyTorch
To develop PyTorch on your machine, here are some tips:
1. Uninstall all existing PyTorch installs:
```
conda uninstall pytorch
pip uninstall torch
pip uninstall torch # run this command twice
```
2. Clone a copy of PyTorch from source:
```
git clone https://github.com/pytorch/pytorch
cd pytorch
```
3. Install PyTorch in `build develop` mode:
A full set of instructions on installing PyTorch from Source are here:
https://github.com/pytorch/pytorch#from-source
The change you have to make is to replace
```
python setup.py install
```
with
```
python setup.py build develop
```
This is especially useful if you are only changing Python files.
This mode will symlink the python files from the current local source tree into the
python install.
Hence, if you modify a Python file, you do not need to reinstall PyTorch after each change.
For example:
- Install local pytorch in `build develop` mode
- modify your python file `torch/__init__.py` (for example)
- test functionality
- modify your python file `torch/__init__.py`
- test functionality
- modify your python file `torch/__init__.py`
- test functionality
You do not need to repeatedly install after modifying python files.
In case you want to reinstall, make sure that you uninstall pytorch first by running `pip uninstall torch`
and `python setup.py clean`. Then you can install in `build develop` mode again.
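For example, a typical edit-test loop with a develop-mode install looks like this (the file and the editor here are just placeholders):
```
python setup.py build develop     # one-time develop-mode install
$EDITOR torch/__init__.py         # edit any Python file
python -c "import torch"          # the change is picked up immediately, no reinstall
```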
## Unit testing
PyTorch's testing is located under `test/`. Run the entire test suite with
```
python test/run_test.py
```
or run individual test files, like `python test/test_nn.py`, for individual test suites.
### Better local unit tests with pytest
We don't officially support `pytest`, but it works well with our `unittest` tests and offers
a number of useful features for local development. Install it via `pip install pytest`.
If you want to just run tests that contain a specific substring, you can use the `-k` flag:
```
pytest test/test_nn.py -k Loss -v
```
The above is an example of testing a change to Loss functions: this command runs tests such as
`TestNN.test_BCELoss` and `TestNN.test_MSELoss` and can be useful to save keystrokes.
## Writing documentation
PyTorch uses [Google style](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html)
for formatting docstrings. Lines inside docstring blocks must be limited to 80 characters so they
fit into Jupyter documentation popups.
For C++ documentation (https://pytorch.org/cppdocs), we use
[Doxygen](http://www.doxygen.nl/) and then convert it to
[Sphinx](http://www.sphinx-doc.org/) via
[Breathe](https://github.com/michaeljones/breathe) and
[Exhale](https://github.com/svenevs/exhale). Check the [Doxygen
reference](http://www.stack.nl/~dimitri/doxygen/manual/index.html) for more
information on the documentation syntax. To build the documentation locally,
`cd` into `docs/cpp` and then `make html`.
We run Doxygen in CI (Travis) to verify that you do not use invalid Doxygen
commands. To run this check locally, run `./check-doxygen.sh` from inside
`docs/cpp`.
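Putting the commands above together, building and checking the C++ docs locally looks like:
```
cd docs/cpp
make html            # Doxygen -> Breathe/Exhale -> Sphinx, per the note above
./check-doxygen.sh   # the same Doxygen check that CI (Travis) runs
```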
## Managing multiple build trees
One downside to using `python setup.py develop` is that your development
version of pytorch will be installed globally on your account (e.g., if
you run `import torch` anywhere else, the development version will be
used).
If you want to manage multiple builds of PyTorch, you can make use of
[conda environments](https://conda.io/docs/using/envs.html) to maintain
separate Python package environments, each of which can be tied to a
specific build of PyTorch. To set one up:
```
conda create -n pytorch-myfeature
source activate pytorch-myfeature
# if you run python now, torch will NOT be installed
python setup.py build develop
```
## C++ Development tips
If you are working on the C++ code, there are a few important things that you
will want to keep in mind:
1. How to rebuild only the code you are working on, and
2. How to make rebuilds in the absence of changes go faster.
### Build only what you need.
`python setup.py build` will build everything, but since our build system is
not very optimized for incremental rebuilds, this will actually be very slow.
Far better is to only request rebuilds of the parts of the project you are
working on:
- Working on the Python bindings? Run `python setup.py develop` to rebuild
(NB: no `build` here!)
- Working on `torch/csrc` or `aten`? Run `python setup.py rebuild_libtorch` to
rebuild those parts without also rebuilding the other libraries we
depend on.
- Working on one of the other dependent libraries? The other valid
targets are listed in `dep_libs` in `setup.py`. Prepend `build_` to
get a target name, and run it as e.g. `python setup.py build_gloo`.
- Working on a test binary? Run `(cd build && ninja bin/test_binary_name)` to
rebuild only that test binary (without rerunning cmake). (Replace `ninja` with
`make` if you don't have ninja installed).
On the initial build, you can also speed things up with the environment
variables `DEBUG` and `NO_CUDA`.
- `DEBUG=1` will enable debug builds (-g -O0)
- `NO_CUDA=1` will disable compiling CUDA (in case you are developing on something not CUDA related), to save compile time.
For example:
```
NO_CUDA=1 DEBUG=1 python setup.py build develop
```
Make sure you continue to pass these flags on subsequent builds.
### Code completion and IDE support
When using `python setup.py develop`, PyTorch will generate
a `compile_commands.json` file that can be used by many editors
to provide command completion and error highlighting for PyTorch's
C++ code. You need to `pip install ninja` to generate accurate
information for the code in `torch/csrc`. More information at:
- https://sarcasm.github.io/notes/dev/compilation-database.html
### Make no-op build fast.
#### Use Ninja
Python `setuptools` is pretty dumb, and always rebuilds every C file in a
project. If you install the ninja build system with `pip install ninja`,
then PyTorch will use it to track dependencies correctly.
If pytorch was already built, you will need to run `python setup.py clean` once
after installing ninja for builds to succeed.
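In other words, the sequence is:
```
pip install ninja        # setuptools will pick ninja up for dependency tracking
python setup.py clean    # needed once if PyTorch was already built
python setup.py develop  # rebuilds from here on use ninja
```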
#### Use CCache
Even when dependencies are tracked with file modification times,
there are many situations where a file gets rebuilt even though the
previous compilation of it was exactly the same.
Using ccache in a situation like this is a real time-saver. However, by
default, ccache does not properly support CUDA stuff, so here are the
instructions for installing a custom `ccache` fork that has CUDA support:
```
# install and export ccache
if ! ls ~/ccache/bin/ccache
then
sudo apt-get update
sudo apt-get install -y automake autoconf
sudo apt-get install -y asciidoc
mkdir -p ~/ccache
pushd /tmp
rm -rf ccache
git clone https://github.com/colesbury/ccache -b ccbin
pushd ccache
./autogen.sh
./configure
make install prefix=~/ccache
popd
popd
mkdir -p ~/ccache/lib
mkdir -p ~/ccache/cuda
ln -s ~/ccache/bin/ccache ~/ccache/lib/cc
ln -s ~/ccache/bin/ccache ~/ccache/lib/c++
ln -s ~/ccache/bin/ccache ~/ccache/lib/gcc
ln -s ~/ccache/bin/ccache ~/ccache/lib/g++
ln -s ~/ccache/bin/ccache ~/ccache/cuda/nvcc
~/ccache/bin/ccache -M 25Gi
fi
export PATH=~/ccache/lib:$PATH
export CUDA_NVCC_EXECUTABLE=~/ccache/cuda/nvcc
```
## CUDA Development tips
If you are working on the CUDA code, here are some useful CUDA debugging tips:
1. `CUDA_DEVICE_DEBUG=1` will enable CUDA device function debug symbols (`-g -G`).
This will be particularly helpful in debugging device code. However, it will
slow down the build process by about 50% (compared to only `DEBUG=1`), so use it wisely.
2. `cuda-gdb` and `cuda-memcheck` are your best CUDA debugging friends. Unlike `gdb`,
`cuda-gdb` can display actual values in a CUDA tensor (rather than all zeros); see the example below.
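For example (the test file here is just an illustration):
```
# debug build with device-side debug symbols
CUDA_DEVICE_DEBUG=1 DEBUG=1 python setup.py build develop
# step through device code / check for CUDA memory errors
cuda-gdb --args python test/test_cuda.py
cuda-memcheck python test/test_cuda.py
```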
Hope this helps, and thanks for considering contributing.
## Windows development tips
Occasionally, you will write a patch which works on Linux, but fails CI on Windows.
There are a few respects in which MSVC (the Windows compiler toolchain we use) is stricter
than the toolchains on Linux, and they are worth keeping in mind when fixing these problems.
1. Symbols are NOT exported by default on Windows; instead, you have to explicitly
mark a symbol as exported/imported in a header file with `__declspec(dllexport)` /
`__declspec(dllimport)`. We have codified this pattern into a set of macros
which follow the convention `*_API`, e.g., `AT_API` inside ATen. (Every separate
shared library needs a unique macro name, because symbol visibility is on a per
shared library basis.)
The upshot is if you see an "unresolved external" error in your Windows build, this
is probably because you forgot to mark a function with `*_API`. However, there is
one important counterexample to this principle: if you want a *templated* function
to be instantiated at the call site, do NOT mark it with `*_API` (if you do mark it,
you'll have to explicitly instantiate all of the specializations used by the call
sites.)
2. If you link against a library, this does not make its dependencies transitively
visible. You must explicitly specify a link dependency against every library whose
symbols you use. (This is different from Linux where in most environments,
transitive dependencies can be used to fulfill unresolved symbols.)
3. If you have a Windows box (we have a few on EC2 which you can request access to) and
you want to run the build, the easiest way is to just run `.jenkins/pytorch/win-build.sh`.
If you need to rebuild, run `REBUILD=1 .jenkins/pytorch/win-build.sh` (this will avoid
blowing away your Conda environment.)
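That is, the two invocations are:
```
.jenkins/pytorch/win-build.sh             # full build
REBUILD=1 .jenkins/pytorch/win-build.sh   # incremental rebuild, keeps the Conda env
```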
Even if you don't know anything about MSVC, you can use cmake to build simple programs on
Windows; this can be helpful if you want to learn more about some peculiar linking behavior
by reproducing it on a small example. Here's a simple example cmake file that defines
two dynamic libraries, one linking with the other:
```
project(myproject CXX)
set(CMAKE_CXX_STANDARD 11)
add_library(foo SHARED foo.cpp)
add_library(bar SHARED bar.cpp)
# NB: don't forget to __declspec(dllexport) at least one symbol from foo,
# otherwise foo.lib will not be created.
target_link_libraries(bar PUBLIC foo)
```
You can build it with:
```
mkdir build
cd build
cmake ..
cmake --build .
```
### Known MSVC (and MSVC with NVCC) bugs
The PyTorch codebase sometimes likes to use exciting C++ features, and
these exciting features lead to exciting bugs in Windows compilers.
To add insult to injury, the error messages will often not tell you
which line of code actually induced the erroring template instantiation.
I've found the most effective way to debug these problems is to
carefully read over diffs, keeping in mind known bugs in MSVC/NVCC.
Here are a few well-known pitfalls and workarounds:
* This is not actually a bug per se, but in general, code generated by MSVC
is more sensitive to memory errors; you may have written some code
that does a use-after-free or overflows the stack; on Linux the code
might work, but on Windows your program will crash. ASAN may not
catch all of these problems: stay vigilant to the possibility that
your crash is due to a real memory problem.
* (NVCC) `at::optional` does not work when used from device code. Don't use
it from kernels. Upstream issue: https://github.com/akrzemi1/Optional/issues/58
and our local issue #10329.
* `constexpr` generally works less well on MSVC.
* The idiom `static_assert(f() == f())` to test if `f` is constexpr
does not work; you'll get "error C2131: expression did not evaluate
to a constant". Don't use these asserts on Windows.
(Example: `aten/src/ATen/core/intrusive_ptr.h`)
* (NVCC) Code you access inside a `static_assert` will eagerly be
evaluated as if it were device code, and so you might get an error
that the code is "not accessible".
```
class A {
  static A singleton_;
  static constexpr inline A* singleton() {
    return &singleton_;
  }
};
static_assert(std::is_same<A*, decltype(A::singleton())>::value, "hmm");
```
* The compiler will run out of heap if you attempt to compile files that
are too large. Splitting such files into separate files helps.
(Example: `THTensorMath`, `THTensorMoreMath`, `THTensorEvenMoreMath`.)
## Caffe2 notes
In 2018, we merged Caffe2 into the PyTorch source repository. While the
steady state aspiration is that Caffe2 and PyTorch share code freely,
in the meantime there will be some separation.
If you submit a PR to only PyTorch or only Caffe2 code, CI will only
run for the project you edited. The logic for this is implemented
in `.jenkins/pytorch/dirty.sh` and `.jenkins/caffe2/dirty.sh`; you
can look at this to see what path prefixes constitute changes.
This also means if you ADD a new top-level path, or you start
sharing code between projects, you need to modify these files.
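Conceptually, the dirty check is just a path-prefix match against the changed
files. A rough sketch (the real prefixes and mechanics live in the `dirty.sh`
scripts, and `MERGE_BASE` is a placeholder):
```
# does this change touch any Caffe2-specific path?
if git diff --name-only "$MERGE_BASE" HEAD | grep -qE '^(caffe2|cmake|binaries|modules)/'; then
  echo "run Caffe2 CI"
fi
```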
There are a few "unusual" directories which, for historical reasons,
are Caffe2/PyTorch specific. Here they are:
- `CMakeLists.txt`, `Makefile`, `binaries`, `cmake`, `conda`, `modules`,
`scripts` are Caffe2-specific. Don't put PyTorch code in them without
extra coordination.
- `mypy*`, `requirements.txt`, `setup.py`, `test`, `tools` are
PyTorch-specific. Don't put Caffe2 code in them without extra
coordination.

Dockerfile

@ -1,33 +0,0 @@
FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu14.04
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cmake \
git \
curl \
ca-certificates \
libjpeg-dev \
libpng-dev &&\
rm -rf /var/lib/apt/lists/*
RUN curl -o ~/miniconda.sh -O https://repo.continuum.io/miniconda/Miniconda3-4.2.12-Linux-x86_64.sh && \
chmod +x ~/miniconda.sh && \
~/miniconda.sh -b -p /opt/conda && \
rm ~/miniconda.sh && \
/opt/conda/bin/conda install conda-build && \
/opt/conda/bin/conda create -y --name pytorch-py35 python=3.5.2 numpy scipy ipython mkl&& \
/opt/conda/bin/conda clean -ya
ENV PATH /opt/conda/envs/pytorch-py35/bin:$PATH
RUN conda install --name pytorch-py35 -c soumith magma-cuda80
# This must be done before pip so that requirements.txt is available
WORKDIR /opt/pytorch
COPY . .
RUN cat requirements.txt | xargs -n1 pip install --no-cache-dir && \
TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1+PTX" TORCH_NVCC_FLAGS="-Xfatbin -compress-all" \
CMAKE_LIBRARY_PATH=/opt/conda/envs/pytorch-py35/lib \
CMAKE_INCLUDE_PATH=/opt/conda/envs/pytorch-py35/include \
pip install -v .
WORKDIR /workspace
RUN chmod -R a+w /workspace

LICENSE

@ -1,3 +1,5 @@
From PyTorch:
Copyright (c) 2016- Facebook, Inc (Adam Paszke)
Copyright (c) 2014- Facebook, Inc (Soumith Chintala)
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
@ -8,6 +10,36 @@ Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou,
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
From Caffe2:
Copyright (c) 2016-present, Facebook Inc. All rights reserved.
All contributions by Facebook:
Copyright (c) 2016 Facebook Inc.
All contributions by Google:
Copyright (c) 2015 Google Inc.
All rights reserved.
All contributions by Yangqing Jia:
Copyright (c) 2015 Yangqing Jia
All rights reserved.
All contributions from Caffe:
Copyright(c) 2013, 2014, 2015, the respective contributors
All rights reserved.
All other contributions:
Copyright(c) 2015, 2016 the respective contributors
All rights reserved.
Caffe2 uses a copyright model similar to Caffe: each contributor holds
copyright over their contributions to Caffe2. The project versioning records
all such contribution and copyright details. If a contributor wants to further
mark their specific copyright on a particular contribution, they should
indicate their copyright solely in the commit message of the change when it is
committed.
All rights reserved.
Redistribution and use in source and binary forms, with or without

Makefile Normal file

@ -0,0 +1,21 @@
# This makefile does nothing but delegate the actual building to cmake.
all:
	@mkdir -p build && cd build && cmake .. $(shell python ./scripts/get_python_cmake_flags.py) && $(MAKE)

local:
	@./scripts/build_local.sh

android:
	@./scripts/build_android.sh

ios:
	@./scripts/build_ios.sh

clean: # This will remove ALL build folders.
	@rm -r build*/

linecount:
	@cloc --read-lang-def=caffe.cloc caffe2 || \
		echo "Cloc is not available on the machine. You can install cloc with " && \
		echo "    sudo apt-get install cloc"

NOTICE Normal file

@ -0,0 +1,309 @@
=======================================================================
Software under third_party
=======================================================================
Software libraries under third_party are provided as github submodule
links, and their content is not part of the Caffe2 codebase. Their
licenses can be found under the respective software repositories.
=======================================================================
Earlier BSD License
=======================================================================
Early development of Caffe2 in 2015 and early 2016 is licensed under the
BSD license. The license is attached below:
All contributions by Facebook:
Copyright (c) 2016 Facebook Inc.
All contributions by Google:
Copyright (c) 2015 Google Inc.
All rights reserved.
All contributions by Yangqing Jia:
Copyright (c) 2015 Yangqing Jia
All rights reserved.
All other contributions:
Copyright(c) 2015, 2016 the respective contributors
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
=======================================================================
Caffe's BSD License
=======================================================================
Some parts of the caffe2 code are derived from the original Caffe code, which was
created by Yangqing Jia and is now a BSD-licensed open-source project. The Caffe
license is as follows:
COPYRIGHT
All contributions by the University of California:
Copyright (c) 2014, The Regents of the University of California (Regents)
All rights reserved.
All other contributions:
Copyright (c) 2014, the respective contributors
All rights reserved.
Caffe uses a shared copyright model: each contributor holds copyright over
their contributions to Caffe. The project versioning records all such
contribution and copyright details. If a contributor wants to further mark
their specific copyright on a particular contribution, they should indicate
their copyright solely in the commit message of the change when it is
committed.
LICENSE
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
CONTRIBUTION AGREEMENT
By contributing to the BVLC/caffe repository through pull-request, comment,
or otherwise, the contributor releases their content to the
license and copyright terms herein.
=======================================================================
Caffe2's Apache License
=======================================================================
This repo contains Caffe2 code, which was previously licensed under
Apache License Version 2.0:
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

README.md

@ -1,229 +1,275 @@
<p align="center"><img width="40%" src="docs/source/_static/img/pytorch-logo-dark.png" /></p>
![PyTorch Logo](https://github.com/pytorch/pytorch/blob/master/docs/source/_static/img/pytorch-logo-dark.png)
--------------------------------------------------------------------------------
PyTorch is a python package that provides two high-level features:
- Tensor computation (like numpy) with strong GPU acceleration
- Deep Neural Networks built on a tape-based autograd system
PyTorch is a Python package that provides two high-level features:
- Tensor computation (like NumPy) with strong GPU acceleration
- Deep neural networks built on a tape-based autograd system
You can reuse your favorite python packages such as numpy, scipy and Cython to extend PyTorch when needed.
You can reuse your favorite Python packages such as NumPy, SciPy and Cython to extend PyTorch when needed.
We are in an early-release Beta. Expect some adventures and rough edges.
We are in an early-release beta. Expect some adventures and rough edges.
- [More About PyTorch](#more-about-pytorch)
- [More about PyTorch](#more-about-pytorch)
- [Installation](#installation)
- [Binaries](#binaries)
- [From source](#from-source)
- [Docker image](#docker-image)
- [From Source](#from-source)
- [Docker Image](#docker-image)
- [Building the Documentation](#building-the-documentation)
- [Previous Versions](#previous-versions)
- [Getting Started](#getting-started)
- [Communication](#communication)
- [Releases and Contributing](#releases-and-contributing)
- [The Team](#the-team)
| System | Python | Status |
| System | 2.7 | 3.5 |
| --- | --- | --- |
| Linux CPU | 2.7.8, 2.7, 3.5, nightly | [![Build Status](https://travis-ci.org/pytorch/pytorch.svg?branch=master)](https://travis-ci.org/pytorch/pytorch) |
| Linux GPU | 2.7 | [![Build Status](http://build.pytorch.org:8080/buildStatus/icon?job=pytorch-master-py2)](https://build.pytorch.org/job/pytorch-master-py2) |
| Linux GPU | 3.5 | [![Build Status](http://build.pytorch.org:8080/buildStatus/icon?job=pytorch-master-py3)](https://build.pytorch.org/job/pytorch-master-py3) |
| Linux CPU | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) |
| Linux GPU | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) |
| Windows GPU | <center></center> | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/)
See also the [ci.pytorch.org HUD](https://ezyang.github.io/pytorch-ci-hud/build/pytorch-master).
## More about PyTorch
At a granular level, PyTorch is a library that consists of the following components:
| \_ | \_ |
| ------------------------ | --- |
| torch | a Tensor library like NumPy, with strong GPU support |
| torch.autograd | a tape based automatic differentiation library that supports all differentiable Tensor operations in torch |
| torch.nn | a neural networks library deeply integrated with autograd designed for maximum flexibility |
| torch.optim | an optimization package to be used with torch.nn with standard optimization methods such as SGD, RMSProp, LBFGS, Adam etc. |
| torch.multiprocessing | python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and hogwild training. |
| torch.utils | DataLoader, Trainer and other utility functions for convenience |
| torch.legacy(.nn/.optim) | legacy code that has been ported over from torch for backward compatibility reasons |
| Component | Description |
| ---- | --- |
| **torch** | a Tensor library like NumPy, with strong GPU support |
| **torch.autograd** | a tape-based automatic differentiation library that supports all differentiable Tensor operations in torch |
| **torch.nn** | a neural networks library deeply integrated with autograd designed for maximum flexibility |
| **torch.multiprocessing** | Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training |
| **torch.utils** | DataLoader, Trainer and other utility functions for convenience |
| **torch.legacy(.nn/.optim)** | legacy code that has been ported over from torch for backward compatibility reasons |
Usually one uses PyTorch either as:
- A replacement for numpy to use the power of GPUs.
- a replacement for NumPy to use the power of GPUs.
- a deep learning research platform that provides maximum flexibility and speed
Elaborating further:
### A GPU-ready Tensor library
### A GPU-Ready Tensor Library
If you use numpy, then you have used Tensors (a.k.a ndarray).
If you use NumPy, then you have used Tensors (a.k.a ndarray).
<p align=center><img width="30%" src="docs/source/_static/img/tensor_illustration.png" /></p>
![Tensor illustration](https://github.com/pytorch/pytorch/blob/master/docs/source/_static/img/tensor_illustration.png)
PyTorch provides Tensors that can live either on the CPU or the GPU, and accelerate
compute by a huge amount.
PyTorch provides Tensors that can live either on the CPU or the GPU, and accelerates the
computation by a huge amount.
We provide a wide variety of tensor routines to accelerate and fit your scientific computation needs
such as slicing, indexing, math operations, linear algebra, reductions.
And they are fast!
### Dynamic Neural Networks: Tape based Autograd
### Dynamic Neural Networks: Tape-Based Autograd
PyTorch has a unique way of building neural networks: using and replaying a tape recorder.
Most frameworks such as `TensorFlow`, `Theano`, `Caffe` and `CNTK` have a static view of the world.
Most frameworks such as TensorFlow, Theano, Caffe and CNTK have a static view of the world.
One has to build a neural network, and reuse the same structure again and again.
Changing the way the network behaves means that one has to start from scratch.
With PyTorch, we use a technique called Reverse-mode auto-differentiation, which allows you to
With PyTorch, we use a technique called reverse-mode auto-differentiation, which allows you to
change the way your network behaves arbitrarily with zero lag or overhead. Our inspiration comes
from several research papers on this topic, as well as current and past work such as
[autograd](https://github.com/twitter/torch-autograd),
[torch-autograd](https://github.com/twitter/torch-autograd),
[autograd](https://github.com/HIPS/autograd),
[Chainer](http://chainer.org), etc.
While this technique is not unique to PyTorch, it's one of the fastest implementations of it to date.
You get the best of speed and flexibility for your crazy research.
<p align=center><img width="80%" src="docs/source/_static/img/dynamic_graph.gif" /></p>
![Dynamic graph](https://github.com/pytorch/pytorch/blob/master/docs/source/_static/img/dynamic_graph.gif)
### Python first
### Python First
PyTorch is not a Python binding into a monolothic C++ framework.
PyTorch is not a Python binding into a monolithic C++ framework.
It is built to be deeply integrated into Python.
You can use it naturally like you would use NumPy / SciPy / scikit-learn etc.
You can write your new neural network layers in Python itself, using your favorite libraries,
and use packages such as Cython and Numba.
Our goal is to not reinvent the wheel where appropriate.
### Imperative Experiences
PyTorch is designed to be intuitive, linear in thought and easy to use.
When you execute a line of code, it gets executed. There isn't an asynchronous view of the world.
When you drop into a debugger, or receive error messages and stack traces, understanding them is straightforward.
The stack trace points to exactly where your code was defined.
We hope you never spend hours debugging your code because of bad stack traces or asynchronous and opaque execution engines.
### Fast and Lean
PyTorch has minimal framework overhead. We integrate acceleration libraries
such as Intel MKL and NVIDIA (cuDNN, NCCL) to maximize speed.
At the core, its CPU and GPU Tensor and neural network backends
(TH, THC, THNN, THCUNN) are mature and have been tested for years.
Hence, PyTorch is quite fast whether you run small or large neural networks.
The memory usage in PyTorch is extremely efficient compared to Torch or some of the alternatives.
We've written custom memory allocators for the GPU to make sure that
your deep learning models are maximally memory efficient.
This enables you to train bigger deep learning models than before.
### Extensions without Pain
Writing new neural network modules, or interfacing with PyTorch's Tensor API was designed to be straightforward
and with minimal abstractions.
You can write new neural network layers in Python using the torch API
[or your favorite NumPy-based libraries such as SciPy](http://pytorch.org/tutorials/advanced/numpy_extensions_tutorial.html).
If you want to write your layers in C/C++, we provide a convenient extension API that is efficient and with minimal boilerplate.
There is no wrapper code that needs to be written. You can see [a tutorial here](http://pytorch.org/tutorials/advanced/cpp_extension.html) and [an example here](https://github.com/pytorch/extension-cpp).
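To give a flavor of the shape such an extension takes, here is a minimal sketch (not the tutorial's exact code; it assumes the `torch/extension.h` header from the C++ extension API, and `double_it` is a hypothetical op name):
```c++
#include <torch/extension.h>

// Hypothetical op: returns a new tensor with every element doubled.
at::Tensor double_it(at::Tensor input) {
  return input * 2;
}

// Expose the function to Python; TORCH_EXTENSION_NAME is supplied by
// the extension build machinery.
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("double_it", &double_it, "Double every element of a tensor");
}
```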
## Installation
### Binaries
Commands to install from binaries via Conda or pip wheels are on our website:
[http://pytorch.org](http://pytorch.org)
### From Source
If you are installing from source, we highly recommend installing an [Anaconda](https://www.continuum.io/downloads) environment.
You will get a high-quality BLAS library (MKL) and you get a controlled compiler version regardless of your Linux distro.
Once you have [Anaconda](https://www.continuum.io/downloads) installed, here are the instructions.
If you want to compile with CUDA support, install
- [NVIDIA CUDA](https://developer.nvidia.com/cuda-downloads) 7.5 or above
- [NVIDIA cuDNN](https://developer.nvidia.com/cudnn) v6.x or above
If you want to disable CUDA support, export environment variable `NO_CUDA=1`.
Other potentially useful environment variables may be found in `setup.py`.
If you want to build on Windows, Visual Studio 2017 14.11 toolset and NVTX are also needed.
In particular, building with CUDA 8 on Windows additionally requires VS 2015 Update 3 and a patch for it.
The details of the patch can be found [here](https://support.microsoft.com/en-gb/help/4020481/fix-link-exe-crashes-with-a-fatal-lnk1000-error-when-you-use-wholearch).
#### Install optional dependencies
On Linux
```bash
export CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" # [anaconda root directory]
# Install basic dependencies
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
conda install -c mingfeima mkldnn
# Add LAPACK support for the GPU
conda install -c pytorch magma-cuda80 # or magma-cuda90 if CUDA 9
```
On macOS
```bash
export CMAKE_PREFIX_PATH=[anaconda root directory]
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
```
On Windows
```cmd
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
```
#### Get the PyTorch source
```bash
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
```
#### Install PyTorch
On Linux
```bash
pip install -r requirements.txt
python setup.py install
```
On macOS
```bash
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
```
On Windows
```cmd
set "VS150COMNTOOLS=C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC\Auxiliary\Build"
set CMAKE_GENERATOR=Visual Studio 15 2017 Win64
set DISTUTILS_USE_SDK=1
REM The following two lines are needed for Python 2.7, but support for it is very experimental.
set MSSdk=1
set FORCE_PY27_BUILD=1
REM As for CUDA 8, VS2015 Update 3 is also required to build PyTorch. Use the following line.
set "CUDA_HOST_COMPILER=%VS140COMNTOOLS%\..\..\VC\bin\amd64\cl.exe"
call "%VS150COMNTOOLS%\vcvarsall.bat" x64 -vcvars_ver=14.11
python setup.py install
```
### Docker image
Dockerfile is supplied to build images with cuda support and cudnn v7. You can pass the `-e PYTHON_VERSION=x.y` flag to specify which Python version is to be used by Miniconda, or leave it unset to use the default. Build as usual
```
docker build -t pytorch -f docker/pytorch/Dockerfile .
```
You can also pull a pre-built docker image from Docker Hub and run with nvidia-docker,
but this is not currently maintained and will pull PyTorch 0.2.
```
nvidia-docker run --rm -ti --ipc=host pytorch/pytorch:latest
```
Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g.
for multithreaded data loaders) the default shared memory segment size that the container runs with is not enough, and you
should increase shared memory size either with `--ipc=host` or `--shm-size` command line options to `nvidia-docker run`.
### Building the Documentation
To build documentation in various formats, you will need Sphinx and the
readthedocs theme.
```
cd docs/
pip install -r requirements.txt
```
You can then build the documentation by running ``make <format>`` from the
``docs/`` folder. Run ``make`` to get a list of all available output formats.
### Previous Versions
Installation instructions and binaries for previous PyTorch versions may be found
on [our website](http://pytorch.org/previous-versions/).
## Getting Started
Three pointers to get you started:
- [Tutorials: get you started with understanding and using PyTorch](https://pytorch.org/tutorials/)
- [Examples: easy-to-understand PyTorch code across all domains](https://github.com/pytorch/examples)
- [The API Reference](http://pytorch.org/docs/)
## Communication
* forums: discuss implementations, research, etc. http://discuss.pytorch.org
* GitHub issues: bug reports, feature requests, install issues, RFCs, thoughts, etc.
* Slack: general chat, online discussions, collaboration etc. https://pytorch.slack.com/ . Our slack channel is invite-only to promote a healthy balance between power-users and beginners. If you need a slack invite, ping us at slack@pytorch.org
* newsletter: no-noise, one-way email newsletter with important announcements about PyTorch. You can sign up here: http://eepurl.com/cbG0rv
## Releases and Contributing
PyTorch has a 90 day release cycle (major releases).
Its current state is Beta, we expect no obvious bugs. Please let us know if you encounter a bug by [filing an issue](https://github.com/pytorch/pytorch/issues).
We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.
If you plan to contribute new features, utility functions or extensions to the core, please first open an issue and discuss the feature with us.
Sending a PR without discussion might end up resulting in a rejected PR, because we might be taking the core in a different direction than you might be aware of.
## The Team
PyTorch is a community-driven project with several skillful engineers and researchers contributing to it.
PyTorch is currently maintained by [Adam Paszke](https://apaszke.github.io/), [Sam Gross](https://github.com/colesbury), [Soumith Chintala](http://soumith.ch) and [Gregory Chanan](https://github.com/gchanan) with major contributions coming from 10s of talented individuals in various forms and means.
A non-exhaustive but growing list needs to mention: Trevor Killeen, Sasank Chilamkurthy, Sergey Zagoruyko, Adam Lerer, Francisco Massa, Alykhan Tejani, Luca Antiga, Alban Desmaison, Andreas Kopf, James Bradbury, Zeming Lin, Yuandong Tian, Guillaume Lample, Marat Dukhan, Natalia Gimelshein, Christian Sarofeen, Martin Raison, Edward Yang, Zachary Devito.
Note: this project is unrelated to [hughperkins/pytorch](https://github.com/hughperkins/pytorch) with the same name. Hugh is a valuable contributor in the Torch community and has helped with many things Torch and PyTorch.

aten/.flake8 Normal file

@@ -0,0 +1,3 @@
[flake8]
max-line-length = 120

aten/.gitignore vendored Normal file

@@ -0,0 +1,3 @@
__pycache__/
build/
*.pyc

aten/CMakeLists.txt Normal file

@@ -0,0 +1,105 @@
if (BUILD_ATEN_MOBILE)
return()
endif()
# Find modules
list(APPEND CMAKE_MODULE_PATH
/usr/lib/x86_64-linux-gnu/
${CMAKE_CURRENT_SOURCE_DIR}/../cmake/Modules
${CMAKE_CURRENT_SOURCE_DIR}/../cmake/public
${CMAKE_CURRENT_SOURCE_DIR}/../cmake/Modules_CUDA_fix)
list(APPEND CMAKE_LIBRARY_PATH /usr/lib/x86_64-linux-gnu/)
cmake_policy(SET CMP0012 NEW)
#############################################
set(ATen_CPU_SRCS)
set(ATen_CPU_TEST_SRCS)
set(ATen_CPU_INCLUDE)
set(ATen_THIRD_PARTY_INCLUDE)
set(ATen_CUDA_SRCS)
set(ATen_CUDA_TEST_SRCS)
set(ATen_CUDA_INCLUDE)
set(ATen_CPU_DEPENDENCY_LIBS)
set(ATen_CUDA_DEPENDENCY_LIBS)
set(ATen_PUBLIC_CUDA_DEPENDENCY_LIBS)
SET(ATEN_INSTALL_BIN_SUBDIR "bin" CACHE PATH "ATen install binary subdirectory")
SET(ATEN_INSTALL_LIB_SUBDIR "lib" CACHE PATH "ATen install library subdirectory")
SET(ATEN_INSTALL_INCLUDE_SUBDIR "include" CACHE PATH "ATen install include subdirectory")
if(USE_CUDA)
list(APPEND ATen_CUDA_INCLUDE ${CUDA_INCLUDE_DIRS})
endif()
set(TH_LINK_STYLE STATIC)
add_subdirectory(src/TH)
set(TH_CPU_INCLUDE
# dense
${CMAKE_CURRENT_SOURCE_DIR}/src/TH
${CMAKE_CURRENT_BINARY_DIR}/src/TH
${CMAKE_CURRENT_SOURCE_DIR}/src
${CMAKE_CURRENT_BINARY_DIR}/src
${CMAKE_BINARY_DIR}/aten/src)
list(APPEND ATen_CPU_INCLUDE ${TH_CPU_INCLUDE})
if(USE_CUDA OR USE_ROCM)
set(TH_CUDA_INCLUDE
# dense
${CMAKE_CURRENT_SOURCE_DIR}/src/THC
${CMAKE_CURRENT_BINARY_DIR}/src/THC)
list(APPEND ATen_CUDA_INCLUDE ${TH_CUDA_INCLUDE})
endif()
add_subdirectory(src/THNN)
# Find the HIP package, set the HIP paths, load the HIP CMake.
IF(USE_ROCM)
include(LoadHIP)
if (NOT PYTORCH_FOUND_HIP)
MESSAGE(FATAL_ERROR
"Could not find HIP installation")
endif()
ENDIF()
IF(MSVC)
# we want to respect the standard, and we are bored of those **** .
ADD_DEFINITIONS(-D_CRT_SECURE_NO_DEPRECATE=1)
LIST(APPEND CUDA_NVCC_FLAGS "-Xcompiler /wd4819 -Xcompiler /wd4503 -Xcompiler /wd4190 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4275 -Xcompiler /wd4522")
ENDIF(MSVC)
if(USE_ROCM)
SET(AT_CUDA_ENABLED 1)
add_subdirectory(src/THC)
add_subdirectory(src/THCUNN)
message("ROCm is enabled.")
elseif(USE_CUDA)
SET(AT_CUDA_ENABLED 1)
add_subdirectory(src/THC)
add_subdirectory(src/THCUNN)
else()
message("disabling CUDA because USE_CUDA is set false")
SET(AT_CUDA_ENABLED 0)
endif()
list(APPEND ATen_CPU_INCLUDE
${CMAKE_CURRENT_SOURCE_DIR}/src/THNN
${CMAKE_CURRENT_SOURCE_DIR}/src/THCUNN)
list(APPEND ATen_CPU_INCLUDE
${CMAKE_CURRENT_SOURCE_DIR}/src
${CMAKE_CURRENT_SOURCE_DIR}/../third_party/catch/single_include
${CMAKE_CURRENT_BINARY_DIR}/src/ATen)
add_subdirectory(src/ATen)
# Pass source, includes, and libs to parent
set(ATen_CPU_SRCS ${ATen_CPU_SRCS} PARENT_SCOPE)
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} PARENT_SCOPE)
set(ATen_CPU_TEST_SRCS ${ATen_CPU_TEST_SRCS} PARENT_SCOPE)
set(ATen_CUDA_TEST_SRCS ${ATen_CUDA_TEST_SRCS} PARENT_SCOPE)
set(ATen_CPU_INCLUDE ${ATen_CPU_INCLUDE} PARENT_SCOPE)
set(ATen_CUDA_INCLUDE ${ATen_CUDA_INCLUDE} PARENT_SCOPE)
set(ATen_THIRD_PARTY_INCLUDE ${ATen_THIRD_PARTY_INCLUDE} PARENT_SCOPE)
set(ATen_CPU_DEPENDENCY_LIBS ${ATen_CPU_DEPENDENCY_LIBS} PARENT_SCOPE)
set(ATen_CUDA_DEPENDENCY_LIBS ${ATen_CUDA_DEPENDENCY_LIBS} PARENT_SCOPE)
set(ATen_CORE_TEST_SRCS ${ATen_CORE_TEST_SRCS} PARENT_SCOPE)

aten/README.md Normal file

@@ -0,0 +1,258 @@
# ATen: A TENsor library
ATen is a simple tensor library that exposes the Tensor operations in Torch
and PyTorch directly in C++11. The wrapper respects the semantics of operators
in PyTorch, except for minor details due to differences between C++ and Python in
the way default arguments are handled. See the [documentation for tensors](http://pytorch.org/docs/tensors.html) in PyTorch for what these operations do.
ATen's API is auto-generated from the same declarations PyTorch uses so the
two APIs will track each other over time.
Tensor types are resolved dynamically, such that the API is generic and
does not include templates. That is, there is one `Tensor` type. It can hold a
CPU or CUDA Tensor, and the tensor may hold Doubles, Floats, Ints, etc. This design
makes it easy to write generic code without templating everything.
See https://pytorch.org/cppdocs for the provided API. Excerpt:
```c++
Tensor atan2(const Tensor & other) const;
Tensor & atan2_(const Tensor & other);
Tensor pow(Scalar exponent) const;
Tensor pow(const Tensor & exponent) const;
Tensor & pow_(Scalar exponent);
Tensor & pow_(const Tensor & exponent);
Tensor lerp(const Tensor & end, Scalar weight) const;
Tensor & lerp_(const Tensor & end, Scalar weight);
Tensor histc() const;
Tensor histc(int64_t bins) const;
Tensor histc(int64_t bins, Scalar min) const;
Tensor histc(int64_t bins, Scalar min, Scalar max) const;
```
Inplace operations are also provided, and always suffixed by `_` to indicate they will modify the Tensor.
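For instance, a minimal sketch using the factory style from the excerpt above (the tensor names are illustrative):
```c++
Tensor a = CPU(kFloat).ones({2, 2});
Tensor b = CPU(kFloat).ones({2, 2});
Tensor c = a.add(b); // out-of-place: a is unchanged
a.add_(b);           // in-place: a now holds the elementwise sum
```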
### Installation
TH/THC/THNN/THCUNN are provided (as git subtrees), so the repo is standalone. You will need a C++11 compiler, cmake, and the pyyaml python package.
```
# Install pyyaml used by python code generation to read API declarations
# macOS: if you don't have pip
sudo easy_install pip
# Ubuntu: if you don't have pip
apt-get -y install python-pip
# if you don't have pyyaml
sudo pip install pyyaml
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/where/you/want # specify your dest directory
# cmake .. -DUSE_NVRTC=ON -DUSE_TENSORRT=OFF -DCMAKE_INSTALL_PREFIX=../install -DCAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO=OFF -DUSE_CUDA=ON # for CUDA
# cmake .. -DUSE_CUDA=OFF # for CPU only machines
make install
```
### Example usage
Here is a simple example; again, the syntax follows Torch semantics.
```c++
using namespace at; // assumed in the following
Tensor d = CPU(kFloat).ones({3, 4});
Tensor r = CPU(kFloat).zeros({3,4});
for(auto i = 0; i < 100000; i++) {
r = r.add(d);
// equivalently
r = r + d;
// or
r += d;
}
```
Want this running on the GPU?
```c++
using namespace at; // assumed in the following
Tensor d = CUDA(kFloat).ones({3, 4});
Tensor r = CUDA(kFloat).zeros({3,4});
for(auto i = 0; i < 100000; i++) {
r = r.add(d);
// equivalently
r = r + d;
// or
r += d;
}
```
Expressions like `CUDA(kFloat)` are first-class `at::Type` objects that represent
the type of a Tensor and are used to create Tensors when their type cannot be
inferred.
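For example, a small illustrative sketch of this factory use (the names are not from the original text):
```c++
Type & gpu_float = CUDA(kFloat);
// The Type acts as a factory when the result type cannot be inferred
// from the arguments:
Tensor a = gpu_float.zeros({4, 4});
Tensor b = gpu_float.ones({4, 4});
```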
See more in [sample files](src/ATen/test).
### Creating your kernel
It is easy to create new kernels, thanks to the `dispatch<>()` templated function. Example:
```c++
// a simple sum kernel (for CPU only)
template<typename T>
struct sum_op {
// dispatch handles variable arguments for you
Tensor CPU(const Type & t, Tensor & x_)
{
Tensor x = x_.contiguous();
auto x_p = x.data<T>();
int64_t size = x.numel();
T sum = 0;
for(int64_t i = 0; i < size; i++) {
sum += x_p[i];
}
return sum;
};
Tensor CUDA(Tensor& x) {
throw std::invalid_argument("device not supported");
};
};
Tensor a = CPU(kFloat).rand({3, 7});
std::cout << a << std::endl;
std::cout << dispatch<sum_op>(a.type(),a) << " == " << a.sum() << std::endl;
```
### Efficient access to tensor elements
When using Tensor-wide operations, the relative cost of dynamic dispatch is very small.
However, there are cases, especially in your own kernels, where efficient element-wise access is needed,
and the cost of dynamic dispatch inside the element-wise loop is very high.
ATen provides _accessors_ that are created with a single dynamic check that a Tensor has the expected type and number of
dimensions. Accessors then expose an API for accessing the Tensor elements efficiently:
```c++
Tensor foo = CPU(kFloat).rand({12,12});
// assert foo is 2-dimensional and holds floats.
auto foo_a = foo.accessor<float,2>();
float trace = 0;
for(int i = 0; i < foo_a.size(0); i++) {
// use the accessor foo_a to get tensor data.
trace += foo_a[i][i];
}
```
Accessors are temporary views of a Tensor. They are only valid for the lifetime of the tensor that they
view and hence should only be used locally in a function, like iterators.
### Using externally created data
If you already have your tensor data allocated in memory (CPU or CUDA),
you can view that memory as a Tensor in ATen:
```c++
float data[] = { 1, 2, 3,
4, 5, 6};
auto f = CPU(kFloat).tensorFromBlob(data, {2,3});
cout << f << endl;
```
These tensors cannot be resized because ATen does not own the memory, but otherwise
behave as normal tensors.
### Scalars and zero-dimensional tensors
In addition to the `Tensor` objects, ATen also includes `Scalar`s that represent a single number.
Like a Tensor, Scalars are dynamically typed and can hold any one of ATen's number types.
Scalars can be implicitly constructed from C++ number types. Scalars are needed because some functions like `addmm` take numbers along with Tensors and expect these
numbers to be the same dynamic type as the tensor. They are also used in the API to indicate places where
a function will _always_ return a Scalar value, like `sum`.
```c++
Tensor addmm(Scalar beta, const Tensor & self,
Scalar alpha, const Tensor & mat1,
const Tensor & mat2);
Scalar sum(const Tensor & self);
//usage
Tensor a = ...
Tensor b = ...
Tensor c = ...
Tensor r = addmm(1.0, a, .5, b, c);
```
In addition to Scalars, ATen also allows Tensor objects to be zero-dimensional. These Tensors hold
a single value and they can be references to a single element in a larger Tensor. They can be used anywhere a Tensor is expected. They are normally created by operators like `select` which reduce the dimensions of
a Tensor.
```c++
Tensor two = CPU(kFloat).rand({10,20});
two[1][2] = 4;
//~~~~~~~ zero-dimensional Tensor
```
It is possible to convert between Scalar and zero-dim Tensors:
```c++
Tensor zero_dim = CPU(kFloat).scalarTensor(4);
Scalar from_tensor = Scalar(zero_dim); //only valid when zero_dim.dim() == 0;
```
### Avoiding unnecessary CUDA synchronization in your kernels when using Scalars
Moving a single number from the GPU to the CPU introduces a synchronization point
that can add latency to your program. In certain cases the result of a GPU operator like `sum` which
returns a Scalar may be plugged into another GPU operator as an argument. If Scalars were always copied
to the CPU, this would result in 2 copies. To avoid these synchronizations, Scalar objects can be
optionally backed by a zero-dim Tensor, and are only copied to the CPU when requested.
```c++
auto a = CUDA(kFloat).rand({3,4});
Scalar on_gpu = Scalar(a[1][1]); //backed by zero-dim Tensor
assert(on_gpu.isBackedByTensor());
double value = on_gpu.toDouble(); // copied to CPU, if it was backed by GPU Tensor.
Scalar svalue = on_gpu.local(); // force the Scalar to become local to CPU.
// get the scalar as a zero-dim tensor. If it was already backed
// by a zero-dim Tensor then this op has no synchronization.
// if the Scalar was local on CPU, it performs the copy
Tensor same_tensor = CUDA(kFloat).scalarTensor(on_gpu);
```
Operators aware of the location of Scalars can arrange to do the minimal number of copies required.
### Developer notes
ATen relies heavily on code generation to automatically generate headers
and implementations for all of the tensor methods it supports. The main
entry point for the script which does all this work is
[`src/ATen/gen.py`](src/ATen/gen.py), which ingests
[`src/ATen/Declarations.cwrap`](src/ATen/Declarations.cwrap),
[`src/ATen/nn.yaml`](src/ATen/nn.yaml),
[`src/ATen/native/native_functions.yaml`](src/ATen/native/native_functions.yaml) and the THNN/THCUNN headers and
produces all of the headers and wrapping code necessary to generate
the ATen interface.
If you need to understand how ATen understands a declaration after all
of this processing occurs, it's helpful to look at the generated file
`Declarations.yaml` (NB: not cwrap) which contains information for all
ATen methods in a uniform manner. This file is utilized by PyTorch
which further extends the ATen interface with support for automatic
differentiation.
#### Note [ATen preprocessor philosophy]
ATen is designed to be simple to use, and one of the things this implies is
that it should not be necessary to use preprocessor macros when using ATen;
we would rather provide all symbols, even for functionality that is not
available on the system ATen is running on.
This means that internally inside ATen, whereas other libraries might
simply omit source files for, e.g., CuDNN, when CuDNN libraries are not
installed, ATen will always build these source files, compiling stub
functions for anything that is not available. ATen never uses
`AT_ENABLED_CUDA()` in header files, and all types in ATen's public API
are always available no matter your build configuration.

aten/conda/build.sh Normal file

@@ -0,0 +1,21 @@
#!/bin/bash
set -e
if [ -z "$PREFIX" ]; then
PREFIX="$CONDA_PREFIX"
fi
# When conda-build constructs a new working copy to perform a build
# in, it recursively copies *all* files and directories in the original
# source directory, including any pre-existing build products (e.g.,
# if you previously ran cmake.) This is problematic, because if
# a 'build' directory already exists, cmake will reuse build settings
# rather than recompute them from scratch. We want a fresh build, so
# we prophylactically remove the build directory.
rm -rf build || true
mkdir -p build
cd build
cmake -DCMAKE_INSTALL_PREFIX="$PREFIX" -DCMAKE_PREFIX_PATH="$PREFIX" -DCMAKE_BUILD_TYPE=Release $CONDA_CMAKE_ARGS ..
make install -j20

aten/conda/meta.yaml Normal file

@@ -0,0 +1,33 @@
{% set version = "0.1.dev" %}
package:
name: aten
version: {{ version }}
source:
path: ..
build:
number: 1
skip: True # [win]
script_env:
- CONDA_CMAKE_ARGS
requirements:
build:
- cmake
- pyyaml
- setuptools
- python
- mkl # [not osx]
run:
- mkl # [not osx]
about:
home: https://github.com/zdevito/ATen
license: BSD
summary: A TENsor library for C++11
extra:
recipe-maintainers:
- ezyang

aten/src/ATen/.gitignore vendored Normal file

@@ -0,0 +1 @@
Config.h

aten/src/ATen/ATen.h Normal file

@@ -0,0 +1,26 @@
#pragma once
#include "ATen/core/ATenGeneral.h"
#include "ATen/Allocator.h"
#include "ATen/CPUGeneral.h"
#include "ATen/CUDAGuard.h"
#include "ATen/Context.h"
#include "ATen/Device.h"
#include "ATen/DeviceGuard.h"
#include "ATen/DimVector.h"
#include "ATen/Dispatch.h"
#include "ATen/Formatting.h"
#include "ATen/Functions.h"
#include "ATen/core/Generator.h"
#include "ATen/core/Layout.h"
#include "ATen/OptionsGuard.h"
#include "ATen/core/Scalar.h"
#include "ATen/ScalarOps.h"
#include "ATen/core/Storage.h"
#include "ATen/Tensor.h"
#include "ATen/TensorGeometry.h"
#include "ATen/core/TensorMethods.h"
#include "ATen/TensorOperators.h"
#include "ATen/core/TensorOptions.h"
#include "ATen/Type.h"
#include "ATen/core/Error.h"

@@ -0,0 +1,9 @@
# Find the TH includes and library
#
# ATEN_INCLUDE_DIR -- where to find the includes
# ATEN_LIBRARIES -- list of libraries to link against
# ATEN_FOUND -- set to 1 if found
SET(ATEN_FOUND 1)
SET(ATEN_INCLUDE_DIR "@ATEN_INCLUDE_DIR@")
SET(ATEN_LIBRARIES "@ATEN_LIBRARIES@")

@@ -0,0 +1,43 @@
#pragma once
#include "ATen/Config.h"
#include "ATen/core/Half.h"
// Defines the accumulation type for a scalar type.
// Example:
// using accscalar_t = acc_type<scalar_t, true>;
#ifdef __CUDACC__
#include <cuda.h>
#include <cuda_fp16.h>
#endif
namespace at {
template <typename T, bool is_cuda>
struct AccumulateType { };
#ifdef __CUDACC__
template <> struct AccumulateType<half, true> { using type = float; };
#endif
template <> struct AccumulateType<Half, true> { using type = float; };
template <> struct AccumulateType<float, true> { using type = float; };
template <> struct AccumulateType<double, true> { using type = double; };
template <> struct AccumulateType<int8_t, true> { using type = int64_t; };
template <> struct AccumulateType<uint8_t, true> { using type = int64_t; };
template <> struct AccumulateType<char, true> { using type = int64_t; };
template <> struct AccumulateType<int16_t, true> { using type = int64_t; };
template <> struct AccumulateType<int32_t, true> { using type = int64_t; };
template <> struct AccumulateType<int64_t, true> { using type = int64_t; };
template <> struct AccumulateType<float, false> { using type = double; };
template <> struct AccumulateType<double, false> { using type = double; };
template <> struct AccumulateType<int8_t, false> { using type = int64_t; };
template <> struct AccumulateType<uint8_t, false> { using type = int64_t; };
template <> struct AccumulateType<char, false> { using type = int64_t; };
template <> struct AccumulateType<int16_t, false> { using type = int64_t; };
template <> struct AccumulateType<int32_t, false> { using type = int64_t; };
template <> struct AccumulateType<int64_t, false> { using type = int64_t; };
template<typename T, bool is_cuda>
using acc_type = typename AccumulateType<T, is_cuda>::type;
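// Illustrative examples (not part of the original header):
//   acc_type<float, /*is_cuda=*/false> is double  (CPU float reductions
//   accumulate in double); acc_type<Half, /*is_cuda=*/true> is float
//   (CUDA half reductions accumulate in float).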
} // namespace at

@@ -0,0 +1,2 @@
#pragma once
#include <ATen/core/Allocator.h>

aten/src/ATen/ArrayRef.h Normal file

@@ -0,0 +1,2 @@
#pragma once
#include <ATen/core/ArrayRef.h>

aten/src/ATen/Backend.h Normal file

@@ -0,0 +1,2 @@
#pragma once
#include <ATen/core/Backend.h>

@@ -0,0 +1,2 @@
#pragma once
#include <ATen/core/Backtrace.h>

@@ -0,0 +1,384 @@
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
SET(CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake ${CMAKE_MODULE_PATH})
IF(NOT MSVC)
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-ignored-qualifiers")
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-ignored-qualifiers")
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-absolute-value")
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-absolute-value")
ENDIF(NOT MSVC)
# Can be compiled standalone
IF(NOT AT_INSTALL_BIN_DIR OR NOT AT_INSTALL_LIB_DIR OR NOT AT_INSTALL_INCLUDE_DIR OR NOT AT_INSTALL_SHARE_DIR)
SET(AT_INSTALL_BIN_DIR "bin" CACHE PATH "AT install binary subdirectory")
SET(AT_INSTALL_LIB_DIR "lib" CACHE PATH "AT install library subdirectory")
SET(AT_INSTALL_INCLUDE_DIR "include" CACHE PATH "AT install include subdirectory")
SET(AT_INSTALL_SHARE_DIR "share" CACHE PATH "AT install include subdirectory")
ENDIF()
CONFIGURE_FILE(Config.h.in "${CMAKE_CURRENT_SOURCE_DIR}/Config.h")
CONFIGURE_FILE(cuda/CUDAConfig.h.in "${CMAKE_CURRENT_SOURCE_DIR}/cuda/CUDAConfig.h")
# NB: If you edit these globs, you'll have to update setup.py package_data as well
FILE(GLOB base_h "*.h" "detail/*.h")
FILE(GLOB base_cpp "*.cpp" "detail/*.cpp")
add_subdirectory(core)
FILE(GLOB cuda_h "cuda/*.h" "cuda/detail/*.h" "cuda/*.cuh" "cuda/detail/*.cuh")
FILE(GLOB cuda_cpp "cuda/*.cpp" "cuda/detail/*.cpp")
FILE(GLOB cuda_cu "cuda/*.cu" "cuda/detail/*.cu")
FILE(GLOB cudnn_h "cudnn/*.h" "cudnn/*.cuh")
FILE(GLOB cudnn_cpp "cudnn/*.cpp")
FILE(GLOB miopen_h "miopen/*.h")
FILE(GLOB miopen_cpp "miopen/*.cpp")
FILE(GLOB mkl_cpp "mkl/*.cpp")
FILE(GLOB mkldnn_cpp "mkldnn/*.cpp")
FILE(GLOB native_cpp "native/*.cpp")
FILE(GLOB native_sparse_cpp "native/sparse/*.cpp")
FILE(GLOB native_sparse_cuda_cu "native/sparse/cuda/*.cu")
FILE(GLOB native_sparse_cuda_cpp "native/sparse/cuda/*.cpp")
FILE(GLOB native_cudnn_cpp "native/cudnn/*.cpp")
FILE(GLOB native_miopen_cpp "native/miopen/*.cpp")
FILE(GLOB native_cuda_cu "native/cuda/*.cu")
FILE(GLOB native_cuda_cpp "native/cuda/*.cpp")
FILE(GLOB native_mkl_cpp "native/mkl/*.cpp")
FILE(GLOB native_mkldnn_cpp "native/mkldnn/*.cpp")
set(all_cpu_cpp ${base_cpp} ${ATen_CORE_SRCS} ${native_cpp} ${native_sparse_cpp} ${native_mkl_cpp} ${native_mkldnn_cpp} ${generated_cpp} ${ATen_CPU_SRCS} ${cpu_kernel_cpp})
if(AT_MKL_ENABLED)
set(all_cpu_cpp ${all_cpu_cpp} ${mkl_cpp})
endif()
if(AT_MKLDNN_ENABLED)
set(all_cpu_cpp ${all_cpu_cpp} ${mkldnn_cpp})
endif()
IF(USE_CUDA OR USE_ROCM)
list(APPEND ATen_CUDA_INCLUDE ${CMAKE_CURRENT_SOURCE_DIR}/cuda)
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} ${cuda_cu} ${native_cuda_cu} ${native_sparse_cuda_cu})
set(all_cuda_cpp ${native_sparse_cuda_cpp} ${cuda_cpp} ${native_cuda_cpp} ${cuda_generated_cpp} ${ATen_CUDA_SRCS})
IF(USE_CUDA)
SET(all_cuda_cpp ${native_cudnn_cpp} ${native_miopen_cpp} ${all_cuda_cpp})
IF(CUDNN_FOUND)
SET(all_cuda_cpp ${all_cuda_cpp} ${cudnn_cpp})
ENDIF()
ELSEIF(USE_ROCM)
SET(all_cuda_cpp ${native_cudnn_cpp} ${native_miopen_cpp} ${miopen_cpp} ${all_cuda_cpp})
ENDIF()
endif()
filter_list(generated_h generated_cpp "\\.h$")
filter_list(cuda_generated_h cuda_generated_cpp "\\.h$")
list(APPEND ATen_CPU_INCLUDE ${CMAKE_CURRENT_SOURCE_DIR}/..)
# so the build can find the generated header files
list(APPEND ATen_CPU_INCLUDE ${CMAKE_CURRENT_BINARY_DIR})
IF(NOT AT_LINK_STYLE)
SET(AT_LINK_STYLE SHARED)
ENDIF()
IF(BLAS_FOUND)
IF ($ENV{TH_BINARY_BUILD})
MESSAGE(STATUS "TH_BINARY_BUILD detected. Enabling special linkage.")
list(APPEND ATen_CPU_DEPENDENCY_LIBS
"${BLAS_LIBRARIES};${BLAS_LIBRARIES};${BLAS_LIBRARIES}")
if(USE_CUDA OR USE_ROCM)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS
"${BLAS_LIBRARIES};${BLAS_LIBRARIES};${BLAS_LIBRARIES}")
endif()
ELSE ($ENV{TH_BINARY_BUILD})
list(APPEND ATen_CPU_DEPENDENCY_LIBS ${BLAS_LIBRARIES})
if(USE_CUDA OR USE_ROCM)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS "${BLAS_LIBRARIES}")
endif()
ENDIF ($ENV{TH_BINARY_BUILD})
ENDIF(BLAS_FOUND)
IF(LAPACK_FOUND)
list(APPEND ATen_CPU_DEPENDENCY_LIBS ${LAPACK_LIBRARIES})
if(USE_CUDA OR USE_ROCM)
# Although Lapack provides CPU (and thus, one might expect that ATen_cuda
# would not need this at all), some of our libraries (magma in particular)
# backend to CPU BLAS/LAPACK implementations, and so it is very important
# we get the *right* implementation, because even if the symbols are the
# same, LAPACK implementions may have different calling conventions.
# This caused https://github.com/pytorch/pytorch/issues/7353
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${LAPACK_LIBRARIES})
endif()
ENDIF(LAPACK_FOUND)
IF (UNIX AND NOT APPLE)
INCLUDE(CheckLibraryExists)
# https://github.com/libgit2/libgit2/issues/2128#issuecomment-35649830
CHECK_LIBRARY_EXISTS(rt clock_gettime "time.h" NEED_LIBRT)
IF(NEED_LIBRT)
list(APPEND ATen_CPU_DEPENDENCY_LIBS rt)
SET(CMAKE_REQUIRED_LIBRARIES ${CMAKE_REQUIRED_LIBRARIES} rt)
ENDIF(NEED_LIBRT)
ENDIF(UNIX AND NOT APPLE)
IF(UNIX)
SET(CMAKE_EXTRA_INCLUDE_FILES "sys/mman.h")
CHECK_FUNCTION_EXISTS(mmap HAVE_MMAP)
IF(HAVE_MMAP)
ADD_DEFINITIONS(-DHAVE_MMAP=1)
ENDIF(HAVE_MMAP)
# done for lseek: https://www.gnu.org/software/libc/manual/html_node/File-Position-Primitive.html
ADD_DEFINITIONS(-D_FILE_OFFSET_BITS=64)
CHECK_FUNCTION_EXISTS(shm_open HAVE_SHM_OPEN)
IF(HAVE_SHM_OPEN)
ADD_DEFINITIONS(-DHAVE_SHM_OPEN=1)
ENDIF(HAVE_SHM_OPEN)
CHECK_FUNCTION_EXISTS(shm_unlink HAVE_SHM_UNLINK)
IF(HAVE_SHM_UNLINK)
ADD_DEFINITIONS(-DHAVE_SHM_UNLINK=1)
ENDIF(HAVE_SHM_UNLINK)
CHECK_FUNCTION_EXISTS(malloc_usable_size HAVE_MALLOC_USABLE_SIZE)
IF(HAVE_MALLOC_USABLE_SIZE)
ADD_DEFINITIONS(-DHAVE_MALLOC_USABLE_SIZE=1)
ENDIF(HAVE_MALLOC_USABLE_SIZE)
ENDIF(UNIX)
if(NOT MSVC)
list(APPEND ATen_CPU_DEPENDENCY_LIBS m)
endif()
if(MKLDNN_FOUND)
list(APPEND ATen_CPU_DEPENDENCY_LIBS ${MKLDNN_LIBRARIES})
endif(MKLDNN_FOUND)
list(APPEND ATen_CPU_DEPENDENCY_LIBS cpuinfo)
if(NOT MSVC AND NOT EMSCRIPTEN)
# Preserve values for the main build
set(__aten_sleef_build_shared_libs ${BUILD_SHARED_LIBS})
set(__aten_sleef_build_tests ${BUILD_TESTS})
# Unset our restrictive C++ flags here and reset them later.
# Remove this once we use proper target_compile_options.
set(OLD_CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
set(CMAKE_CXX_FLAGS)
set(BUILD_SHARED_LIBS OFF CACHE BOOL "Build sleef static" FORCE)
set(BUILD_DFT OFF CACHE BOOL "Don't build sleef DFT lib" FORCE)
set(BUILD_GNUABI_LIBS OFF CACHE BOOL "Don't build sleef gnuabi libs" FORCE)
set(BUILD_TESTS OFF CACHE BOOL "Don't build sleef tests" FORCE)
add_subdirectory("${CMAKE_CURRENT_SOURCE_DIR}/../../../third_party/sleef" ${CMAKE_BINARY_DIR}/sleef)
set_property(TARGET sleef PROPERTY FOLDER "dependencies")
list(APPEND ATen_THIRD_PARTY_INCLUDE ${CMAKE_BINARY_DIR}/include)
link_directories(${CMAKE_BINARY_DIR}/sleef/lib)
list(APPEND ATen_CPU_DEPENDENCY_LIBS sleef)
set(CMAKE_CXX_FLAGS ${OLD_CMAKE_CXX_FLAGS})
# Set these back. TODO: Use SLEEF_ to pass these instead
set(BUILD_SHARED_LIBS ${__aten_sleef_build_shared_libs} CACHE BOOL "Build shared libs" FORCE)
set(BUILD_TESTS ${__aten_sleef_build_tests} CACHE BOOL "Build tests" FORCE)
endif()
IF(USE_CUDA AND NOT USE_ROCM)
IF ($ENV{ATEN_STATIC_CUDA})
# CuFFT has a complicated static story (especially around CUDA < 9) because it has device callback support
# we first have to build a fake lib that links with no device callbacks,
# and then we link against this object file.
# This was recommended by the CuFFT team at NVIDIA
# build fake CuFFT lib in build dir
EXECUTE_PROCESS(COMMAND touch ${CMAKE_CURRENT_BINARY_DIR}/empty_file.cc)
if(${CUDA_VERSION_MAJOR} EQUAL "8")
SET(CUFFT_FAKELINK_OPTIONS
--generate-code arch=compute_35,code=sm_35
--generate-code arch=compute_50,code=sm_50
--generate-code arch=compute_60,code=sm_60)
elseif(${CUDA_VERSION_MAJOR} EQUAL "9")
SET(CUFFT_FAKELINK_OPTIONS
--generate-code arch=compute_35,code=sm_35
--generate-code arch=compute_50,code=sm_50
--generate-code arch=compute_60,code=sm_60
--generate-code arch=compute_70,code=sm_70)
else()
MESSAGE(FATAL_ERROR "Unhandled major cuda version ${CUDA_VERSION_MAJOR}")
endif()
ADD_CUSTOM_COMMAND(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/cufft_static_library.a
COMMAND "${CUDA_TOOLKIT_ROOT_DIR}/bin/nvcc" -o ${CMAKE_CURRENT_BINARY_DIR}/cufft_static_library.a -Xcompiler -fPIC
${CUFFT_FAKELINK_OPTIONS}
--device-link ${CMAKE_CURRENT_BINARY_DIR}/empty_file.cc -lcufft_static -lculibos
)
ADD_CUSTOM_TARGET(FAKELINKED_CUFFT_TARGET DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/cufft_static_library.a)
add_library(FAKELINKED_CUFFT STATIC IMPORTED GLOBAL)
add_dependencies(FAKELINKED_CUFFT FAKELINKED_CUFFT_TARGET)
set_target_properties(FAKELINKED_CUFFT PROPERTIES IMPORTED_LOCATION ${CMAKE_CURRENT_BINARY_DIR}/cufft_static_library.a)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS
${CUDA_LIBRARIES}
${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcusparse_static.a
${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcurand_static.a
${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcublas_static.a
FAKELINKED_CUFFT
${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcufft_static.a
)
ELSE()
list(APPEND ATen_CUDA_DEPENDENCY_LIBS
${CUDA_LIBRARIES}
${CUDA_cusparse_LIBRARY}
${CUDA_curand_LIBRARY})
ENDIF()
if(CUDNN_FOUND)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${CUDNN_LIBRARIES})
endif(CUDNN_FOUND)
IF(USE_MAGMA)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${MAGMA_LIBRARIES})
IF ($ENV{TH_BINARY_BUILD})
list(APPEND ATen_CUDA_DEPENDENCY_LIBS
"${BLAS_LIBRARIES};${BLAS_LIBRARIES};${BLAS_LIBRARIES}")
ENDIF($ENV{TH_BINARY_BUILD})
ENDIF(USE_MAGMA)
IF ($ENV{ATEN_STATIC_CUDA})
list(APPEND ATen_CUDA_DEPENDENCY_LIBS "${CUDA_TOOLKIT_ROOT_DIR}/lib64/libculibos.a")
list(APPEND ATen_CUDA_DEPENDENCY_LIBS "${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcudart_static.a")
ENDIF($ENV{ATEN_STATIC_CUDA})
ENDIF()
IF(USE_ROCM)
### Link in the ROCm libraries BLAS / RNG.
FIND_LIBRARY(ROCBLAS_LIBRARY rocblas HINTS ${ROCBLAS_PATH}/lib)
FIND_LIBRARY(HIPRAND_LIBRARY hiprand HINTS ${HIPRAND_PATH}/lib)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${ROCBLAS_LIBRARY} ${HIPRAND_LIBRARY})
ENDIF()
# Include CPU paths for CUDA as well
list(APPEND ATen_CUDA_INCLUDE ${ATen_CPU_INCLUDE})
# We have two libraries: libATen_cpu.so and libATen_cuda.so,
# with libATen_cuda.so depending on libATen_cpu.so. The CPU library
# contains CPU code only. libATen_cpu.so is invariant to the setting
# of USE_CUDA (it always builds the same way); libATen_cuda.so is only
# built when USE_CUDA=1 and CUDA is available.
set(ATen_CPU_SRCS ${all_cpu_cpp})
if(AT_LINK_STYLE STREQUAL "INTERFACE")
# Source code can't be added to an interface library, so it is
# passed back to be compiled into the containing library
add_library(ATen_cpu INTERFACE)
list(APPEND ATen_CPU_DEPENDENCY_LIBS ATEN_CPU_FILES_GEN_LIB)
else()
add_library(ATen_cpu ${AT_LINK_STYLE} ${ATen_CPU_SRCS})
if (ATen_THIRD_PARTY_INCLUDE)
target_include_directories(ATen_cpu SYSTEM PRIVATE ${ATen_THIRD_PARTY_INCLUDE})
endif()
target_include_directories(ATen_cpu INTERFACE $<INSTALL_INTERFACE:include>)
target_include_directories(ATen_cpu PRIVATE ${ATen_CPU_INCLUDE})
target_link_libraries(ATen_cpu PUBLIC ${ATen_CPU_DEPENDENCY_LIBS})
target_link_libraries(ATen_cpu PRIVATE ATEN_CPU_FILES_GEN_LIB)
caffe2_interface_library(ATen_cpu ATen_cpu_library)
# Set standard properties on the target
aten_set_target_props(ATen_cpu)
# Make sure these don't get built by parent
set(ATen_CPU_SRCS)
endif()
if(USE_CUDA OR USE_ROCM)
set(ATen_CUDA_SRCS ${all_cuda_cpp})
if(AT_LINK_STYLE STREQUAL "INTERFACE")
# Source code can't be added to an interface library, so it is
# passed back to be compiled into the containing library
add_library(ATen_cuda INTERFACE)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ATEN_CUDA_FILES_GEN_LIB)
else()
# A hack to deal with cuda library dependencies and modern CMake: the
# CUDA_ADD_LIBRARY includes a target_link_libraries, and as a result,
# one cannot use PUBLIC/PRIVATE/INTERFACE for the target anymore. This
# hack adds the PRIVATE keywords to CUDA_LIBRARIES so we can deal with
# it. We will then manually add the cudart library as interface libs.
set(__tmp ${CUDA_LIBRARIES})
set(CUDA_LIBRARIES PRIVATE ${CUDA_LIBRARIES})
torch_cuda_based_add_library(ATen_cuda ${AT_LINK_STYLE} ${ATen_CUDA_SRCS})
set(CUDA_LIBRARIES ${__tmp})
target_link_libraries(ATen_cuda INTERFACE caffe2::cudart)
target_include_directories(
ATen_cuda INTERFACE $<INSTALL_INTERFACE:include>)
target_include_directories(
ATen_cuda PRIVATE ${ATen_THIRD_PARTY_INCLUDE})
target_include_directories(
ATen_cuda PRIVATE ${ATen_CUDA_INCLUDE})
target_link_libraries(
ATen_cuda PRIVATE ${ATen_CUDA_DEPENDENCY_LIBS} ATEN_CUDA_FILES_GEN_LIB)
# These public dependencies must go after the previous dependencies, as the
# order of the libraries in the linker call matters here when statically
# linking; libculibos and cublas must be last.
target_link_libraries(
ATen_cuda PUBLIC ATen_cpu ${ATen_PUBLIC_CUDA_DEPENDENCY_LIBS})
# Set standard properties on the target
aten_set_target_props(ATen_cuda)
caffe2_interface_library(ATen_cuda ATen_cuda_library)
# Make sure these don't get built by parent
set(ATen_CUDA_SRCS)
endif()
endif()
if(NOT AT_LINK_STYLE STREQUAL "INTERFACE")
if(USE_CUDA)
if (NOT $ENV{ATEN_STATIC_CUDA})
cuda_add_cublas_to_target(ATen_cuda)
cuda_add_cufft_to_target(ATen_cuda)
endif()
endif()
if(NOT MSVC)
aten_compile_options(ATen_cpu)
if(USE_CUDA OR USE_ROCM)
aten_compile_options(ATen_cuda)
endif()
endif()
if(NOT ${CMAKE_VERSION} VERSION_LESS "3.1")
set_property(TARGET ATen_cpu PROPERTY CXX_STANDARD 11)
if(USE_CUDA OR USE_ROCM)
set_property(TARGET ATen_cuda PROPERTY CXX_STANDARD 11)
endif()
endif()
endif()
SET(ATEN_INCLUDE_DIR "${CMAKE_INSTALL_PREFIX}/${AT_INSTALL_INCLUDE_DIR}")
CONFIGURE_FILE(ATenConfig.cmake.in "${CMAKE_CURRENT_BINARY_DIR}/cmake-exports/ATenConfig.cmake")
INSTALL(FILES "${CMAKE_CURRENT_BINARY_DIR}/cmake-exports/ATenConfig.cmake"
DESTINATION "${AT_INSTALL_SHARE_DIR}/cmake/ATen")
# https://stackoverflow.com/questions/11096471/how-can-i-install-a-hierarchy-of-files-using-cmake
FOREACH(HEADER ${base_h} ${ATen_CORE_HEADERS} ${cuda_h} ${cudnn_h})
string(REPLACE "${CMAKE_CURRENT_SOURCE_DIR}/" "" HEADER_SUB ${HEADER})
GET_FILENAME_COMPONENT(DIR ${HEADER_SUB} DIRECTORY)
INSTALL(FILES ${HEADER} DESTINATION ${AT_INSTALL_INCLUDE_DIR}/ATen/${DIR})
ENDFOREACH()
FOREACH(HEADER ${generated_h} ${cuda_generated_h})
# NB: Assumed to be flat
INSTALL(FILES ${HEADER} DESTINATION ${AT_INSTALL_INCLUDE_DIR}/ATen)
ENDFOREACH()
INSTALL(FILES ${CMAKE_BINARY_DIR}/aten/src/ATen/Declarations.yaml
DESTINATION ${AT_INSTALL_SHARE_DIR}/ATen)
if(ATEN_NO_TEST)
message("disable test because ATEN_NO_TEST is set")
else()
add_subdirectory(test)
endif()
# Pass source, includes, and libs to parent
set(ATen_CORE_SRCS ${ATen_CORE_SRCS} PARENT_SCOPE)
set(ATen_CPU_SRCS ${ATen_CPU_SRCS} PARENT_SCOPE)
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} PARENT_SCOPE)
set(ATen_CPU_TEST_SRCS ${ATen_CPU_TEST_SRCS} PARENT_SCOPE)
set(ATen_CUDA_TEST_SRCS ${ATen_CUDA_TEST_SRCS} PARENT_SCOPE)
set(ATen_CPU_INCLUDE ${ATen_CPU_INCLUDE} PARENT_SCOPE)
set(ATen_THIRD_PARTY_INCLUDE ${ATen_THIRD_PARTY_INCLUDE} PARENT_SCOPE)
set(ATen_CUDA_INCLUDE ${ATen_CUDA_INCLUDE} PARENT_SCOPE)
set(ATen_CPU_DEPENDENCY_LIBS ${ATen_CPU_DEPENDENCY_LIBS} PARENT_SCOPE)
set(ATen_CUDA_DEPENDENCY_LIBS ${ATen_CUDA_DEPENDENCY_LIBS} PARENT_SCOPE)

@@ -0,0 +1,567 @@
#pragma once
#include "ATen/Parallel.h"
#include "ATen/TensorUtils.h"
#include <limits>
#include <utility>
#include <cstring>
namespace at {
/*
[collapse dims] Updates sizes, and strides to reflect a "collapse" of
the info, possibly excluding the optional excludeDim. A "collapsed" version
of the info is the fewest dims that order the tensor's elements in the same
way as the original info. If excludeDim is specified, the collapse is the
fewest dims that order the tensor's elements as the original and preserve the
excluded dimension, unless the tensor collapses to a point.
This function returns a pair of values.
1) The (new) index of the preserved dimension if excludeDim is
specified. 0 if the tensor is collapsed to a point. -1
otherwise.
2) The new number of dimensions.
*/
template <typename T>
inline std::pair<int64_t, int64_t> collapse_dims(
T* sizes,
T* strides,
int64_t dims,
const int excludeDim = -1) {
AT_CHECK(
excludeDim >= -1 && excludeDim < dims,
"expected excluded dim between -1 and dims - 1");
int64_t stopDim = (excludeDim == -1) ? dims : excludeDim;
int64_t newIndex = -1;
int64_t oldIndex = 0;
int64_t remappedExcludedDim = -1;
while (oldIndex < dims) {
// Finds a dimension to collapse into
for (; oldIndex < stopDim; ++oldIndex) {
if (sizes[oldIndex] == 1) {
continue;
}
++newIndex;
sizes[newIndex] = sizes[oldIndex];
strides[newIndex] = strides[oldIndex];
++oldIndex;
break;
}
// Collapses dims
for (; oldIndex < stopDim; ++oldIndex) {
if (sizes[oldIndex] == 1) {
continue;
}
if (strides[newIndex] == sizes[oldIndex] * strides[oldIndex]) {
sizes[newIndex] *= sizes[oldIndex];
strides[newIndex] = strides[oldIndex];
} else {
++newIndex;
sizes[newIndex] = sizes[oldIndex];
strides[newIndex] = strides[oldIndex];
}
}
// Handles excludeDim being set (oldIndex == excludeDim)
if (oldIndex != dims) {
// Preserves excluded dimension
++newIndex;
sizes[newIndex] = sizes[oldIndex];
strides[newIndex] = strides[oldIndex];
remappedExcludedDim = newIndex;
// Restarts iteration after excludeDim
++oldIndex;
stopDim = dims;
}
}
// Handles special case of all dims size 1
if (newIndex == -1 || (newIndex == 0 && sizes[0] == 1)) {
dims = 1;
sizes[0] = 1;
strides[0] = 1;
return std::pair<int64_t, int64_t>(0, 1);
}
dims = newIndex + 1;
return std::pair<int64_t, int64_t>(remappedExcludedDim, dims);
}
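// Illustrative example (not part of the original source): for a contiguous
// tensor with sizes {2, 3, 4} and strides {12, 4, 1}, every dimension is
// mergeable, so collapse_dims rewrites the arrays to sizes {24}, strides {1}
// and returns {-1, 1} (no excluded dim; one dimension remains).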
/*
* The basic strategy for apply is as follows:
*
* 1. Starting with the outermost index, loop until we reach a dimension where
* the data is no longer contiguous, i.e. the stride at that dimension is not
* equal to the size of the tensor defined by the outer dimensions. Let's call
* this outer (contiguous) tensor A. Note that if the Tensor is contiguous, then
* A is equal to the entire Tensor. Let's call the inner tensor B.
*
* 2. We loop through the indices in B, starting at its outermost dimension. For
* example, if B is a 2x2 matrix, then we do:
*
* B[0][0]
* B[0][1]
* B[1][0]
* B[1][1]
*
* We set the offset into the underlying storage as (storageOffset + stride_B *
* index_B), i.e. basically we compute the offset into the storage as we would
* normally for a Tensor. But because we are guaranteed the subsequent data is
* contiguous in memory, we can simply loop for sizeof(A) iterations and perform
* the operation, without having to follow the order described by the strides of
* A.
*
* 3. As an optimization, we merge dimensions of A that are contiguous in
* memory. For example, if A is a 3x3x3x3 tensor narrowed from a 3x3x4x3 tensor,
* then the first two dimensions can be merged for the purposes of APPLY,
* reducing the number of nested loops.
*/
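// Illustrative example (not part of the original source): a {3, 3, 3, 3}
// tensor narrowed from a {3, 3, 4, 3} tensor has strides {36, 12, 3, 1}.
// The two innermost dims merge into one contiguous run of 9 elements (A),
// and the two outermost dims merge into a single loop of 9 iterations with
// stride 12 (B): one outer loop of 9 offsets, each followed by a contiguous
// 9-element sweep.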
inline Tensor sort_strides(Tensor& tensor_) {
IntList strides = tensor_.strides();
std::vector<int64_t> indices;
indices.reserve(tensor_.ndimension());
for (int64_t i = 0; i < tensor_.ndimension(); i++) {
indices.push_back(i);
}
std::sort(indices.begin(), indices.end(), [&strides](int64_t i1, int64_t i2) {
return strides[i1] > strides[i2];
});
Tensor tensor = tensor_.permute(indices);
return tensor;
}
template <typename T, int N>
struct strided_tensor_iter_fixed {
public:
T* data_ = NULL;
int64_t dim_ = 0;
int64_t counter_[N] = {0};
int64_t sizes_[N] = {0};
int64_t strides_[N] = {0};
strided_tensor_iter_fixed(strided_tensor_iter_fixed const&) = delete;
void operator=(strided_tensor_iter_fixed const& x) = delete;
strided_tensor_iter_fixed(strided_tensor_iter_fixed&&) = default;
strided_tensor_iter_fixed(Tensor& tensor, bool sort_strides = false)
: data_(tensor.data<T>()) {
std::memset(counter_, 0, sizeof(int64_t) * N);
std::memcpy(
sizes_, tensor.sizes().data(), tensor.ndimension() * sizeof(int64_t));
std::memcpy(
strides_,
tensor.strides().data(),
tensor.ndimension() * sizeof(int64_t));
dim_ = std::get<1>(collapse_dims(sizes_, strides_, tensor.ndimension()));
}
};
template <typename T>
struct strided_tensor_iter {
private:
public:
T* data_ = NULL;
int64_t dim_;
std::vector<int64_t> counter_;
std::vector<int64_t> sizes_;
std::vector<int64_t> strides_;
strided_tensor_iter(strided_tensor_iter const&) = delete;
void operator=(strided_tensor_iter const& x) = delete;
strided_tensor_iter(strided_tensor_iter&&) = default;
strided_tensor_iter(Tensor& tensor)
: data_(tensor.data<T>()),
dim_(tensor.ndimension()),
counter_(dim_, 0),
sizes_(tensor.sizes().vec()),
strides_(tensor.strides().vec()) {
dim_ = std::get<1>(collapse_dims(sizes_.data(), strides_.data(), dim_));
}
};
inline bool _all_equal_numel(at::ArrayRef<Tensor> tensors) {
if (tensors.size() == 0)
return true;
int64_t all_numel = tensors[0].numel();
for (size_t i = 1; i < tensors.size(); i++) {
if (tensors[i].numel() != all_numel)
return false;
}
return true;
}
inline std::string _all_equal_numel_error(at::ArrayRef<Tensor> tensors) {
std::ostringstream oss;
oss << "inconsistent tensor size, expected ";
for (size_t i = 0; i < tensors.size() - 1; i++) {
oss << tensors[i].sizes() << ", ";
}
oss << "and " << tensors[tensors.size() - 1]
<< " to have the same number of elements, but got ";
for (size_t i = 0; i < tensors.size() - 1; i++) {
oss << tensors[i].numel() << ", ";
}
oss << "and " << tensors[tensors.size() - 1].numel()
<< " elements respectively";
return oss.str();
}
inline bool _apply_preamble(ArrayRef<Tensor> tensors) {
checkBackend("CPU_tensor_apply", tensors, Backend::CPU);
if (!_all_equal_numel(tensors))
throw std::runtime_error(_all_equal_numel_error(tensors));
// An empty tensor has no elements
for (auto& t : tensors)
if (t.numel() == 0)
return false;
return true;
}
inline int64_t _max_dim_tensors(ArrayRef<Tensor> tensors) {
int64_t dim = 0;
for (auto& t : tensors)
dim = std::max(dim, t.ndimension());
return dim;
}
inline void iterate(int64_t size){};
template <typename Arg, typename... Args>
inline void iterate(int64_t size, Arg& iter, Args&... iter_tail) {
iter.counter_[iter.dim_ - 1] += size;
iter.data_ = iter.data_ + size * iter.strides_[iter.dim_ - 1];
iterate(size, iter_tail...);
}
inline bool iterate_continue() {
return true;
};
template <typename Arg, typename... Args>
inline bool iterate_continue(Arg& iter, Args&... iter_tail) {
return iter.counter_[iter.dim_ - 1] < iter.sizes_[iter.dim_ - 1] &&
iterate_continue(iter_tail...);
}
inline int64_t max_iterate_size() {
return std::numeric_limits<int64_t>::max();
};
template <typename Arg, typename... Args>
inline int64_t max_iterate_size(Arg& iter, Args&... iter_tail) {
return std::min(
(iter.sizes_[iter.dim_ - 1] - iter.counter_[iter.dim_ - 1]),
max_iterate_size(iter_tail...));
}
inline void iterate_overflow(){};
template <typename Arg, typename... Args>
inline void iterate_overflow(Arg& iter, Args&... iter_tail) {
if (iter.counter_[iter.dim_ - 1] == iter.sizes_[iter.dim_ - 1]) {
for (int64_t i = iter.dim_ - 1; i > 0; i--) {
if (iter.counter_[i] == iter.sizes_[i]) {
iter.counter_[i] = 0;
iter.counter_[i - 1]++;
iter.data_ = iter.data_ - (iter.sizes_[i] * iter.strides_[i]) +
iter.strides_[i - 1];
}
}
}
iterate_overflow(iter_tail...);
}
inline void forward(int64_t offset){};
template <typename Arg, typename... Args>
inline void forward(int64_t offset, Arg& iter, Args&... iter_tail) {
int64_t multi = offset;
for (int64_t i = iter.dim_ - 1; i >= 0; i--) {
int64_t inc = multi % iter.sizes_[i];
multi = multi / iter.sizes_[i];
iter.data_ = iter.data_ + inc * iter.strides_[i];
iter.counter_[i] += inc;
}
forward(offset, iter_tail...);
}
inline int64_t max_dim() {
return 0;
}
template <typename Arg, typename... Args>
inline int64_t max_dim(Arg& iter, Args&... iter_tail) {
return std::max(iter.dim_, max_dim(iter_tail...));
}
inline void apply_op(){};
template <typename Op, typename... Args>
inline void
apply_op(int64_t numel, int64_t offset, const Op& op, Args... iters) {
// For 0-dim tensors
if (numel == 1 && max_dim(iters...) == 0) {
op(*iters.data_...);
return;
}
if (offset > 0)
forward(offset, iters...);
// Splitting this into chunks helps the compiler create faster assembly
for (int64_t i = 0; i < numel;) {
for (; iterate_continue(iters...) && i < numel;) {
op(*iters.data_...);
iterate(1, iters...);
i++;
}
iterate_overflow(iters...);
}
}
inline void apply_kernel(){};
// TODO: Deal elegantly with 0-dim tensors. iters.strides_ of 0-dim
// strided_tensor_iter will be of size 0 for dim 0 and iters.strides_[iters.dim_
// - 1] will index at -1. C++14 integer_sequence could be of use here.
template <typename Op, typename... Args>
inline void
apply_kernel(int64_t numel, int64_t offset, const Op& op, Args... iters) {
if (offset > 0)
forward(offset, iters...);
int64_t size = std::min(numel, max_iterate_size(iters...));
op(size, iters.data_..., iters.strides_[iters.dim_ - 1]...);
iterate(size, iters...);
iterate_overflow(iters...);
int64_t i = size;
size = std::min(numel, max_iterate_size(iters...));
for (; i < numel;) {
op(size, iters.data_..., iters.strides_[iters.dim_ - 1]...);
iterate(size, iters...);
i += size;
iterate_overflow(iters...);
}
}
template <typename scalar1, typename scalar2, typename Op>
inline void
CPU_tensor_parallel_kernel_apply2(Tensor tensor1, Tensor tensor2, const Op op) {
if (!_apply_preamble({tensor1, tensor2}))
return;
if (tensor1.numel() == 1) {
op(1, tensor1.data<scalar1>(), tensor2.data<scalar2>(), 0, 0);
return;
}
if (tensor1.ndimension() < 8 && tensor2.ndimension() < 8) {
parallel_for(
0,
tensor1.numel(),
1,
[&tensor1, &tensor2, &op](int64_t begin, int64_t end) {
apply_kernel(
end - begin,
begin,
op,
strided_tensor_iter_fixed<scalar1, 8>(tensor1),
strided_tensor_iter_fixed<scalar2, 8>(tensor2));
});
} else {
parallel_for(
0,
tensor1.numel(),
1,
[&tensor1, &tensor2, &op](int64_t begin, int64_t end) {
apply_kernel(
end - begin,
begin,
op,
strided_tensor_iter<scalar1>(tensor1),
strided_tensor_iter<scalar2>(tensor2));
});
}
}
/*
Apply a pointwise operator to a sequence of tensors.
The calling convention for op is a function/functor that takes the same
number of scalar references as the number of given tensors (each iterator is
dereferenced before op is called). For example, to compute a = b * c, op
would be of the form:
[](scalar& a_val, const scalar& b_val, const scalar& c_val) { a_val =
b_val * c_val; };
*/
template <typename scalar1, typename Op>
inline void CPU_tensor_apply1(Tensor tensor1, const Op op) {
if (!_apply_preamble({tensor1}))
return;
if (tensor1.ndimension() < 8) {
apply_op(
tensor1.numel(),
0,
op,
strided_tensor_iter_fixed<scalar1, 8>(tensor1, true));
} else {
apply_op(tensor1.numel(), 0, op, strided_tensor_iter<scalar1>(tensor1));
}
}
template <typename scalar1, typename scalar2, typename Op>
inline void CPU_tensor_apply2(Tensor tensor1, Tensor tensor2, const Op op) {
if (!_apply_preamble({tensor1, tensor2}))
return;
if (_max_dim_tensors({tensor1, tensor2}) <= 8) {
apply_op(
tensor1.numel(),
0,
op,
strided_tensor_iter_fixed<scalar1, 8>(tensor1),
strided_tensor_iter_fixed<scalar2, 8>(tensor2));
} else {
apply_op(
tensor1.numel(),
0,
op,
strided_tensor_iter<scalar1>(tensor1),
strided_tensor_iter<scalar2>(tensor2));
}
}
template <typename scalar1, typename scalar2, typename scalar3, typename Op>
inline void
CPU_tensor_apply3(Tensor tensor1, Tensor tensor2, Tensor tensor3, const Op op) {
if (!_apply_preamble({tensor1, tensor2, tensor3}))
return;
if (_max_dim_tensors({tensor1, tensor2, tensor3}) <= 8) {
apply_op(
tensor1.numel(),
0,
op,
strided_tensor_iter_fixed<scalar1, 8>(tensor1),
strided_tensor_iter_fixed<scalar2, 8>(tensor2),
strided_tensor_iter_fixed<scalar3, 8>(tensor3));
} else {
apply_op(
tensor1.numel(),
0,
op,
strided_tensor_iter<scalar1>(tensor1),
strided_tensor_iter<scalar2>(tensor2),
strided_tensor_iter<scalar3>(tensor3));
}
}
template <
typename scalar1,
typename scalar2,
typename scalar3,
typename scalar4,
typename Op>
inline void CPU_tensor_apply4(
Tensor tensor1,
Tensor tensor2,
Tensor tensor3,
Tensor tensor4,
const Op op) {
if (!_apply_preamble({tensor1, tensor2, tensor3, tensor4}))
return;
if (_max_dim_tensors({tensor1, tensor2, tensor3, tensor4}) <= 8) {
apply_op(
tensor1.numel(),
0,
op,
strided_tensor_iter_fixed<scalar1, 8>(tensor1),
strided_tensor_iter_fixed<scalar2, 8>(tensor2),
strided_tensor_iter_fixed<scalar3, 8>(tensor3),
strided_tensor_iter_fixed<scalar4, 8>(tensor4));
} else {
apply_op(
tensor1.numel(),
0,
op,
strided_tensor_iter<scalar1>(tensor1),
strided_tensor_iter<scalar2>(tensor2),
strided_tensor_iter<scalar3>(tensor3),
strided_tensor_iter<scalar4>(tensor4));
}
}
template <typename scalar1, typename Op>
inline void CPU_tensor_parallel_apply1(
Tensor tensor1,
const Op op,
int64_t grain_size = internal::GRAIN_SIZE) {
if (!_apply_preamble({tensor1}))
return;
if (tensor1.ndimension() < 8) {
parallel_for(
0,
tensor1.numel(),
grain_size,
[&tensor1, &op](int64_t begin, int64_t end) {
apply_op(
end - begin,
begin,
op,
strided_tensor_iter_fixed<scalar1, 8>(tensor1, true));
});
} else {
parallel_for(
0,
tensor1.numel(),
grain_size,
[&tensor1, &op](int64_t begin, int64_t end) {
apply_op(
end - begin, begin, op, strided_tensor_iter<scalar1>(tensor1));
});
}
}
template <typename scalar1, typename scalar2, typename Op>
inline void CPU_tensor_parallel_apply2(
Tensor tensor1,
Tensor tensor2,
const Op op,
int64_t grain_size = internal::GRAIN_SIZE) {
if (!_apply_preamble({tensor1, tensor2}))
return;
if (tensor1.ndimension() < 8 && tensor2.ndimension() < 8) {
parallel_for(
0,
tensor1.numel(),
grain_size,
[&tensor1, &tensor2, &op](int64_t begin, int64_t end) {
apply_op(
end - begin,
begin,
op,
strided_tensor_iter_fixed<scalar1, 8>(tensor1),
strided_tensor_iter_fixed<scalar2, 8>(tensor2));
});
} else {
parallel_for(
0,
tensor1.numel(),
grain_size,
[&tensor1, &tensor2, &op](int64_t begin, int64_t end) {
apply_op(
end - begin,
begin,
op,
strided_tensor_iter<scalar1>(tensor1),
strided_tensor_iter<scalar2>(tensor2));
});
}
}
} // namespace at

@@ -0,0 +1,31 @@
#pragma once
#include "ATen/core/Error.h"
#include "TH/TH.h"
// This file creates a fake allocator that just throws exceptions if
// it is actually used.
// The state passed to the allocator is the std::function<void(void*)> called
// when the blob is released by ATen
namespace at {
static void* cpu_fixed_malloc(void *, ptrdiff_t) {
  AT_ERROR("attempting to resize a tensor view of an external blob");
}
static void* cpu_fixed_realloc(void *, void*, ptrdiff_t) {
  AT_ERROR("attempting to resize a tensor view of an external blob");
}
static void cpu_fixed_free(void * state, void * allocation) {
auto on_release = static_cast<std::function<void(void*)>*>(state);
(*on_release)(allocation);
delete on_release;
}
static THAllocator CPU_fixed_allocator =
{ cpu_fixed_malloc, cpu_fixed_realloc, cpu_fixed_free };
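// Intended use (a sketch; `release_fn` is a hypothetical callback): the caller
// wrapping an external blob heap-allocates the state, and cpu_fixed_free
// invokes and deletes it once ATen releases the blob:
//   auto* state = new std::function<void(void*)>(release_fn);
//   // pass &CPU_fixed_allocator together with `state` to the storage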
}

@@ -0,0 +1,16 @@
#include <ATen/CPUGeneral.h>
#include <atomic>
#include <memory>
#include <thread>
namespace at {
// Lock free atomic type
std::atomic<int> num_threads(-1);
void set_num_threads(int num_threads_) {
if (num_threads_ >= 0)
num_threads.store(num_threads_);
}
int get_num_threads() { return num_threads.load(); }
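// Usage sketch (illustrative): at::set_num_threads(4);
// Negative arguments are ignored, and get_num_threads() returns -1 until a
// value has been explicitly set.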
}

@@ -0,0 +1,12 @@
#pragma once
// Using AT_API is crucial, as otherwise you'll see
// linking errors when using MSVC
// See https://msdn.microsoft.com/en-us/library/a90k134d.aspx
// This header applies AT_API to the declarations below
#include "ATen/core/ATenGeneral.h"
namespace at {
AT_API void set_num_threads(int);
AT_API int get_num_threads();
}

@@ -0,0 +1,49 @@
#include "ATen/CPUGenerator.h"
#define const_generator_cast(generator) \
dynamic_cast<const CPUGenerator&>(generator)
namespace at {
CPUGenerator::CPUGenerator(Context * context_)
: context(context_), generator(THGenerator_new())
{}
CPUGenerator::~CPUGenerator() {
if (generator)
THGenerator_free(generator);
}
CPUGenerator& CPUGenerator::copy(const Generator& from) {
THGenerator_copy(generator, const_generator_cast(from).generator);
return *this;
}
CPUGenerator& CPUGenerator::free() {
THGenerator_free(generator);
return *this;
}
uint64_t CPUGenerator::seed() {
return THRandom_seed(generator);
}
uint64_t CPUGenerator::initialSeed() {
return THRandom_initialSeed(generator);
}
CPUGenerator& CPUGenerator::manualSeed(uint64_t seed) {
THRandom_manualSeed(generator, seed);
return *this;
}
CPUGenerator& CPUGenerator::manualSeedAll(uint64_t seed) {
// There's only one CPU generator
return manualSeed(seed);
}
void * CPUGenerator::unsafeGetTH() {
return generator;
}
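// Usage sketch (illustrative): seed the default CPU generator via the global
// context:
//   at::globalContext().defaultGenerator(at::DeviceType::CPU).manualSeed(42);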
} // namespace at

@@ -0,0 +1,20 @@
#include <ATen/CPUTypeDefault.h>
#include <ATen/Context.h>
#include <ATen/CPUGenerator.h>
namespace at {
Allocator* CPUTypeDefault::allocator() const {
return getCPUAllocator();
}
Device CPUTypeDefault::getDeviceFromPtr(void * data) const {
return DeviceType::CPU;
}
std::unique_ptr<Generator> CPUTypeDefault::generator() const {
return std::unique_ptr<Generator>(new CPUGenerator(&at::globalContext()));
}
} // namespace at

@@ -0,0 +1,14 @@
#pragma once
#include <ATen/TypeDefault.h>
namespace at {
struct AT_API CPUTypeDefault : public TypeDefault {
CPUTypeDefault(TensorTypeId type_id, bool is_variable, bool is_undefined)
: TypeDefault(type_id, is_variable, is_undefined) {}
Allocator* allocator() const override;
Device getDeviceFromPtr(void * data) const override;
std::unique_ptr<Generator> generator() const override;
};
} // namespace at

@@ -0,0 +1,18 @@
#pragma once
#include "ATen/core/Generator.h"
#include "ATen/Utils.h"
#include "ATen/core/Error.h"
namespace at {
template <typename T>
static inline T * check_generator(Generator * expr, Generator * defaultValue) {
if (!expr)
expr = defaultValue;
if(auto result = dynamic_cast<T*>(expr))
return result;
AT_ERROR("Expected a '", typeid(T).name(), "' but found '", typeid(expr).name(), "'");
}
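// Example (a sketch; `user_gen` is a hypothetical, possibly-null argument):
//   auto* gen = check_generator<CPUGenerator>(
//       user_gen, &globalContext().defaultGenerator(DeviceType::CPU));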
} // namespace at

aten/src/ATen/Config.h.in
@@ -0,0 +1,11 @@
#pragma once
// Test these using #if AT_MKL_ENABLED(), not #ifdef, so that it's
// obvious if you forgot to include Config.h
// c.f. https://stackoverflow.com/questions/33759787/generating-an-error-if-checked-boolean-macro-is-not-defined
//
// DO NOT put the macros for CUDA libraries in this file; they belong in cuda/CUDAConfig.h
#define AT_MKLDNN_ENABLED() @AT_MKLDNN_ENABLED@
#define AT_MKL_ENABLED() @AT_MKL_ENABLED@
#define CAFFE2_STATIC_LINK_CUDA() @CAFFE2_STATIC_LINK_CUDA@
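// Example of the intended pattern:
//   #if AT_MKL_ENABLED()
//     // MKL-specific code path
//   #endif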

aten/src/ATen/Context.cpp
@@ -0,0 +1,144 @@
#include "ATen/Config.h"
#include "Context.h"
#include <ATen/core/TensorOptions.h>
#include <thread>
#include <mutex>
#include <sstream>
#include <string>
#include <stdexcept>
#include "ATen/CPUGenerator.h"
#include "ATen/RegisterCPU.h"
#include "ATen/Tensor.h"
#include "TH/TH.h" // for USE_LAPACK
#ifdef USE_SSE3
#include <pmmintrin.h>
#endif
namespace at {
static inline void errorHandler(const char * msg, void * data) {
throw std::runtime_error(msg);
}
static inline void argErrorHandler(int arg, const char * msg, void * data) {
std::stringstream new_error;
new_error << "invalid argument " << arg << ": " << msg;
throw std::runtime_error(new_error.str());
}
Context::Context()
: next_id(static_cast<size_t>(TypeID::NumOptions))
, thc_state(nullptr, [](THCState* p){ /* no-op */ } ) {
THSetDefaultErrorHandler(errorHandler,nullptr);
THSetDefaultArgErrorHandler(argErrorHandler,nullptr);
generator_registry[static_cast<int>(DeviceType::CPU)]
.reset(new CPUGenerator(this));
register_cpu_types(this);
}
// TODO: This could be bad juju if someone calls globalContext() in the
// destructor of an object with static lifetime.
Context & globalContext() {
static Context globalContext_;
return globalContext_;
}
// NB: This method reflects *purely* whether or not a user requested
// that CuDNN be enabled; it doesn't actually say anything about
// whether or not CuDNN is actually usable.
bool Context::userEnabledCuDNN() const {
return enabled_cudnn;
}
void Context::setUserEnabledCuDNN(bool e) {
enabled_cudnn = e;
}
bool Context::deterministicCuDNN() const {
return deterministic_cudnn;
}
void Context::setDeterministicCuDNN(bool b) {
deterministic_cudnn = b;
}
bool Context::benchmarkCuDNN() const {
return benchmark_cudnn;
}
void Context::setBenchmarkCuDNN(bool b) {
benchmark_cudnn = b;
}
bool Context::hasMKL() const {
#if AT_MKL_ENABLED()
return true;
#else
return false;
#endif
}
bool Context::hasLAPACK() const {
#ifdef USE_LAPACK
return true;
#else
return false;
#endif
}
bool Context::setFlushDenormal(bool on) {
#ifdef USE_SSE3
// Setting flush-to-zero (FTZ) flag
_MM_SET_FLUSH_ZERO_MODE(on ? _MM_FLUSH_ZERO_ON
: _MM_FLUSH_ZERO_OFF);
// Setting denormals-are-zero (DAZ) flag
_MM_SET_DENORMALS_ZERO_MODE(on ? _MM_DENORMALS_ZERO_ON
: _MM_DENORMALS_ZERO_OFF);
return true;
#else
return false;
#endif
}
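// Usage sketch: at::globalContext().setFlushDenormal(true) enables FTZ/DAZ and
// returns true; on builds without SSE3 it is a no-op that returns false.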
TypeExtendedInterface& getType(TensorOptions options) {
return globalContext().getType(
options.backend(), options.dtype(), options.is_variable());
}
TypeExtendedInterface& getType(const TensorImpl* impl) {
Backend backend = tensorTypeIdToBackend(impl->type_id());
return globalContext().getType(
backend, dataTypeToScalarType(impl->dtype().id()), impl->is_variable());
}
TypeExtendedInterface& getType(const Tensor& t) {
return getType(t.unsafeGetTensorImpl());
}
Allocator* getCPUAllocator() {
return getTHDefaultAllocator();
}
struct LegacyTypeInit : public LegacyTypeInitInterface {
LegacyTypeInit(LegacyTypeInitArgs) {}
void initCPU() const override {
globalContext();
}
void initCUDA() const override {
globalContext().lazyInitCUDA();
}
void initComplex() const override {
globalContext().lazyInitComplex();
}
};
REGISTER_LEGACY_TYPE_INIT(LegacyTypeInit);
}

aten/src/ATen/Context.h
@@ -0,0 +1,194 @@
#pragma once
#include <ATen/CPUGeneral.h>
#include "ATen/core/ATenGeneral.h"
#include "ATen/CUDAStream.h"
#include "ATen/core/Generator.h"
#include "ATen/Type.h"
#include "ATen/TypeExtendedInterface.h"
#include "ATen/Utils.h"
#include "ATen/core/Error.h"
#include "ATen/detail/CUDAHooksInterface.h"
#include "ATen/core/VariableHooksInterface.h"
#include "ATen/detail/ComplexHooksInterface.h"
#include "ATen/core/LegacyTypeDispatch.h"
// This is temporary
#include "ATen/core/ATenCoreTest.h"
#include <memory>
#include <mutex>
#include <cstdint>
namespace at {
struct Tensor;
class AT_API Context {
public:
Context();
TypeExtendedInterface* getNonVariableTypeRaw(Backend p, ScalarType s) {
return static_cast<TypeExtendedInterface*>(globalLegacyTypeDispatch().getNonVariableTypeRaw(p, s));
}
TypeExtendedInterface * getNonVariableTypeOpt(Backend p, ScalarType s) {
return static_cast<TypeExtendedInterface*>(globalLegacyTypeDispatch().getNonVariableTypeOpt(p, s));
}
TypeExtendedInterface & getNonVariableType(Backend p, ScalarType s) {
return static_cast<TypeExtendedInterface&>(globalLegacyTypeDispatch().getNonVariableType(p, s));
}
TypeExtendedInterface & getVariableType(Backend p, ScalarType s) {
return static_cast<TypeExtendedInterface&>(globalLegacyTypeDispatch().getVariableType(p, s));
}
TypeExtendedInterface & getType(Backend p, ScalarType s, bool is_variable) {
return static_cast<TypeExtendedInterface&>(globalLegacyTypeDispatch().getType(p, s, is_variable));
}
// The passed in Type must be delete'able
// TODO: Just make it take a unique_ptr
void registerType(Backend b, ScalarType s, Type* t) {
globalLegacyTypeDispatch().registerType(b, s,
LegacyTypeDispatch::TypeUniquePtr{t, LegacyTypeDeleter([](Type* p) { delete p; }) });
}
Generator & defaultGenerator(DeviceType device_type) {
initCUDAIfNeeded(device_type);
auto & generator = generator_registry[static_cast<int>(device_type)];
if(!generator)
AT_ERROR(DeviceTypeName(device_type), " backend type not enabled.");
return *generator;
}
bool hasMKL() const;
bool hasLAPACK() const;
bool hasMAGMA() const {
return detail::getCUDAHooks().hasMAGMA();
}
bool hasCUDA() const {
return detail::getCUDAHooks().hasCUDA();
}
bool hasCuDNN() const {
return detail::getCUDAHooks().hasCuDNN();
}
int64_t current_device() const {
return detail::getCUDAHooks().current_device();
}
// Defined in the header so that getNonVariableType can inline the
// call_once check; getNonVariableType is called fairly frequently
THCState* lazyInitCUDA() {
std::call_once(thc_init,[&] {
thc_state = detail::getCUDAHooks().initCUDA();
generator_registry[static_cast<int>(DeviceType::CUDA)] =
detail::getCUDAHooks().initCUDAGenerator(this);
detail::getCUDAHooks().registerCUDATypes(this);
});
return thc_state.get();
}
void lazyInitComplex() {
std::call_once(complex_init_, [&] {
detail::getComplexHooks().registerComplexTypes(this);
});
}
THCState* getTHCState() {
// AT_ASSERT(thc_state);
return thc_state.get();
}
int getNumGPUs() const {
return detail::getCUDAHooks().getNumGPUs();
}
size_t freshTypeID() {
return next_id++;
}
bool setFlushDenormal(bool on);
// NB: This method reflects *purely* whether or not a user requested
// that CuDNN be enabled; it doesn't actually say anything about
// whether or not CuDNN is actually usable. Use cudnn_is_acceptable
// to test that instead
bool userEnabledCuDNN() const;
void setUserEnabledCuDNN(bool e);
bool benchmarkCuDNN() const;
void setBenchmarkCuDNN(bool);
bool deterministicCuDNN() const;
void setDeterministicCuDNN(bool);
std::unique_ptr<Generator>
generator_registry[static_cast<int>(DeviceType::COMPILE_TIME_MAX_DEVICE_TYPES)];
private:
void initCUDAIfNeeded(DeviceType p) {
if (p == DeviceType::CUDA) {
lazyInitCUDA();
}
}
void initComplexIfNeeded(ScalarType s) {
if (isComplexType(s)) {
lazyInitComplex();
}
}
std::once_flag thc_init;
std::once_flag complex_init_;
bool enabled_cudnn = true;
bool deterministic_cudnn = false;
bool benchmark_cudnn = false;
std::atomic<size_t> next_id;
std::unique_ptr<THCState, void(*)(THCState*)> thc_state;
friend struct Type;
};
AT_API Context & globalContext();
static inline void init() {
globalContext();
if (const char *env_p = std::getenv("OMP_NUM_THREADS")) {
at::set_num_threads(std::stoi(env_p));
}
if (const char *env_p = std::getenv("MKL_NUM_THREADS")) {
at::set_num_threads(std::stoi(env_p));
}
}
static inline TypeExtendedInterface& getNonVariableType(Backend p, ScalarType s) {
return globalContext().getNonVariableType(p, s);
}
static inline TypeExtendedInterface& getNonVariableType(DeviceType p, ScalarType s) {
return globalContext().getNonVariableType(deviceTypeToBackend(p), s);
}
AT_API TypeExtendedInterface& getType(TensorOptions options);
AT_API TypeExtendedInterface& getType(const TensorImpl*);
AT_API TypeExtendedInterface& getType(const Tensor&);
AT_API Allocator* getCPUAllocator();
static inline TypeExtendedInterface& CPU(ScalarType s) {
return getNonVariableType(Backend::CPU, s);
}
static inline TypeExtendedInterface& CUDA(ScalarType s) {
return getNonVariableType(Backend::CUDA, s);
}
static inline bool hasCUDA() {
return globalContext().hasCUDA();
}
static inline bool hasCuDNN() {
return globalContext().hasCuDNN();
}
static inline bool hasMKL() {
return globalContext().hasMKL();
}
static inline bool hasLAPACK() {
return globalContext().hasLAPACK();
}
static inline bool hasMAGMA() {
return globalContext().hasMAGMA();
}
static inline int64_t current_device() {
return globalContext().current_device();
}
} // namespace at

@@ -0,0 +1,180 @@
#include "ATen/DLConvertor.h"
#include "ATen/Functions.h"
#include <iostream>
#include <sstream>
using namespace std;
namespace at {
static DLDataType getDLDataType(const Type& type) {
DLDataType dtype;
dtype.lanes = 1;
dtype.bits = type.elementSizeInBytes() * 8;
switch (type.scalarType()) {
case ScalarType::Byte:
dtype.code = DLDataTypeCode::kDLUInt;
break;
case ScalarType::Char:
dtype.code = DLDataTypeCode::kDLInt;
break;
case ScalarType::Double:
dtype.code = DLDataTypeCode::kDLFloat;
break;
case ScalarType::Float:
dtype.code = DLDataTypeCode::kDLFloat;
break;
case ScalarType::Int:
dtype.code = DLDataTypeCode::kDLInt;
break;
case ScalarType::Long:
dtype.code = DLDataTypeCode::kDLInt;
break;
case ScalarType::Short:
dtype.code = DLDataTypeCode::kDLInt;
break;
case ScalarType::Half:
dtype.code = DLDataTypeCode::kDLFloat;
break;
case ScalarType::ComplexHalf:
throw std::logic_error("ComplexHalf is not supported by dlpack");
case ScalarType::ComplexFloat:
throw std::logic_error("ComplexFloat is not supported by dlpack");
case ScalarType::ComplexDouble:
throw std::logic_error("ComplexDouble is not supported by dlpack");
case ScalarType::Undefined:
throw std::logic_error("Undefined is not a valid ScalarType");
case ScalarType::NumOptions:
throw std::logic_error("NumOptions is not a valid ScalarType");
}
return dtype;
}
static DLContext getDLContext(const Type& type, const int64_t& device_id) {
DLContext ctx;
ctx.device_id = device_id;
if (type.is_cuda()) {
ctx.device_type = DLDeviceType::kDLGPU;
} else {
ctx.device_type = DLDeviceType::kDLCPU;
}
return ctx;
}
static DeviceType getATenDeviceType(const DLContext& ctx) {
switch (ctx.device_type) {
case DLDeviceType::kDLCPU:
return DeviceType::CPU;
case DLDeviceType::kDLGPU:
return DeviceType::CUDA;
case DLDeviceType::kDLOpenCL:
return DeviceType::OPENCL;
case DLDeviceType::kDLROCM:
return DeviceType::HIP;
default:
throw std::logic_error("Unsupported device_type: " + std::to_string(ctx.device_type));
}
return DeviceType::CPU; // impossible
}
ScalarType toScalarType(const DLDataType& dtype) {
ScalarType stype;
if (dtype.lanes != 1) throw std::logic_error("ATen does not support lanes != 1");
switch (dtype.code) {
case DLDataTypeCode::kDLUInt:
switch (dtype.bits) {
case 8:
stype = ScalarType::Byte;
break;
default:
throw std::logic_error("Unsupported kUInt bits " + std::to_string(dtype.bits));
}
break;
case DLDataTypeCode::kDLInt:
switch (dtype.bits) {
case 8:
stype = ScalarType::Char;
break;
case 16:
stype = ScalarType::Short;
break;
case 32:
stype = ScalarType::Int;
break;
case 64:
stype = ScalarType::Long;
break;
default:
throw std::logic_error("Unsupported kInt bits " + std::to_string(dtype.bits));
}
break;
case DLDataTypeCode::kDLFloat:
switch (dtype.bits) {
case 16:
stype = ScalarType::Half;
break;
case 32:
stype = ScalarType::Float;
break;
case 64:
stype = ScalarType::Double;
break;
default:
throw std::logic_error("Unsupported kFloat bits " + std::to_string(dtype.bits));
}
break;
default:
throw std::logic_error("Unsupported code " + std::to_string(dtype.code));
}
return stype;
}
struct ATenDLMTensor {
Tensor handle;
DLManagedTensor tensor;
};
void deleter(DLManagedTensor * arg) {
delete static_cast<ATenDLMTensor*>(arg->manager_ctx);
}
// This function returns a pointer to a memory-managed DLPack tensor
// (freed via the deleter above) constructed out of an ATen tensor
DLManagedTensor* toDLPack(const Tensor& src) {
ATenDLMTensor * atDLMTensor(new ATenDLMTensor);
atDLMTensor->handle = src;
atDLMTensor->tensor.manager_ctx = atDLMTensor;
atDLMTensor->tensor.deleter = &deleter;
atDLMTensor->tensor.dl_tensor.data = src.data_ptr();
int64_t device_id = 0;
if (src.type().is_cuda()) {
device_id = src.get_device();
}
atDLMTensor->tensor.dl_tensor.ctx = getDLContext(src.type(), device_id);
atDLMTensor->tensor.dl_tensor.ndim = src.dim();
atDLMTensor->tensor.dl_tensor.dtype = getDLDataType(src.type());
atDLMTensor->tensor.dl_tensor.shape = const_cast<int64_t*>(src.sizes().data());
atDLMTensor->tensor.dl_tensor.strides = const_cast<int64_t*>(src.strides().data());
atDLMTensor->tensor.dl_tensor.byte_offset = 0;
return &(atDLMTensor->tensor);
}
Tensor fromDLPack(const DLManagedTensor* src) {
DeviceType device_type = getATenDeviceType(src->dl_tensor.ctx);
ScalarType stype = toScalarType(src->dl_tensor.dtype);
auto deleter = [src](void * self) {
src->deleter(const_cast<DLManagedTensor*>(src));
};
return at::from_blob(src->dl_tensor.data,
IntList(src->dl_tensor.shape, src->dl_tensor.ndim),
IntList(src->dl_tensor.strides, src->dl_tensor.ndim),
deleter,
at::device(device_type).dtype(stype));
}
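// Round-trip sketch (illustrative, for some Tensor t): the tensor produced by
// fromDLPack aliases the DLPack buffer, and src->deleter runs when its storage
// is released:
//   DLManagedTensor* managed = toDLPack(t);
//   Tensor t2 = fromDLPack(managed);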
} //namespace at

@@ -0,0 +1,17 @@
#pragma once
#include "ATen/Tensor.h"
#include "ATen/ATen.h"
#include "ATen/dlpack.h"
// this converter will:
// 1) take a Tensor object and wrap it in a DLPack tensor
// 2) take a DLPack tensor and convert it to an ATen Tensor
namespace at {
AT_API ScalarType toScalarType(const DLDataType& dtype);
AT_API DLManagedTensor * toDLPack(const Tensor& src);
AT_API Tensor fromDLPack(const DLManagedTensor* src);
} //namespace at

File diff suppressed because it is too large.

aten/src/ATen/Device.h
@@ -0,0 +1,2 @@
#pragma once
#include <ATen/core/Device.h>

aten/src/ATen/DeviceGuard.h
@@ -0,0 +1,132 @@
#pragma once
#include <ATen/core/Device.h>
#include <ATen/core/ScalarType.h>
#include <ATen/Tensor.h>
#include <ATen/core/Error.h>
#include <ATen/core/optional.h>
#include <ATen/detail/CUDAHooksInterface.h>
#include <cstddef>
namespace at {
/// RAII guard that sets a certain default GPU index in its constructor, and
/// changes it back to the device that was originally active upon destruction.
///
/// The index is always reset to the one that was active at the time of
/// construction of the guard. Even if you `set_index` after construction, the
/// destructor will still reset the index to the one that was active at
/// construction time.
struct DeviceGuard {
/// Default constructor, does nothing.
DeviceGuard() = default;
/// Uses the given device's `index()` if it is a CUDA device, else does
/// nothing.
explicit DeviceGuard(Device device) {
if (device.is_cuda()) {
set_index(device.index());
}
}
explicit DeviceGuard(optional<Device> device_opt) {
if (device_opt.has_value() && device_opt.value().is_cuda()) {
set_index(device_opt.value().index());
}
}
/// Calls `set_index` with the given index.
explicit DeviceGuard(int32_t index) {
set_index(index);
}
/// Sets the device to the index on which the given tensor is located.
explicit DeviceGuard(const Tensor& tensor) {
set_index_from(tensor);
}
/// Sets the device to the index on which the first tensor in the list is
/// located. If the list is empty, does nothing.
explicit DeviceGuard(const TensorList& tensors) {
if (!tensors.empty()) {
set_index_from(tensors.front());
}
}
/// Copy is disallowed.
DeviceGuard(const DeviceGuard&) = delete;
DeviceGuard& operator=(const DeviceGuard&) = delete;
/// Move-constructs this `DeviceGuard` from another `DeviceGuard`. The
/// moved-from `DeviceGuard` is modified such that its destruction has no
/// effect (does not reset the device).
DeviceGuard(DeviceGuard&& other) noexcept {
*this = std::move(other);
}
/// Move-assigns this `DeviceGuard` from another `DeviceGuard`. The
/// moved-from `DeviceGuard` is modified such that its destruction has no
/// effect (does not reset the device).
DeviceGuard& operator=(DeviceGuard&& other) noexcept {
this->original_index_ = other.original_index_;
this->last_index_ = other.last_index_;
// Set other's original index to the unspecified/default state, so that it
// doesn't also reset the device in its constructor.
other.original_index_ = -1;
return *this;
}
/// Resets the device to the index that was active at construction of the
/// guard.
~DeviceGuard() {
// original_index_ is -1 only if an index was never actually set.
if (original_index_ != -1) {
// Unchecked because we don't want to throw in the destructor.
detail::DynamicCUDAInterface::unchecked_set_device(original_index_);
}
}
/// Sets the device to the given one.
void set_index(int32_t index) {
if (index == -1) {
return;
}
AT_ASSERT(index >= 0);
if (original_index_ == -1) {
int32_t previous_index = -123;
detail::DynamicCUDAInterface::get_device(&previous_index);
original_index_ = previous_index;
if (index != original_index_) {
detail::DynamicCUDAInterface::set_device(index);
}
} else {
detail::DynamicCUDAInterface::set_device(index);
}
last_index_ = index;
}
/// Calls `set_index` with the `Tensor`'s current device, if it is a CUDA
/// tensor. Does nothing if the `tensor` is not defined.
void set_index_from(const Tensor& tensor) {
if (tensor.defined() && tensor.is_cuda()) {
set_index(tensor.get_device());
}
}
/// Returns the device that was set upon construction of the guard.
int32_t original_index() const noexcept {
return original_index_;
}
/// Returns the last device that was set via `set_index`, if any.
int32_t last_index() const noexcept {
return last_index_;
}
private:
/// The original device that was active at construction of this object.
int32_t original_index_ = -1;
/// The last index that was set via `set_index`.
int32_t last_index_ = -1;
};
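// Typical usage (a sketch; `cuda_tensor` is hypothetical): pin work to a
// tensor's device for one scope.
//   {
//     DeviceGuard guard(cuda_tensor);
//     // ... launch work on that device ...
//   } // original device restored here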
} // namespace at

aten/src/ATen/DimVector.h
@@ -0,0 +1,11 @@
#pragma once
#include <ATen/core/SmallVector.h>
#include <stdint.h>
namespace at {
/// A container for sizes or strides
using DimVector = SmallVector<int64_t, 5>;
}

aten/src/ATen/Dispatch.h
@@ -0,0 +1,130 @@
#pragma once
#include <ATen/Type.h>
#include <ATen/core/Error.h>
#include <ATen/core/Half.h>
#define AT_PRIVATE_CASE_TYPE(enum_type, type, ...) \
case enum_type: { \
using scalar_t = type; \
return __VA_ARGS__(); \
}
#define AT_DISPATCH_FLOATING_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_FLOATING_TYPES_AND_HALF(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_INTEGRAL_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_HALF(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_COMPLEX_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::ComplexDouble, std::complex<double>, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_COMPLEX(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::ComplexDouble, std::complex<double>, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::ComplexDouble, std::complex<double>, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
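// Usage sketch (illustrative, for some Tensor t; scalar_t is the alias
// injected by AT_PRIVATE_CASE_TYPE):
//   AT_DISPATCH_FLOATING_TYPES(t.type(), "my_op", [&] {
//     scalar_t* data = t.data<scalar_t>();
//     // ... elementwise work on data ...
//   });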

aten/src/ATen/Error.h
@@ -0,0 +1,2 @@
#pragma once
#include <ATen/core/Error.h>

@@ -0,0 +1,82 @@
#include "ATen/ExpandUtils.h"
namespace at {
std::vector<int64_t> infer_size(IntList a, IntList b) {
auto dimsA = a.size();
auto dimsB = b.size();
ptrdiff_t ndim = dimsA > dimsB ? dimsA : dimsB;
std::vector<int64_t> expandedSizes(ndim);
for (long i = ndim - 1; i >= 0; --i) {
long offset = ndim - 1 - i;
long dimA = dimsA - 1 - offset;
long dimB = dimsB - 1 - offset;
long sizeA = (dimA >= 0) ? a[dimA] : 1;
long sizeB = (dimB >= 0) ? b[dimB] : 1;
AT_CHECK(
sizeA == sizeB || sizeA == 1 || sizeB == 1,
"The size of tensor a (", sizeA,
") must match the size of tensor b (", sizeB,
") at non-singleton dimension ", i);
// 1s map to the other size (even 0).
expandedSizes[i] = sizeA == 1 ? sizeB : sizeA;
}
return expandedSizes;
}
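// For example, infer_size({5, 1, 3}, {4, 3}) yields {5, 4, 3}: missing leading
// dimensions are treated as size 1 and singleton dimensions expand to match.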
std::tuple<std::vector<int64_t>, std::vector<int64_t>> inferExpandGeometry(
IntList tensor_sizes,
IntList tensor_strides,
IntList sizes) {
int64_t ndim = sizes.size();
int64_t tensor_dim = tensor_sizes.size();
if (tensor_dim == 0) {
std::vector<int64_t> expandedStrides(ndim, 0);
return std::tuple<std::vector<int64_t>, std::vector<int64_t>>(
sizes.vec(), expandedStrides);
}
std::vector<int64_t> expandedSizes(ndim);
std::vector<int64_t> expandedStrides(ndim);
// create a new geometry for the tensors
for (int64_t i = ndim - 1; i >= 0; --i) {
int64_t offset = ndim - 1 - i;
int64_t dim = tensor_dim - 1 - offset;
int64_t size = (dim >= 0) ? tensor_sizes[dim] : 1;
int64_t stride = (dim >= 0) ? tensor_strides[dim]
: expandedSizes[i + 1] * expandedStrides[i + 1];
int64_t targetSize = sizes[i];
if (targetSize == -1) {
AT_CHECK(
dim >= 0,
"The expanded size of the tensor (",
targetSize,
") isn't allowed in a leading, non-existing dimension ",
i);
targetSize = size;
}
if (size != targetSize) {
AT_CHECK(
size == 1,
"The expanded size of the tensor (",
targetSize,
") must match the existing size (",
size,
") at non-singleton dimension ",
i);
size = targetSize;
stride = 0;
}
expandedSizes[i] = size;
expandedStrides[i] = stride;
}
return std::tuple<std::vector<int64_t>, std::vector<int64_t>>(
expandedSizes, expandedStrides);
}
} // namespace at

aten/src/ATen/ExpandUtils.h
@@ -0,0 +1,169 @@
#pragma once
#include "ATen/Tensor.h"
#include "ATen/core/Error.h"
#include <functional>
#include <sstream>
#include <tuple>
namespace at {
AT_API std::vector<int64_t> infer_size(IntList a, IntList b);
AT_API std::tuple<std::vector<int64_t>, std::vector<int64_t> > inferExpandGeometry(
IntList tensor_sizes, IntList tensor_strides, IntList sizes);
// avoid copy-construction of Tensor by using a reference_wrapper.
inline void check_defined(std::initializer_list<std::reference_wrapper<const Tensor>> tensors, const char *api_name) {
for (auto& t : tensors) {
if (!t.get().defined()) {
AT_ERROR(api_name, "(...) called with an undefined Tensor");
}
}
}
inline std::tuple<Tensor> expand_inplace(const Tensor &tensor, const Tensor &to_expand) {
if (tensor.sizes().equals(to_expand.sizes())) {
return std::make_tuple(to_expand);
}
return std::make_tuple(to_expand.expand(tensor.sizes(), /*implicit=*/true)); // see [expand implicit]
}
inline std::tuple<Tensor> expand_inplace(const Tensor &tensor, const Tensor &to_expand, const char *api_name) {
check_defined({tensor, to_expand}, api_name);
return expand_inplace(tensor, to_expand);
}
inline std::tuple<Tensor, Tensor> expand_inplace(const Tensor &tensor, const Tensor &to_expand1, const Tensor &to_expand2) {
if (tensor.sizes().equals(to_expand1.sizes()) && tensor.sizes().equals((to_expand2.sizes()))) {
return std::make_tuple(to_expand1, to_expand2);
}
return std::make_tuple(
to_expand1.expand(tensor.sizes(), /*implicit=*/true), // see [expand implicit]
to_expand2.expand(tensor.sizes(), /*implicit=*/true));
}
inline std::tuple<Tensor, Tensor> expand_inplace(const Tensor &tensor, const Tensor &to_expand1, const Tensor &to_expand2,
const char *api_name) {
check_defined({tensor, to_expand1, to_expand2}, api_name);
return expand_inplace(tensor, to_expand1, to_expand2);
}
inline std::tuple<Tensor, Tensor> expand_outplace(const Tensor &to_expand1, const Tensor &to_expand2) {
if (to_expand1.sizes().equals(to_expand2.sizes())) {
return std::make_tuple(to_expand1, to_expand2);
}
auto expanded_size = infer_size(to_expand1.sizes(), to_expand2.sizes());
return std::make_tuple(
to_expand1.expand(expanded_size, /*implicit=*/true), // see [expand implicit]
to_expand2.expand(expanded_size, /*implicit=*/true));
}
inline std::tuple<Tensor, Tensor> expand_outplace(const Tensor &to_expand1, const Tensor &to_expand2, const char *api_name) {
check_defined({to_expand1, to_expand2}, api_name);
return expand_outplace(to_expand1, to_expand2);
}
inline std::tuple<Tensor, Tensor, Tensor> expand_outplace(const Tensor &to_expand1,
const Tensor &to_expand2,
const Tensor &to_expand3) {
if (to_expand1.sizes().equals(to_expand2.sizes()) && to_expand1.sizes().equals(to_expand3.sizes())) {
return std::make_tuple(to_expand1, to_expand2, to_expand3);
}
auto expanded_size12 = infer_size(to_expand1.sizes(), to_expand2.sizes());
auto expanded_size = infer_size(expanded_size12, to_expand3.sizes());
return std::make_tuple(
to_expand1.expand(expanded_size, /*implicit=*/true), // see [expand implicit]
to_expand2.expand(expanded_size, /*implicit=*/true),
to_expand3.expand(expanded_size, /*implicit=*/true));
}
inline std::tuple<Tensor, Tensor, Tensor> expand_outplace(const Tensor &to_expand1,
const Tensor &to_expand2,
const Tensor &to_expand3,
const char *api_name) {
check_defined({to_expand1, to_expand2, to_expand3}, api_name);
return expand_outplace(to_expand1, to_expand2, to_expand3);
}
inline std::tuple<Tensor> expand_size(const Tensor &to_expand, IntList sizes) {
if(to_expand.sizes().equals(sizes)) {
return std::make_tuple(to_expand);
}
return std::make_tuple(to_expand.expand(sizes, /*implicit=*/true)); // see [expand implicit]
}
inline std::tuple<Tensor> expand_size(const Tensor &to_expand, IntList sizes, const char *api_name) {
check_defined({to_expand}, api_name);
return expand_size(to_expand, sizes);
}
inline std::vector<Tensor> expand_outplace(TensorList to_expand) {
// expands a list of Tensors; ignores undefined (null) tensors
bool first = true;
std::vector<int64_t> sizes;
for (size_t i = 0; i < to_expand.size(); ++i) {
if (!to_expand[i].defined()) {
continue;
} else if (first) {
sizes = to_expand[i].sizes().vec();
first = false;
} else {
sizes = infer_size(sizes, to_expand[i].sizes());
}
}
std::vector<Tensor> result(to_expand.size());
for (size_t i = 0; i < to_expand.size(); ++i) {
if (!to_expand[i].defined()) {
continue;
} else if (to_expand[i].sizes().equals(sizes)) {
result[i] = to_expand[i];
} else {
result[i] = to_expand[i].expand(sizes, /*implicit=*/true); // see [expand implicit]
}
}
return result;
}
// Sums `tensor` repeatedly to produce a tensor of shape `shape`.
// Precondition: is_expandable_to(shape, tensor.sizes()) must be true
static inline Tensor sum_to(Tensor tensor, IntList shape) {
if (shape.size() == 0) {
return tensor.sum();
}
Tensor result = tensor;
while (result.dim() > (int64_t)shape.size()) {
result = result.sum(0, false);
}
for (int64_t i = 0; i < result.dim(); ++i) {
if (shape[i] == 1 && result.sizes()[i] > 1) {
result = result.sum(i, true);
}
}
return result;
}
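// For example, applying sum_to to a {2, 3} tensor with shape = {1, 3} sums
// over dim 0 with keepdim, yielding {1, 3}; with shape = {} it reduces to a
// scalar.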
// True if `shape` can be broadcasted to `desired`
static inline bool is_expandable_to(IntList shape, IntList desired) {
int ndim = shape.size();
int target_dim = desired.size();
if (ndim > target_dim) {
return false;
}
for (int i = 0; i < ndim; i++) {
int64_t size = shape[ndim - i - 1];
int64_t target = desired[target_dim - i - 1];
if (size != target && size != 1) {
return false;
}
}
return true;
}
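// For example, is_expandable_to({1, 3}, {2, 3}) is true, while
// is_expandable_to({2}, {2, 3}) is false: dimensions are matched from the
// right, and only size-1 (or missing) dimensions may grow.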
}

@@ -0,0 +1,292 @@
#include "ATen/Formatting.h"
#include <ATen/ATen.h>
#include <cmath>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <tuple>
namespace at {
//not all C++ compilers have std::defaultfloat, so we define our own here
inline std::ios_base& defaultfloat(std::ios_base& __base) {
__base.unsetf(std::ios_base::floatfield);
return __base;
}
//saves/restores number formatting inside scope
struct FormatGuard {
FormatGuard(std::ostream & out)
: out(out), saved(nullptr) {
saved.copyfmt(out);
}
~FormatGuard() {
out.copyfmt(saved);
}
private:
std::ostream & out;
std::ios saved;
};
std::ostream& operator<<(std::ostream & out, IntList list) {
int i = 0;
out << "[";
for(auto e : list) {
if (i++ > 0)
out << ", ";
out << e;
}
out << "]";
return out;
}
std::ostream& operator<<(std::ostream & out, Backend b) {
return out << toString(b);
}
std::ostream& operator<<(std::ostream & out, const Type& t) {
return out << t.toString();
}
static std::tuple<double, int64_t> __printFormat(std::ostream& stream, const Tensor& self) {
auto size = self.numel();
if(size == 0) {
return std::make_tuple(1., 0);
}
bool intMode = true;
auto self_p = self.data<double>();
for(int64_t i = 0; i < size; i++) {
auto z = self_p[i];
if(std::isfinite(z)) {
if(z != std::ceil(z)) {
intMode = false;
break;
}
}
}
int64_t offset = 0;
while(!std::isfinite(self_p[offset])) {
offset = offset + 1;
if(offset == size) {
break;
}
}
double expMin;
double expMax;
if(offset == size) {
expMin = 1;
expMax = 1;
} else {
expMin = fabs(self_p[offset]);
expMax = fabs(self_p[offset]);
for(int64_t i = offset; i < size; i++) {
double z = fabs(self_p[i]);
if(std::isfinite(z)) {
if(z < expMin) {
expMin = z;
}
if(z > expMax) {
expMax = z;
}
}
}
if(expMin != 0) {
expMin = std::floor(std::log10(expMin)) + 1;
} else {
expMin = 1;
}
if(expMax != 0) {
expMax = std::floor(std::log10(expMax)) + 1;
} else {
expMax = 1;
}
}
double scale = 1;
int64_t sz;
if(intMode) {
if(expMax > 9) {
sz = 11;
stream << std::scientific << std::setprecision(4);
} else {
sz = expMax + 1;
stream << defaultfloat;
}
} else {
if(expMax-expMin > 4) {
sz = 11;
if(std::fabs(expMax) > 99 || std::fabs(expMin) > 99) {
sz = sz + 1;
}
stream << std::scientific << std::setprecision(4);
} else {
if(expMax > 5 || expMax < 0) {
sz = 7;
scale = std::pow(10, expMax-1);
stream << std::fixed << std::setprecision(4);
} else {
if(expMax == 0) {
sz = 7;
} else {
sz = expMax+6;
}
stream << std::fixed << std::setprecision(4);
}
}
}
return std::make_tuple(scale, sz);
}
static void __printIndent(std::ostream &stream, int64_t indent)
{
for(int64_t i = 0; i < indent; i++) {
stream << " ";
}
}
static void printScale(std::ostream & stream, double scale) {
FormatGuard guard(stream);
stream << defaultfloat << scale << " *" << std::endl;
}
static void __printMatrix(std::ostream& stream, const Tensor& self, int64_t linesize, int64_t indent)
{
double scale;
int64_t sz;
std::tie(scale, sz) = __printFormat(stream, self);
__printIndent(stream, indent);
int64_t nColumnPerLine = (linesize-indent)/(sz+1);
int64_t firstColumn = 0;
int64_t lastColumn = -1;
while(firstColumn < self.size(1)) {
if(firstColumn + nColumnPerLine <= self.size(1)) {
lastColumn = firstColumn + nColumnPerLine - 1;
} else {
lastColumn = self.size(1) - 1;
}
if(nColumnPerLine < self.size(1)) {
if(firstColumn != 0) {
stream << std::endl;
}
stream << "Columns " << firstColumn+1 << " to " << lastColumn+1;
__printIndent(stream, indent);
}
if(scale != 1) {
printScale(stream,scale);
__printIndent(stream, indent);
}
for(int64_t l = 0; l < self.size(0); l++) {
Tensor row = self.select(0,l);
double *row_ptr = row.data<double>();
for(int64_t c = firstColumn; c < lastColumn+1; c++) {
stream << std::setw(sz) << row_ptr[c]/scale;
if(c == lastColumn) {
stream << std::endl;
if(l != self.size(0)-1) {
if(scale != 1) {
__printIndent(stream, indent);
stream << " ";
} else {
__printIndent(stream, indent);
}
}
} else {
stream << " ";
}
}
}
firstColumn = lastColumn + 1;
}
}
void __printTensor(std::ostream& stream, Tensor& self, int64_t linesize)
{
std::vector<int64_t> counter(self.ndimension()-2);
bool start = true;
bool finished = false;
counter[0] = -1;
for(size_t i = 1; i < counter.size(); i++)
counter[i] = 0;
while(true) {
for(int64_t i = 0; i < self.ndimension()-2; i++) {
counter[i] = counter[i] + 1;
if(counter[i] >= self.size(i)) {
if(i == self.ndimension()-3) {
finished = true;
break;
}
counter[i] = 0;
} else {
break;
}
}
if(finished) {
break;
}
if(start) {
start = false;
} else {
stream << std::endl;
}
stream << "(";
Tensor tensor = self;
for(int64_t i=0; i < self.ndimension()-2; i++) {
tensor = tensor.select(0, counter[i]);
stream << counter[i]+1 << ",";
}
stream << ".,.) = " << std::endl;
__printMatrix(stream, tensor, linesize, 1);
}
}
std::ostream& print(std::ostream& stream, const Tensor & tensor_, int64_t linesize) {
FormatGuard guard(stream);
if(!tensor_.defined()) {
stream << "[ Tensor (undefined) ]";
} else if (tensor_.is_sparse()) {
stream << "[ " << tensor_.toString() << "{}\n";
stream << "indices:\n" << tensor_._indices() << "\n";
stream << "values:\n" << tensor_._values() << "\n";
stream << "size:\n" << tensor_.sizes() << "\n";
stream << "]";
} else {
Type& cpudouble = tensor_.type().toBackend(Backend::CPU).toScalarType(kDouble);
Tensor tensor = tensor_.toType(cpudouble).contiguous();
if(tensor.ndimension() == 0) {
stream << defaultfloat << tensor.data<double>()[0] << std::endl;
stream << "[ " << tensor_.toString() << "{} ]";
} else if(tensor.ndimension() == 1) {
if (tensor.numel() > 0) {
double scale;
int64_t sz;
std::tie(scale, sz) = __printFormat(stream, tensor);
if(scale != 1) {
printScale(stream, scale);
}
double* tensor_p = tensor.data<double>();
for(int64_t i = 0; i < tensor.size(0); i++) {
stream << std::setw(sz) << tensor_p[i]/scale << std::endl;
}
}
stream << "[ " << tensor_.toString() << "{" << tensor.size(0) << "} ]";
} else if(tensor.ndimension() == 2) {
if (tensor.numel() > 0) {
__printMatrix(stream, tensor, linesize, 0);
}
stream << "[ " << tensor_.toString() << "{" << tensor.size(0) << "," << tensor.size(1) << "} ]";
} else {
if (tensor.numel() > 0) {
__printTensor(stream, tensor, linesize);
}
stream << "[ " << tensor_.toString() << "{" << tensor.size(0);
for(int64_t i = 1; i < tensor.ndimension(); i++) {
stream << "," << tensor.size(i);
}
stream << "} ]";
}
}
return stream;
}
}

@@ -0,0 +1,24 @@
#pragma once
#include <iostream>
#include "ATen/Type.h"
#include "ATen/core/Scalar.h"
namespace at {
AT_API std::ostream& operator<<(std::ostream & out, IntList list);
AT_API std::ostream& operator<<(std::ostream & out, Backend b);
AT_API std::ostream& operator<<(std::ostream & out, const Type & t);
AT_API std::ostream& print(std::ostream& stream, const Tensor & tensor, int64_t linesize);
static inline std::ostream& operator<<(std::ostream & out, const Tensor & t) {
return print(out,t,80);
}
static inline void print(const Tensor & t, int64_t linesize=80) {
print(std::cout,t,linesize);
}
static inline std::ostream& operator<<(std::ostream & out, Scalar s) {
return out << (s.isFloatingPoint() ? s.toDouble() : s.toLong());
}
}

@@ -0,0 +1,2 @@
#pragma once
#include <ATen/core/Generator.h>

aten/src/ATen/Half.h
@@ -0,0 +1,2 @@
#pragma once
#include <ATen/core/Half.h>

aten/src/ATen/InferSize.h
@@ -0,0 +1,44 @@
#pragma once
#include <ATen/optional.h>
#include <ATen/ScalarType.h>
#include <sstream>
#include <vector>
namespace at {
// Infers the size of a dim with size -1, if it exists. Also checks that the
// new shape is compatible with the number of elements.
static std::vector<int64_t> infer_size(IntList shape, int64_t numel) {
auto res = shape.vec();
int64_t newsize = 1;
auto infer_dim = at::optional<int64_t>();
for (int64_t dim = 0, ndim = shape.size(); dim != ndim; dim++) {
if (shape[dim] == -1) {
if (infer_dim) {
throw std::runtime_error("only one dimension can be inferred");
}
infer_dim = dim;
} else if (shape[dim] >= 0) {
newsize *= shape[dim];
} else {
AT_ERROR("invalid shape dimension ", shape[dim]);
}
}
if (numel == newsize || (infer_dim && newsize > 0 && numel % newsize == 0)) {
if (infer_dim) {
// we have a degree of freedom here to select the dimension size; follow NumPy semantics
// and just bail.
AT_CHECK(newsize != 0, "cannot reshape tensor of 0 elements into shape ", shape);
res[*infer_dim] = numel / newsize;
}
return res;
}
std::ostringstream ss;
ss << "shape '" << shape << "' is invalid for input of size " << numel;
throw std::runtime_error(ss.str());
}
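// For example, infer_size({-1, 4}, /*numel=*/8) yields {2, 4}; shapes with
// more than one -1, or whose known product does not divide numel, are
// rejected.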
}

Some files were not shown because too many files have changed in this diff.